A Comparative Study On Machine Learning Techniques Using Titanic Dataset
Abstract— The Titanic disaster, the sinking of the British passenger ship with the loss of 722 passengers and crew, occurred in the North Atlantic on April 15, 1912. Although many years have passed since this maritime disaster took place, research on understanding what affects an individual's survival or death continues to attract researchers' attention. In this study, we apply fourteen different machine learning techniques, namely Logistic Regression (LR), k-Nearest Neighbors (kNN), Naïve Bayes (NB), Support Vector Machines, Decision Tree, Bagging, AdaBoost, Extra Trees, Random Forest (RF), Gradient Boosting (GB), Calibrated GB, Artificial Neural Networks (ANN), Voting (GB, ANN, kNN) and Voting (GB, RF, NB, LR, kNN), to the publicly available Titanic dataset in order to analyze the likelihood of survival and to learn which features are correlated with the survival of passengers and crew. The F-measure scores obtained from the machine learning techniques are compared with each other and with the F-measure score obtained from Kaggle. As a result of this study, higher F-measure rates than the Kaggle score have been obtained with GB and Voting.

Keywords— Machine learning, classification, data analysis, Titanic, Kaggle

I. INTRODUCTION

The inevitable development of technology has both facilitated our lives and brought some difficulties with it. One of the benefits brought by technology is that a wide range of data can be obtained easily on request. However, it is not always possible to acquire the right information. Raw data that is easily accessed from internet sources does not make sense on its own; it should be processed to serve an information retrieval system. In this regard, feature engineering methods and machine learning algorithms play an important role in this process.

The aim of this study is to obtain results that are as reliable as possible from raw data with missing values by using machine learning and feature engineering methods. Therefore, one of the most popular datasets in data science, Titanic, is used. This dataset records various features of the passengers on the Titanic, including who survived and who did not. It was observed that some missing and uncorrelated features decreased the prediction performance. For a detailed data analysis, the effect of the features has been investigated. Thus, some new features are added to the dataset and some existing features are removed from it.

Chatterjee [1] applied multiple linear regression and logistic regression to predict whether a passenger survived. He reported performance metrics across different cases for comparison and concluded that the maximum accuracy obtained from Multiple Linear Regression is 78.426%, while the maximum accuracy obtained from Logistic Regression is 80.756%.

Datla [2] compared the results of the Decision Tree and Random Forests algorithms on the Titanic dataset. The Decision Tree achieved a correctly classified instance rate of 0.84, while Random Forests achieved 0.81. As feature engineering steps, they created new variables such as "survived", "child", "new_fare", "title", "Familysize" and "FamilyIdentity", which are not included in the feature list of the Titanic dataset, and also replaced missing values with the mean value of the corresponding feature.

There are several studies in the literature that compare different classification algorithms on multiple datasets. Meyer et al. [3] compared an SVM implementation to 16 classification algorithms and, for the Titanic dataset, achieved 20.81% and 21.27% error rates with neural networks and SVM, respectively, as the minimum errors. Ratsch et al. [4] compared AdaBoost classifiers to SVM and RBF classifiers; for the Titanic dataset, a 22.4% error rate was obtained with SVM as the minimum error rate. Li et al. [5] used SVM as a component classifier for AdaBoost. They used the Titanic dataset as one of their experimental datasets, and the minimum error rate they obtained is 21.8%.

The rest of the paper is organized as follows: Section 2 presents the techniques employed in the experimental studies. The experimental setup and results are given in Section 3. Section 4 concludes the paper with a discussion.

II. METHODOLOGY

A. Logistic Regression

LR is one of the most popular methods used to classify binary data. LR is based on the assumption that the value of the dependent variable can be predicted from the independent variables. In the model, Y is the dependent variable we are trying to predict by observing X, the input or set of independent variables (X1, ..., Xn). The value of Y labels a person as either survived (Y = 1) or not survived (Y = -1), and the observed input is summarized by (X = x). From this definition, the conditional probability follows a logistic distribution given by P(Y = 1 | X = x). This function, called the regression function, is what we need in order to predict Y.
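The paper does not list its implementation details here; as a minimal sketch (our own illustration, assuming the scikit-learn library, the Kaggle train.csv file, and a handful of simply encoded features rather than the authors' exact pipeline), an LR baseline for this task could look as follows:

# Minimal sketch (not the authors' exact pipeline): logistic regression on the
# Kaggle Titanic training file, using a few simply encoded features.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

train = pd.read_csv("train.csv")                             # Kaggle Titanic training set (891 rows)
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})    # encode sex as 0/1
train["Age"] = train["Age"].fillna(train["Age"].median())    # simple imputation, assumed here

X = train[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = train["Survived"]

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("F-measure:", f1_score(y_val, lr.predict(X_val)))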
B. K Nearest Neighbors

kNN is one of the most common, simplest and non-parametric classification algorithms, used when there is little or no prior knowledge about the distribution of the data. Using a distance metric to measure the closeness between the training samples and a test sample, kNN assigns to the test sample the class of its k nearest training samples. In terms of closeness, kNN is mostly based on the Euclidean distance. The Euclidean distance between a training sample x = (x_1, x_2, ..., x_N) with N features and a test sample y = (y_1, y_2, ..., y_N) with N features is obtained for m = 2 from

d(x, y) = (Σ_{i=1}^{N} (x_i − y_i)^m)^{1/m}.   (1)

When m = 1 the distance is called the Manhattan distance, and when m > 2 it is called the Minkowski distance.
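As an illustrative sketch (ours, not the paper's code; the value of k and the feature pair are assumptions), the distance in (1) and a kNN classifier built on it can be written as:

# Sketch: Minkowski distance of order m (Eq. 1) and a kNN classifier that uses it.
# k = 3 and m = 2 (Euclidean) are assumed values for illustration only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def minkowski_distance(x, y, m=2):
    # Eq. (1): m = 1 gives the Manhattan distance, m = 2 the Euclidean distance.
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** m) ** (1.0 / m)

X_train = np.array([[1.0, 22.0], [3.0, 38.0], [3.0, 26.0], [1.0, 35.0]])  # e.g. [Pclass, Age]
y_train = np.array([0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2)  # p plays the role of m
knn.fit(X_train, y_train)
print(knn.predict([[2.0, 30.0]]))
print(minkowski_distance([1.0, 22.0], [2.0, 30.0]))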
C. Naïve Bayes

NB, which is known as an effective inductive learning algorithm, achieves efficient and fast classification in machine learning applications. The algorithm is based on Bayes' theorem, assuming all features are independent given the value of the class variable [6]. This is the conditional independence assumption, which rarely holds exactly in real-world applications. Thanks to this simplifying assumption, NB performs well on high-dimensional and complex datasets.
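A minimal sketch of such a classifier (our illustration with assumed toy data, using scikit-learn's Gaussian variant):

# Sketch: Gaussian Naive Bayes on two assumed numeric features.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[3, 22.0], [1, 38.0], [3, 26.0], [1, 35.0], [3, 28.0]])  # [Pclass, Age], assumed
y = np.array([0, 1, 1, 1, 0])                                          # 1 = survived

nb = GaussianNB().fit(X, y)
print(nb.predict([[2, 30.0]]))         # predicted class label
print(nb.predict_proba([[2, 30.0]]))   # class probabilities under the independence assumption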
D. Support Vector Machines

SVM, which was developed by Vapnik in 1995, is based on the principle of structural risk minimization and exhibits good generalization performance. With SVM, an optimal separating hyperplane between the classes is found by focusing on the support vectors [7]. This hyperplane separates the training data by a maximal margin. SVM solves nonlinear problems by mapping the data points into a high-dimensional space.
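As a hedged sketch (our own; the RBF kernel, the toy features and the default C and gamma are assumptions), such a maximal-margin classifier could be set up as:

# Sketch: SVM with an RBF kernel, which implicitly maps the points into a
# high-dimensional feature space; hyperparameters are left at assumed defaults.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = [[3, 7.25], [1, 71.3], [3, 7.9], [1, 53.1], [2, 13.0]]  # [Pclass, Fare], assumed features
y = [0, 1, 1, 1, 0]

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # feature scaling matters for margin-based models
svm.fit(X, y)
print(svm.predict([[2, 30.0]]))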
E. Decision Tree

Decision trees, with their fairly simple structure, are among the most used classifiers. A decision tree is a tree-structured model with decision nodes and prediction nodes. Decision nodes are used to branch, and prediction nodes specify class labels. C4.5 is a decision tree algorithm that builds a decision tree from training data by using the information gain. When building decision trees, C4.5 uses a divide-and-conquer approach.
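This section describes C4.5; scikit-learn implements CART rather than C4.5, but a rough stand-in that also splits on an information-gain (entropy) criterion can be sketched as follows (toy data and tree depth are assumptions):

# Sketch: decision tree with the entropy criterion as an information-gain proxy
# (scikit-learn's CART, not C4.5 itself).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[3, 0, 22.0], [1, 1, 38.0], [3, 1, 26.0], [1, 1, 35.0], [3, 0, 35.0]]  # [Pclass, Sex, Age], assumed
y = [0, 1, 1, 1, 0]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["Pclass", "Sex", "Age"]))  # inspect the learned splits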
F. Bagging

Bagging is one of the oldest and easiest techniques for creating an ensemble of classifiers; it improves accuracy by resampling the training set [8]. The fundamental idea behind bagging is to use multiple training sets instead of a single one, to prevent results that depend on one particular training set. A base single classifier is applied in parallel to the generated training sets, and the resulting classification models are combined by majority voting. With bagging, high accuracy, good generalization performance, and a reduction in variance and bias are achieved.

G. AdaBoost

AdaBoost is one of the most used and effective ensemble learning methods. The basic notion behind AdaBoost is that a strong classifier can be created by linearly combining a number of weak classifiers [9]. In the training process, AdaBoost increases the weights of misclassified data points while decreasing the weights of correctly classified data points; that is, AdaBoost reweights all training data in every iteration. Weak classifiers are applied serially, and the generated classification models are combined by weighted majority voting.

H. Extra Trees

Extra Trees (the Extremely Randomized Decision Trees method) is a decision tree ensemble classification method. Extra Trees is based on randomization: for each node of a tree, splitting rules are drawn at random and the best-performing rule according to a score is associated with that node [10]. For each tree that composes the Extra Trees ensemble, the whole dataset is used for training.

İ. Random Forest

RF is a classification algorithm developed by Breiman and Cutler that uses an ensemble of tree predictors [11]. It is one of the most accurate learning algorithms, and for many datasets it achieves a highly accurate classifier. In RF, each tree is constructed by bootstrapping the training data, and for each split a randomly selected subset of features is used [12]. Splitting is based on a purity measure. This classification method estimates missing data, and even when a large proportion of the data is missing it still maintains accuracy.

J. Gradient Boosting

GB, developed by Friedman (2001), is a powerful machine learning algorithm that has shown considerable success in a wide range of real-world applications. GB handles boosting as a method for function estimation, in terms of numerical optimization in function space [13].

K. Artificial Neural Networks

The multilayer perceptron (MLP) is a kind of ANN that has the ability to solve nonlinear classification problems with high accuracy and good generalization performance. The MLP has been applied to a wide variety of tasks such as feature selection, pattern recognition, optimization and so on. An MLP can be considered as a directed graph in which artificial neurons are represented by nodes and directed, weighted edges connect the nodes to each other [14]. Nodes are organized into layers: an input layer, one or more hidden layers, and an output layer. The MLP uses backpropagation to classify data points; with backpropagation, the error is propagated in the backward direction to adjust the weights.
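The paper reports scores for these learners but not its configuration; as a hedged sketch of how the ensemble and neural models above (plus the Calibrated GB mentioned in the abstract) could be instantiated side by side with scikit-learn, with all hyperparameters being illustrative assumptions:

# Sketch: the ensemble and MLP classifiers discussed above, instantiated side by side.
# Hyperparameters are illustrative assumptions, not the values used in the paper.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

models = {
    "Bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
    "Calibrated GB": CalibratedClassifierCV(GradientBoostingClassifier(random_state=0), cv=3),
    "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

def compare(X, y):
    # X, y would be the engineered Titanic feature matrix and the Survived labels.
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="f1")
        print(f"{name}: mean F-measure = {scores.mean():.3f}")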
L. Voting
To obtain accurate classification results, a set of classifiers can be assembled for artificial and real-world datasets. Voting is the simplest method for combining the predictions of multiple classifiers into a single decision [15]. While majority voting returns the class with the most votes, weighted voting makes a weighted linear combination of the classifiers and decides on the class with the highest aggregate.
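The two voting ensembles named in the abstract, Voting (GB, ANN, kNN) and Voting (GB, RF, NB, LR, kNN), could be sketched as follows (an illustration using soft voting; the exact voting scheme and base-model settings used in the paper are not given here):

# Sketch: the two voting ensembles listed in the abstract, built with soft voting.
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

voting_gb_ann_knn = VotingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=0)),
                ("ann", MLPClassifier(max_iter=2000, random_state=0)),
                ("knn", KNeighborsClassifier())],
    voting="soft",
)

voting_gb_rf_nb_lr_knn = VotingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB()),
                ("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier())],
    voting="soft",
)
# Both behave like a single classifier, e.g. voting_gb_rf_nb_lr_knn.fit(X, y).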
III. EXPERIMENTS

A. Dataset

The Titanic: Machine Learning from Disaster competition dataset [16] was provided by Kaggle. The Titanic dataset consists of a training set that includes 891 passengers and a test set that includes 418 passengers, who are different from the passengers in the training set. A description of the features is given in Table I.

TABLE I
NUMBER OF FEATURES IN THE DATASET

Feature      | Value of Feature   | Feature Characteristic
PassengerId  | 1-891              | Integer
Survived     | 0, 1               | Integer
Pclass       | 1-3                | Integer
Name         | Name of passengers | Object
Sex          | Male, female       | Object
Age          | 0-80               | Real
SibSp        | 0-8                | Integer
Parch        | 0-6                | Integer
Ticket       | Ticket number      | Object
Fare         | 0-512              | Real
Cabin        | Cabin number       | Object
Embarked     | S, C, Q            | Object

Fig. 1 Distribution of sex feature.

2) Embarked: When we consider the distribution of the "Embarked" feature, there are 644, 168 and 77 passengers boarding the ship from the ports "S", "C" and "Q", respectively. The survival rates of passengers boarding from these ports are given in Fig. 2. When this figure is analyzed, C is the port with the highest survival rate, at 55%. Thus, this can be interpreted as the "Embarked" feature giving important clues about survival.
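As a hedged sketch of this exploratory step (assuming the standard Kaggle file name), the Embarked counts and per-port survival rates quoted above can be reproduced with pandas:

# Sketch: distribution of the Embarked feature and survival rate per boarding port,
# as described above (assumes the Kaggle train.csv file is available locally).
import pandas as pd

train = pd.read_csv("train.csv")
print(train["Embarked"].value_counts())              # expected roughly S=644, C=168, Q=77
print(train.groupby("Embarked")["Survived"].mean())  # survival rate per port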