Exploratory Data Analysis of Titanic Survival Prediction Using Machine Learning Techniques
Abstract— It is very important to find out the root causes of past human tragedies so that future crises can be avoided. The incident of 15 April 1912 is an example of a human tragedy in which around 1,500 passengers and crew lost their lives. Continuing research suggests that, had a proper statistical assessment been carried out, the human devastation could probably have been reduced. In today's era, many new and powerful technologies are available with whose help accurate statistical calculations can be performed. In this research study, Titanic survival has been studied using machine learning techniques. Out of the total entities, 891 have been used for training and 418 for the test set, and the comparative study of different machine learning algorithms gives importance to this research study, so that the accuracy can be understood in a very authentic way.

I. INTRODUCTION

The Titanic dataset is used to analyze the survival of Titanic passengers based on a statistical analysis of supervised machine learning techniques such as Logistic Regression, Random Forest, Decision Tree, K-nearest neighbor, etc. As per the record, different types of people were present on the ship, so all predictions revolve around the types of people so that maximum survival can be assured. Before starting to train the model, it is important to pre-process the dataset in all aspects, such as missing values, consistent formatting, outliers, etc. [2][3]. For a better understanding, a clear picture of the research work is given in the flowchart in Figure 2.

The scope of this paper is to explore and analyze the Titanic dataset using machine learning techniques to predict the survival of passengers. The paper focuses on the application of supervised learning algorithms such as logistic regression, random forest, stochastic gradient descent, decision tree, and k-nearest neighbor to classify the passengers into two categories, survived or not survived. The study aims to compare the performance of these algorithms based on different evaluation metrics such as accuracy, F1-score, recall, and precision. Additionally, the paper discusses the data preparation process, feature engineering, and data visualization to gain insights into the dataset. The results obtained from this study can potentially aid in improving the safety protocols and emergency procedures of future maritime transport systems.

II. LITERATURE REVIEW

The Titanic dataset has been extensively used in various studies to explore predictive modeling techniques, feature selection methods, and machine learning algorithms. The dataset has served as a benchmark for competitions and tutorials, and its analysis has identified the most significant factors for survival prediction. The studies reviewed in this literature review demonstrate the versatility and importance of the Titanic dataset in advancing the field of machine learning and predictive modeling. From the survey of data mining techniques to the investigation of sigmoid function parameters in artificial neural networks, the studies have explored a wide range of concepts and methods relevant to machine learning and predictive modeling.
Table 1: Summarized view of Literature Review

Ref. | Year | Method Used | Assessment
1 | 2019 | Data Set Provider | Kaggle.com provides the Titanic dataset and platform for the Machine Learning from Disaster competition, which serves as a popular benchmark for predictive modeling.
2 | 2013 | Data Mining | This paper provides a comprehensive survey of data mining techniques, including supervised and unsupervised learning, and their applications in various fields.
3 | 2007 | Feature Selection | The paper proposes a spectral feature selection method for both supervised and unsupervised learning tasks, which can improve the accuracy and efficiency of predictive modeling.
4 | 2018 | Predictive Modeling | This study uses the Titanic dataset to predict the survivors of the disaster and compares the performance of various machine learning algorithms.
5 | 2012 | Predictive Modeling | It proposed a predictive modeling approach using the Titanic dataset and offers a comprehensive review of related concepts and methods.
6 | 2017 | Predictive Modeling | This GitHub repository contains a predictive model for the Jack Dies competition, which is based on the Titanic dataset and serves as a similar benchmark for machine learning.
7 | 2017 | Predictive Modeling | This study uses various machine learning algorithms to analyze the Titanic disaster and identifies the most important factors for survival prediction.
8 | 1995 | Neural Networks | The paper investigates the impact of sigmoid function parameters on the backpropagation learning algorithm in artificial neural networks.
9 | 2009 | Decision Trees | This paper presents an implementation of the ID3 decision tree learning algorithm and provides a tutorial on how to apply it to predictive modeling tasks.
10 | 2018 | Predictive Modeling | This study compares the performance of different machine learning techniques on the Titanic dataset and identifies the most accurate method for survival prediction.
11 | 2014 | SVM | The paper proposes two methods for selecting Gaussian kernel parameters in one-class SVM and applies them to fault detection in industrial systems.
12 | 2014 | Predictive Modeling | This study uses various machine learning algorithms to classify Titanic passenger data and predict their chances of survival in the disaster.
13 | 2012 | Predictive Modeling | This website provides a tutorial on predictive modeling using the Titanic dataset and offers a comprehensive review of related concepts and methods.
14 | 2011 | Sentiment Analysis | The paper presents a sentiment analysis method for Twitter data and shows its effectiveness in predicting the polarity of tweets.
15 | 2020 | Predictive Modeling | This study uses a machine learning approach to predict the prognosis of breast cancer patients and identifies the most important features for outcome prediction.

III. DESCRIPTION OF DATA AND EXPERIMENTAL SETUP

The dataset utilized in this study has been obtained from the Kaggle website (www.kaggle.com). The data collection contains twelve columns: Parch, Ticket, Pclass, Name, PassengerId, Survived, Sex, Age, SibSp, Fare, Cabin, and Embarked.
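As an illustrative sketch (not the code used in this study; pandas and the file name train.csv are assumptions), the Kaggle training file can be loaded and its columns inspected as follows:

```python
# Illustrative sketch: load the Kaggle Titanic training file and inspect it.
# Assumes the file has been downloaded as "train.csv".
import pandas as pd

df = pd.read_csv("train.csv")          # 891 rows in the Kaggle training split
print(df.shape)                        # (number of rows, number of columns)
print(df.columns.tolist())             # PassengerId, Survived, Pclass, Name, Sex,
                                       # Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
print(df["Survived"].value_counts())   # target: 0 = did not survive, 1 = survived
```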
Along with these features, our target value is Survived. The dataset categorizes family ties in the following manner: siblings include blood, adopted, and step relatives (mistresses and fiancés were not considered spouses); both the mother and the father count as parents; and a child refers to a son, daughter, stepson, or stepdaughter. Because some children traveled only with a babysitter, their Parch value was zero.

Before any form of data analytics is performed, the dataset needs to be cleaned, since variables such as Embarked, Cabin, and Age contain missing values that must be handled. Missing Age values can be imputed by randomly sampling from the existing ages or, as done here, by filling them with the column mean; the Cabin column is removed; and missing Embarked values are replaced with the mode of that column.
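A minimal sketch of these cleaning steps, assuming pandas and the Kaggle training file train.csv (the library and file name are assumptions, not taken from the paper):

```python
# Illustrative sketch of the cleaning steps described above.
import pandas as pd

df = pd.read_csv("train.csv")

# Drop the Cabin column, which is mostly missing.
df = df.drop(columns=["Cabin"])

# Fill missing Embarked values with the most frequent port (column mode).
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Fill missing Age values with the column mean (random sampling from the
# observed ages would be the alternative noted above).
df["Age"] = df["Age"].fillna(df["Age"].mean())

print(df.isnull().sum())  # confirm no missing values remain in these columns
```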
Data exploration and analysis

In the first stage, we perform an exploratory analysis of the data for our problem. The dataset is examined through exploratory data analysis to identify the characteristics that impact the survival rate. By computing the correlation between each attribute and survival, the data is thoroughly reviewed. Fig. 3 demonstrates how sex affects the survival rate.
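The survival-rate comparison described above (e.g., by sex, as in Fig. 3) could be computed along these lines; this is an illustrative sketch, not the original analysis code:

```python
# Illustrative sketch: attribute-vs-survival checks on the cleaned data.
import pandas as pd

df = pd.read_csv("train.csv")

# Mean of the 0/1 Survived column per group = survival rate by sex (cf. Fig. 3).
print(df.groupby("Sex")["Survived"].mean())

# A simple correlation check between the numeric attributes and survival.
numeric = df[["Survived", "Pclass", "Age", "SibSp", "Parch", "Fare"]]
print(numeric.corr()["Survived"].sort_values(ascending=False))
```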
A. Logistic Regression is a more straightforward and efficient solution for binary and linear classification problems.

B. Random Forest is a supervised learning technique. It can be used in machine learning to obtain solutions for both regression and classification models. As the name indicates, "a Random Forest is a classifier that incorporates numerous decision trees on different subsets of the supplied dataset and takes the average to enhance the predicted accuracy of that dataset" [6]. On the Titanic dataset, instead of relying on a single decision tree, the random forest collects predictions from each tree and forecasts the final result using the predictions that received the most votes.

C. Stochastic Gradient Descent (SGD): the authors take the Titanic dataset and an approach that optimizes gradient descent throughout each search once a random weight vector is chosen. Gradient descent is a method that searches across a vast or infinite hypothesis space where 1) hypotheses are continuously parameterized and 2) errors are differentiable with respect to the parameters. In SGD the weights are initialized from the given dataset (the Titanic dataset), and the algorithm updates the weight vector with a single data point at a time. When an error computation is finished, gradient descent progressively updates the weights to enhance convergence. A brief sketch of fitting these classifiers is given below.
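A brief illustrative sketch of fitting the classifiers compared in this study with scikit-learn; the preprocessing choices (dropped columns, one-hot encoding, 80/20 split) are assumptions for the example, not necessarily those used by the authors:

```python
# Illustrative sketch: fit the five classifiers compared in this study.
import pandas as pd
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")
df = df.drop(columns=["Cabin", "Name", "Ticket", "PassengerId"])
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
df["Age"] = df["Age"].fillna(df["Age"].mean())
df = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True)

X, y = df.drop(columns=["Survived"]), df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Stochastic Gradient Descent": SGDClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-nearest neighbor": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # accuracy on the held-out split
```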
B. Accuracy: Accuracy is also a parameter used to evaluate classification models [12][13]. The percentage of correct predictions made by our model is known as accuracy. In binary classification, accuracy can also be quantified in terms of positives and negatives, where TP stands for True Positives, TN for True Negatives, FP for False Positives, and FN for False Negatives. For example, if the test set had 100 samples and our model accurately predicted 80 of them, we would get a score of 80/100 = 0.8, or 80% accuracy. The machine learning accuracy formula is (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives).

C. Recall: Recall is a model's ability to locate all relevant cases within a dataset. The number of true positives divided by the sum of the true positives and the false negatives is the definition of recall [14][15]. In machine learning, the recall formula is True Positives / (True Positives + False Negatives).

D. Precision: Precision is the classification algorithm's ability to return only relevant data. Precision can be calculated by dividing the number of true positives by the number of true positives plus the number of false positives. The machine learning precision formula is True Positives / (True Positives + False Positives).

E. F1-Score: When attempting to determine the ideal balance between precision and recall, we can use the F1 score to combine the two criteria. In machine learning, the F1 score formula is 2 * (Precision * Recall) / (Precision + Recall). A sketch of computing these four metrics is given below.
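The four metrics defined above can be computed with scikit-learn as in the following sketch; the variable names y_test and y_pred are assumptions for illustration:

```python
# Illustrative sketch: compute the four evaluation metrics described above.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def report(y_test, y_pred):
    # Accuracy = (TP + TN) / (TP + TN + FP + FN)
    print("Accuracy :", accuracy_score(y_test, y_pred))
    # Precision = TP / (TP + FP)
    print("Precision:", precision_score(y_test, y_pred))
    # Recall = TP / (TP + FN)
    print("Recall   :", recall_score(y_test, y_pred))
    # F1 = 2 * (Precision * Recall) / (Precision + Recall)
    print("F1-score :", f1_score(y_test, y_pred))

# Example usage with a fitted model:  report(y_test, model.predict(X_test))
```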
Table 2: Performance of the machine learning algorithms

Algorithm | Accuracy | F1-Score | Recall | Precision
Logistic Regression | 78% | 0.78 | 0.78 | 0.79
Random Forest | 82% | 0.82 | 0.81 | 0.82
Stochastic Gradient Descent | 58% | 0.45 | 0.58 | 0.63
Decision Tree | 79% | 0.79 | 0.79 | 0.79
K-nearest neighbor | 66% | 0.64 | 0.66 | 0.67

It is made very apparent that, when using a different feature modeling approach, the models' accuracy may change. The ideal models for this classification problem are Random Forest and Decision Tree, since they provide the high levels of accuracy mentioned above. The results of our experiment, as shown in Figure 4, demonstrate the performance of the various machine learning algorithms used for the prediction of Titanic survival. We have evaluated the performance of the algorithms using accuracy, F1-score, recall, and precision. The Random Forest algorithm performed the best with an accuracy of 82%, an F1-score of 0.82, a recall of 0.81, and a precision of 0.82. The Logistic Regression and Decision Tree algorithms also performed well, with accuracies of 78% and 79%, respectively. However, the Stochastic Gradient Descent algorithm showed poor performance with an accuracy of only 58%, and the K-nearest neighbor algorithm performed moderately with an accuracy of 66%. These results indicate that the Random Forest algorithm is the most suitable for predicting the survival of Titanic passengers using machine learning techniques.
VI. RESULT AND CONCLUSION

The first step in conducting data analysis is data cleaning. Analyzing exploratory data makes comprehending the dataset and the relationships between the features easier, utilizing several graphical techniques; those used above are histograms and ggplot. A few inferences are made and facts are discovered by using exploratory data analysis. Based on the exploratory data analysis technique, the precise parameters for building the training and prediction model are identified in feature engineering. Machine learning models predict which passengers survived. To make predictions in this classification problem, the Random Forest technique is used. With an accuracy of 0.8273, a recall of 0.8135, an F1-score of 0.8237, and a precision of 0.8273 according to the confusion matrix, Random Forest emerges as the most accurate model. This indicates that Random Forest has a very high level of prediction ability on this dataset using the selected features. For a clear picture of the statistical analysis, see Table 2.
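A hypothetical sketch of deriving the confusion-matrix cells and accuracy for a fitted classifier (the function and argument names are illustrative, not the authors' code):

```python
# Illustrative sketch: confusion-matrix cells for a fitted binary classifier.
from sklearn.metrics import confusion_matrix

def confusion_report(model, X_test, y_test):
    y_pred = model.predict(X_test)
    # sklearn returns [[TN, FP], [FN, TP]] for binary labels 0/1.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
    print("Accuracy:", (tp + tn) / (tp + tn + fp + fn))
```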
Fig. 4: Results of the algorithms. This graph shows each algorithm's performance in terms of accuracy, F1-score, recall, and precision.
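A grouped bar chart like Fig. 4 could be reproduced from the Table 2 values as in the following sketch; matplotlib is an assumption here (the paper itself mentions ggplot):

```python
# Illustrative sketch: grouped bar chart of the Table 2 results (cf. Fig. 4).
import numpy as np
import matplotlib.pyplot as plt

algorithms = ["Logistic\nRegression", "Random\nForest", "SGD", "Decision\nTree", "KNN"]
scores = {
    "Accuracy":  [0.78, 0.82, 0.58, 0.79, 0.66],
    "F1-Score":  [0.78, 0.82, 0.45, 0.79, 0.64],
    "Recall":    [0.78, 0.81, 0.58, 0.79, 0.66],
    "Precision": [0.79, 0.82, 0.63, 0.79, 0.67],
}

x = np.arange(len(algorithms))
width = 0.2
for i, (metric, values) in enumerate(scores.items()):
    plt.bar(x + i * width, values, width, label=metric)

plt.xticks(x + 1.5 * width, algorithms)
plt.ylim(0, 0.9)
plt.ylabel("Score")
plt.legend()
plt.title("Performance of the algorithms")
plt.tight_layout()
plt.show()
```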
VII. CONCLUSION AND FUTURE SCOPE

Models created using machine learning predict which passengers survived. The Random Forest technique is applied to make predictions in this classification challenge. The correctness of each model is determined from the confusion matrix, and the Random Forest model comes out on top with an accuracy of 0.82. This indicates that Random Forest's predictive capability on this dataset with the selected features is quite strong. It is made very apparent that, when using a different feature modeling approach, the models' accuracy may change. The model that provides the best level of accuracy for this classification problem is Random Forest. Machine learning and data analytics are used in this project, and its work can serve as a model for learning how to incorporate EDA and machine learning from the very beginning. With the use of more recent libraries, such as shiny in R, the concept can be expanded in the future to create more complex graphical user interfaces. It is possible to create an interactive page where the values corresponding to an attribute's graph (such as a ggplot or histogram) would also change if the value of the attribute is modified on a scale. By integrating our results, we can also reach far more precise conclusions.
REFERENCES
[1] Kaggle.com. (n.d.). Titanic: Machine Learning from Disaster. Retrieved October 29, 2019, from http://www.kaggle.com/
[2] Jain, N., & Srivastava, V. (2013). Data mining techniques: A
survey paper. IJRET: International Journal of Research in
Engineering and Technology, 2(11), 2319-1163.
[3] Zhao, Z., & Liu, H. (2007). Spectral feature selection for
supervised and unsupervised learning. Proceedings of the 24th
international conference on Machine learning. ACM.
[4] Farag, N., & Hassan, G. (2018). Predicting the Survivors of
the Titanic Kaggle, Machine Learning From Disaster. In
ICSIE'18 Proceedings of the 7th International Conference on
Software and Information Engineering (pp. 1-7). ACM.
[5] Lam, E., & Tang, C. (2012). CS229: Titanic - Machine Learning From Disaster.
[6] Liu, J. (2017). Arkham/Jack-Dies. GitHub. Retrieved August
30, 2017, from https://github.com/Arkham/jack-dies
[7] Singh, A., Saraswat, S., & Faujdar, N. (2017). Analyzing
Titanic disaster using machine learning algorithms. 2017
International Conference on Computing, Communication and
Automation (ICCCA). IEEE.
[8] Han, J., & Moraga, C. (1995). The influence of the sigmoid function parameters on the speed of backpropagation learning. In From Natural to Artificial Neural Computation (pp. 195-201). Springer.
[9] Peng, W., Chen, J., & Zhou, H. (2009). An implementation of
ID3-decision tree learning algorithm. Retrieved from
http://web.arch.usyd.edu.au/wpeng/DecisionTree2.pdf
[10] Ekinci, E. O., & Acun, N. (2018). A comparative study on machine learning techniques using Titanic dataset. 7th International Conference on Advanced Technologies.
[11] Xiao, Y., Wang, T., & Wu, J. (2014). Two methods of selecting Gaussian kernel parameters for one-class SVM and their application to fault detection. Knowledge-Based Systems, 59, 75-84.
[12] Cicoria, S., Sherlock, J., Muniswamaiah, M., & Clarke, L. (2014). Classification of Titanic passenger data and chances of surviving the disaster. Proceedings of Student-Faculty Research Day, CSIS, 1-6.
[13] Lam, E., & Tang, C. (2012). Titanic: Machine Learning From Disaster.
[14] Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of Twitter data. Proceedings of the ACL 2011 Workshop on Languages in Social Media.
[15] Andjelkovic Cirkovic, B. R. (2020). Machine learning approach for breast cancer prognosis prediction. Computational Modeling in Bioengineering and Bioinformatics.