Titanic
DATASET
Lamia Al-Ariqi
The Titanic dataset is a popular dataset in the
field of data science and machine learning. It
contains information about the passengers who
were aboard the RMS Titanic when it sank on its
maiden voyage in 1912.
The dataset typically includes the following columns:
1. PassengerId: A unique identifier for each passenger.
2. Survived: Indicates whether the passenger survived (1) or not (0).
3. Pclass: The ticket class of the passenger (1st, 2nd, or 3rd class).
4. Name: The name of the passenger.
5. Sex: The gender of the passenger.
6. Age: The age of the passenger.
7. SibSp: Number of siblings/spouses aboard the Titanic.
8. Parch: Number of parents/children aboard the Titanic.
9. Ticket: The ticket number.
10. Fare: The fare paid for the ticket.
11. Cabin: The cabin number.
12. Embarked: The port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
Before conducting EDA (Exploratory Data Analysis) on the Titanic
dataset, I cleaned the data using Python. This involved handling null
and duplicate values, and saving the modified dataset.
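The cleaning step could look roughly like the sketch below. It is only an illustration using pandas, not the exact script used for this deck: the choice of fill values (median age, most common port, an "Unknown" cabin label), the helper name clean_titanic, and the sample rows are all assumptions.

```python
import pandas as pd


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with duplicate rows removed and null values filled."""
    # Drop rows that are exact duplicates of an earlier row.
    cleaned = df.drop_duplicates().copy()
    # Assumption: missing ages are replaced by the median age.
    cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())
    # Assumption: missing ports are replaced by the most common port.
    most_common_port = cleaned["Embarked"].mode().iloc[0]
    cleaned["Embarked"] = cleaned["Embarked"].fillna(most_common_port)
    # Assumption: missing cabins are labeled explicitly instead of dropped.
    cleaned["Cabin"] = cleaned["Cabin"].fillna("Unknown")
    return cleaned.reset_index(drop=True)


# Made-up sample rows for illustration only (not real passenger data).
sample = pd.DataFrame({
    "PassengerId": [1, 2, 2, 3, 4],
    "Pclass": [3, 1, 1, 3, 2],
    "Age": [22.0, 38.0, 38.0, None, 30.0],
    "Cabin": [None, "C85", "C85", None, None],
    "Embarked": ["S", "C", "C", "S", None],
})

cleaned = clean_titanic(sample)
print(cleaned)
```

Saving the result is then a single call such as cleaned.to_csv("titanic_cleaned.csv", index=False), where the file name is a placeholder.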
Additionally, I utilized Power BI to address the following questions:
4. How many 1st class tickets were sold on the Titanic?
5. How many 2nd class tickets were sold on the Titanic?
6. How many 3rd class tickets were sold on the Titanic?
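The per-class counts were produced in Power BI, but the same numbers can be cross-checked in Python. The sketch below assumes that each row in the dataset stands for one ticket sold; the sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample rows for illustration only (not real passenger data).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Pclass": [3, 1, 3, 2, 3, 1],
})

# Assumption: one row corresponds to one ticket sold.
# Count rows per ticket class and order the result as 1st, 2nd, 3rd.
tickets_per_class = df["Pclass"].value_counts().sort_index()
print(tickets_per_class)
```

If several passengers can share one ticket number, counting distinct values of the Ticket column per class, for example with df.groupby("Pclass")["Ticket"].nunique(), may be the better reading of "tickets sold".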
From the graph, we can observe that almost all
female passengers survived. This is because the
dataset is a test dataset, not the actual passenger data.
I also created a visualization to answer the
following questions:
7. In which age range did many passengers not survive?
8. In which cabins did all passengers survive?
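Both questions can also be approached in Python. The sketch below is one possible approach, not the method behind the visualization: the 10-year age bands, the AgeBand column name, and the sample rows are assumptions.

```python
import pandas as pd

# Made-up sample rows for illustration only (not real passenger data).
df = pd.DataFrame({
    "Survived": [0, 0, 1, 0, 1, 1, 0],
    "Age": [22, 25, 28, 41, 35, 4, 63],
    "Cabin": ["A1", "A1", "B2", "Unknown", "B2", "C3", "C3"],
})

# Question 7. Assumption: ages are grouped into 10-year bands (0-9, 10-19, ...).
bins = list(range(0, 90, 10))
labels = [f"{low}-{low + 9}" for low in bins[:-1]]
df["AgeBand"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)

# Count non-survivors in each age band that actually occurs.
deaths_by_band = df[df["Survived"] == 0].groupby("AgeBand", observed=True).size()
print(deaths_by_band)

# Question 8. A cabin qualifies when its lowest Survived value is 1,
# meaning every passenger listed for that cabin survived.
min_survived = df.groupby("Cabin")["Survived"].min()
all_survived = min_survived[min_survived == 1].index.tolist()
print(all_survived)
```

A placeholder value such as "Unknown" in the Cabin column is treated like any other cabin here, so it may be worth filtering out before reading the result.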
Thanks!