Assignment 1

The document details a Python script using Google Colab to analyze the Titanic dataset. It includes loading the dataset, displaying its structure, handling missing values, and converting categorical variables into numerical formats. Key preprocessing steps involve filling missing ages with the median and encoding 'Sex' and 'Embarked' columns.

Uploaded by

vaibhavi.darda

4/14/25, 12:18 PM assignment1.ipynb - Colab

from google.colab import files


uploaded = files.upload()

Titanic-Dataset.csv(text/csv) - 61194 bytes, last modified: 4/14/2025 - 100% done
Saving Titanic-Dataset.csv to Titanic-Dataset.csv

import pandas as pd
import numpy as np

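# Load the dataset uploaded above into a DataFrame
titanic_data = pd.read_csv('Titanic-Dataset.csv')

# Preview the first five rows
print(titanic_data.head())
# info() prints its own summary and returns None, so wrapping it in
# print() also emits the trailing "None" seen in the output
print(titanic_data.info())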

   PassengerId  Survived  Pclass  \
0            1         0       3
1            2         1       1
2            3         1       3
3            4         1       1
4            5         0       3

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                           Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin Embarked
0      0         A/5 21171   7.2500   NaN        S
1      0          PC 17599  71.2833   C85        C
2      0  STON/O2. 3101282   7.9250   NaN        S
3      0            113803  53.1000  C123        S
4      0            373450   8.0500   NaN        S
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
None
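
The non-null counts above imply 177 missing values in Age (891 - 714), 687 in Cabin (891 - 204) and 2 in Embarked (891 - 889). One way to get those numbers directly is `isna().sum()`. The sketch below is not part of the original notebook; it uses a small made-up frame named `toy` so that it runs without the CSV.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame (hypothetical values, not the real data)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", None, "S"],
})

# Number of missing entries per column
missing_counts = toy.isna().sum()

# Fraction of missing entries per column (mean of a boolean mask)
missing_share = toy.isna().mean()

print(missing_counts)
print(missing_share)
```

Running the same two calls on `titanic_data` should reproduce the gaps read off the `info()` summary.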

# Fill missing values for Age with the median value.
# Assign the result back instead of calling fillna(..., inplace=True) on the
# column selection: the chained inplace form raises a pandas FutureWarning
# and will stop working in pandas 3.0.
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].median())

# Drop rows with missing Embarked values
titanic_data.dropna(subset=['Embarked'], inplace=True)
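
The median fill can be checked on a few rows. This is a minimal sketch on a made-up frame (`toy` and its values are assumptions, not the Titanic data). It uses plain assignment rather than a chained `inplace=True` call.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two gaps
toy = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0, np.nan]})

# Series.median() skips NaN by default, so this is the median of 22, 38, 26
age_median = toy["Age"].median()

# Assign the filled column back to the frame
toy["Age"] = toy["Age"].fillna(age_median)

print(age_median)
print(toy["Age"].tolist())
```

The median is less sensitive to a few very old or very young passengers than the mean would be, which is a common reason to prefer it for a skewed column.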

 

# Convert 'Sex' to numerical values (0 = female, 1 = male)
titanic_data['Sex'] = titanic_data['Sex'].map({'female': 0, 'male': 1})

# Convert 'Embarked' to numerical values (0 = C, 1 = Q, 2 = S)
titanic_data['Embarked'] = titanic_data['Embarked'].map({'C': 0, 'Q': 1, 'S': 2})
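
`Series.map` with a dictionary turns any value missing from the dictionary into NaN, so it can help to check for unmapped categories after encoding. The sketch below shows that check on a made-up column (the port code "X" is hypothetical). It also shows one-hot encoding with `pd.get_dummies`, which is an alternative to the notebook's method: the integer codes 0, 1, 2 suggest an order between ports that some models may pick up on.

```python
import pandas as pd

# Hypothetical column; "X" is a made-up code that the mapping does not cover
toy = pd.DataFrame({"Embarked": ["S", "C", "Q", "X"]})

# Same dictionary mapping as in the notebook
codes = toy["Embarked"].map({"C": 0, "Q": 1, "S": 2})

# Values the dictionary did not cover come back as NaN
unmapped = toy.loc[codes.isna(), "Embarked"].unique().tolist()
print(unmapped)

# Alternative (not used in the notebook): one indicator column per category
dummies = pd.get_dummies(toy["Embarked"], prefix="Embarked")
print(list(dummies.columns))
```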

https://colab.research.google.com/drive/1ER7sS51QdSrWxsQQDyKxwPBbBfyZ4EDG#scrollTo=HzsE0ed0iMCs&printMode=true