Assignment Data Science

Uploaded by Kavyansh Jain

11/1/24, 3:46 PM Untitled50.ipynb - Colab

NAME - ADITYA SINGH TOMAR
ENROLLMENT NO. - 0901ME221006



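import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset; stop with a clear message if the file is missing.
try:
    titanic_data = pd.read_csv('/content/Titanic-Dataset.csv')
except FileNotFoundError:
    print("Error: '/content/Titanic-Dataset.csv' not found. Please upload the file or provide the correct path.")
    raise SystemExit(1)

print("\n--- Data Overview ---")
print(titanic_data.head())
# info() prints its summary itself and returns None, so 'None' is printed as well.
print(titanic_data.info())
print(titanic_data.describe())

print("\n--- Missing Values ---")
print(titanic_data.isnull().sum())

# Fill missing ages with the median. Assigning the result back avoids the chained
# fillna(..., inplace=True) call, which raised the FutureWarning seen in the recorded output.
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].median())

print("\n--- Survival Rate by Passenger Class ---")
print(titanic_data.groupby('Pclass')['Survived'].mean())
sns.countplot(x='Pclass', hue='Survived', data=titanic_data)
plt.show()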

print("\n--- Survival Rate by Sex ---")
print(titanic_data.groupby('Sex')['Survived'].mean())
sns.countplot(x='Sex', hue='Survived', data=titanic_data)
plt.show()

print("\n--- Age Distribution ---")
plt.hist(titanic_data['Age'], bins=20)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()
print("\n--- Fare Distribution ---")
plt.hist(titanic_data['Fare'], bins=20)
plt.xlabel('Fare')
plt.ylabel('Frequency')
plt.title('Fare Distribution')
plt.show()
print("\n--- Survival Rate by Embarked ---")
print(titanic_data.groupby('Embarked')['Survived'].mean())
sns.countplot(x='Embarked', hue='Survived', data=titanic_data)
plt.show()

https://colab.research.google.com/drive/1kjkCVL883kU6Spe-kxjZZ_1_sIcRhGM9#scrollTo=SLLWArMCM-ts&printMode=true

--- Data Overview ---
   PassengerId  Survived  Pclass  \
0            1         0       3
1            2         1       1
2            3         1       3
3            4         1       1
4            5         0       3

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                           Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin Embarked
0      0         A/5 21171   7.2500   NaN        S
1      0          PC 17599  71.2833   C85        C
2      0  STON/O2. 3101282   7.9250   NaN        S
3      0            113803  53.1000  C123        S
4      0            373450   8.0500   NaN        S
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
None
       PassengerId    Survived      Pclass         Age       SibSp  \
count   891.000000  891.000000  891.000000  714.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008
std     257.353842    0.486592    0.836071   14.526497    1.102743
min       1.000000    0.000000    1.000000    0.420000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000
50%     446.000000    0.000000    3.000000   28.000000    0.000000
75%     668.500000    1.000000    3.000000   38.000000    1.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000

            Parch        Fare
count  891.000000  891.000000
mean     0.381594   32.204208
std      0.806057   49.693429
min      0.000000    0.000000
25%      0.000000    7.910400
50%      0.000000   14.454200
75%      0.000000   31.000000
max      6.000000  512.329200

--- Missing Values ---
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
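
The missing-value counts above are easier to compare as shares of the 891 rows. The following is a minimal sketch that uses only the totals printed in this output; the names `missing_counts`, `total_rows`, and `missing_share` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Missing-value counts and row total copied from the recorded output above.
missing_counts = pd.Series({"Age": 177, "Cabin": 687, "Embarked": 2})
total_rows = 891

# Share of rows with a missing value in each column, as a percentage.
missing_share = (missing_counts / total_rows * 100).round(2)
print(missing_share)
```

On these counts, Age is missing for about 19.87% of rows, Cabin for about 77.10%, and Embarked for about 0.22%. In the notebook itself the same shares could be obtained with `titanic_data.isnull().mean() * 100`.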

--- Survival Rate by Passenger Class ---
Pclass
1 0.629630
2 0.472826
3 0.242363
Name: Survived, dtype: float64
<ipython-input-2-46b4b7a96186>:16: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setti

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[

titanic_data['Age'].fillna(titanic_data['Age'].median(), inplace=True)
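
The FutureWarning above comes from calling `fillna(..., inplace=True)` on a column selected from the frame. The sketch below shows two ways of filling a column that avoid the chained call: assigning the filled column back, and calling `fillna` on the frame with a column-to-value mapping. It runs on a small made-up frame rather than the Titanic file, and the names `df`, `filled_by_assignment`, and `filled_by_mapping` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame; the values are made up for illustration.
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan]})

# median() skips NaN by default, so this is the median of 22, 26 and 35.
median_age = df["Age"].median()

# Option 1: assign the filled column back to the frame.
filled_by_assignment = df.copy()
filled_by_assignment["Age"] = filled_by_assignment["Age"].fillna(median_age)

# Option 2: call fillna on the frame with a {column: value} mapping.
filled_by_mapping = df.fillna({"Age": median_age})

print(filled_by_assignment["Age"].tolist())
print(filled_by_mapping["Age"].tolist())
```

Both options give the same filled column. For the notebook, the equivalent of option 1 would be `titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].median())`.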
