Assignment 5
import pandas as pd
import numpy as np
train = pd.read_csv('/content/drive/MyDrive/train.csv')
test = pd.read_csv('/content/drive/MyDrive/test.csv')
gender_submission = pd.read_csv('/content/drive/MyDrive/gender_submission.csv')
Test Dataset:
None
e. General description:
Embarked
count 889
unique 3
top S
freq 644
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN
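The summary above, where a text column shows count, unique, top and freq but NaN for the numeric statistics, looks consistent with calling `describe(include='all')`. The sketch below illustrates that behaviour on a small hypothetical frame (`toy` and its values are made up for illustration, not taken from the Titanic data).

```python
import pandas as pd

# Hypothetical toy data standing in for the 'Embarked' and 'Fare' columns.
toy = pd.DataFrame({
    'Embarked': ['S', 'C', 'S', 'Q', None],
    'Fare': [7.25, 71.28, 7.92, 8.46, 8.05],
})

# include='all' summarises text and numeric columns in a single table.
summary = toy.describe(include='all')
print(summary)

# Statistics that do not apply to a column are reported as NaN:
# 'mean' for the text column, 'unique' for the numeric column.
print(summary.loc[['count', 'unique', 'top', 'freq', 'mean'], 'Embarked'])
```

Note that `count` excludes missing values, which would explain a count below the number of rows for a column with gaps.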
f. Index of rows:
Creating a DataFrame
empty_df = pd.DataFrame()
print("\nEmpty DataFrame:")
print(empty_df)
Empty DataFrame:
Empty DataFrame
Columns: []
Index: []
b. Create a DataFrame from a dictionary:
import pandas as pd
data_dict = {
    'Name': ['Ehtisham', 'Amjad', 'Muhammad'],
    'Age': [25, 30, 35]
}
# Create a DataFrame from the dictionary
dict_df = pd.DataFrame(data_dict)
import pandas as pd
train = pd.read_csv('/content/drive/MyDrive/train.csv')
FirstName
0 Mr. Owen Harris
1 Mrs. John Bradley (Florence Briggs Thayer)
2 Miss. Laina
3 Mrs. Jacques Heath (Lily May Peel)
4 Mr. William Henry
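The code that produced the `FirstName` column is not shown in this export. Judging from the output, it keeps everything after the surname and comma. One possible way to get that result, offered only as a sketch under that assumption, is to split each name once on `', '`:

```python
import pandas as pd

# Three names in the "Surname, Title. Given names" layout used by the dataset.
names = pd.Series([
    'Braund, Mr. Owen Harris',
    'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
    'Heikkinen, Miss. Laina',
])

# Split once on the first ', ' and keep the part after the surname.
first_name = names.str.split(', ', n=1).str[1]
print(first_name)
```

On the full data this would be applied as `train['FirstName'] = train['Name'].str.split(', ', n=1).str[1]`.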
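def find_age_group(age):
    # A missing age (NaN) fails every comparison and would otherwise fall
    # through to 'Senior', so label it explicitly
    if pd.isna(age):
        return 'Unknown'
    if age < 18:
        return 'Child'
    elif 18 <= age < 40:
        return 'Young Adult'
    elif 40 <= age < 60:
        return 'Middle Aged'
    else:
        return 'Senior'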
train['AgeGroup'] = train['Age'].apply(find_age_group)
print("\nAge groups created:")
print(train[['Age', 'AgeGroup']].head())
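The same binning can also be expressed with `pd.cut`, which is vectorised and leaves missing ages as NaN rather than assigning them a label. This is an alternative sketch, not the approach used in the assignment; the sample ages are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including boundary values and one missing entry.
ages = pd.Series([5, 18, 39.9, 40, 60, np.nan])

# right=False makes each bin closed on the left:
# [0, 18), [18, 40), [40, 60), [60, inf)
age_group = pd.cut(
    ages,
    bins=[0, 18, 40, 60, np.inf],
    labels=['Child', 'Young Adult', 'Middle Aged', 'Senior'],
    right=False,
)
print(age_group)
```

The result is a categorical column, which also keeps the groups in their natural order when sorting or grouping.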
a. Rename a column:
train.rename(columns={'Age': 'PassengerAge'}, inplace=True)
print("\nColumns after renaming 'Age' to 'PassengerAge':")
print(train.columns)
pclass_3 = train[train['Pclass'] == 3]
print("\nPassengers in 3rd class:")
print(pclass_3.head())
Passengers in 3rd class:
   PassengerId  Survived  Pclass                            Name   Sex  \
0            1         0       3         Braund, Mr. Owen Harris  male
2            3         1       3          Heikkinen, Miss. Laina  female
4            5         0       3        Allen, Mr. William Henry  male
5            6         0       3                Moran, Mr. James  male
7            8         0       3  Palsson, Master. Gosta Leonard  male
numerical = train.select_dtypes(include=[np.number])
categorical = train.select_dtypes(include=['object'])
print("\nNumerical columns:")
print(numerical.head())
print("\nCategorical columns:")
print(categorical.head())
Numerical columns:
   PassengerId  Survived  Pclass  PassengerAge  SibSp  Parch     Fare  \
0            1         0       3          22.0      1      0   7.2500
1            2         1       1          38.0      1      0  71.2833
2            3         1       3          26.0      0      0   7.9250
3            4         1       1          35.0      1      0  53.1000
4            5         0       3          35.0      0      0   8.0500

   MenIn3rdClass
0              1
1              0
2              0
3              0
4              1
Categorical columns:
                                                Name     Sex  \
0                            Braund, Mr. Owen Harris    male
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female
2                             Heikkinen, Miss. Laina  female
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female
4                           Allen, Mr. William Henry    male

                                    FirstName     AgeGroup
0                             Mr. Owen Harris  Young Adult
1  Mrs. John Bradley (Florence Briggs Thayer)  Young Adult
2                                 Miss. Laina  Young Adult
3          Mrs. Jacques Heath (Lily May Peel)  Young Adult
4                           Mr. William Henry  Young Adult
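The numerical output includes a `MenIn3rdClass` column whose creation is not shown in this export. A flag like that can be built by combining two boolean conditions and casting the result to integers. The sketch below assumes that definition (male and `Pclass == 3`) and uses the `Sex` and `Pclass` values of the first five passengers printed above.

```python
import pandas as pd

# Sex and Pclass for the first five passengers shown in the outputs.
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'female', 'male'],
    'Pclass': [3, 1, 3, 1, 3],
})

# Each condition yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
is_man_in_third = (toy['Sex'] == 'male') & (toy['Pclass'] == 3)

# Cast True/False to 1/0 so the flag is stored as a numeric column.
toy['MenIn3rdClass'] = is_man_in_third.astype(int)
print(toy)
```

Because the flag is stored as an integer, `select_dtypes(include=[np.number])` picks it up alongside the other numerical columns.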
Adding Rows
new_row_2 = {
    'PassengerId': 893,
    'Survived': 1,
    'Pclass': 2,
    'Name': 'Example, Miss Dummy',
    'Sex': 'female',
    # Use the key 'PassengerAge' instead if the rename above has been applied
    'Age': 28,
    'SibSp': 1,
    'Parch': 0,
    'Ticket': 'SC/Paris 21195',
    'Fare': 26.0,
    'Embarked': 'C'
}
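The step that actually adds the dictionary to the DataFrame is not shown. A minimal sketch, assuming the row is appended with `pd.concat` (the `DataFrame.append` method was removed in pandas 2.0), using a small hypothetical frame in place of `train`:

```python
import pandas as pd

# Hypothetical stand-in for the training data, reduced to three columns.
toy = pd.DataFrame({
    'PassengerId': [1, 2],
    'Pclass': [3, 1],
    'Fare': [7.25, 71.2833],
})

new_row = {'PassengerId': 893, 'Pclass': 2, 'Fare': 26.0}

# Wrap the dict in a list to get a one-row DataFrame, then concatenate.
# ignore_index=True renumbers the index so the new row gets the next label.
extended = pd.concat([toy, pd.DataFrame([new_row])], ignore_index=True)
print(extended)
```

The dictionary keys should match the existing column names; a key that matches no column creates a new column filled with NaN for the existing rows.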
GroupBy Operations
grouped = train.groupby('Pclass')
print("\nAverage PassengerAge per Pclass:")
print(grouped['PassengerAge'].mean())
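Several statistics can be computed per group in one call with `agg`. The sketch below uses a hypothetical five-row frame to show that `mean` skips missing ages and that `count` reports how many values were actually used.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing age in 3rd class.
toy = pd.DataFrame({
    'Pclass': [1, 1, 3, 3, 3],
    'PassengerAge': [38.0, 35.0, 22.0, 26.0, np.nan],
})

# One row per class, one column per statistic.
stats = toy.groupby('Pclass')['PassengerAge'].agg(['mean', 'count'])
print(stats)
```

Reporting the count next to the mean makes it visible when an average rests on only a few non-missing values.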
Joining DataFrames
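# marks_df and age_df are not defined in this export; these frames are
# reconstructed from the join outputs printed below
marks_df = pd.DataFrame({'Sno': [1, 2, 3], 'Marks': [90, 80, 70]})
age_df = pd.DataFrame({'Sno': [2, 3, 4], 'Age': [25, 35, 45]})
# Inner Join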
inner = marks_df.merge(age_df, on='Sno', how='inner')
print("\nInner Join:")
print(inner)
# Left Join
left = marks_df.merge(age_df, on='Sno', how='left')
print("\nLeft Join:")
print(left)
# Outer Join
outer = marks_df.merge(age_df, on='Sno', how='outer')
print("\nOuter Join:")
print(outer)
Inner Join:
   Sno  Marks  Age
0    2     80   25
1    3     70   35

Left Join:
   Sno  Marks   Age
0    1     90   NaN
1    2     80  25.0
2    3     70  35.0

Outer Join:
   Sno  Marks   Age
0    1   90.0   NaN
1    2   80.0  25.0
2    3   70.0  35.0
3    4    NaN  45.0
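To see which side each row of a join came from, `merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below rebuilds the two input frames from the values visible in the outputs above (their original definitions are not shown, so treat them as a reconstruction).

```python
import pandas as pd

# Input frames reconstructed from the printed join results.
marks_df = pd.DataFrame({'Sno': [1, 2, 3], 'Marks': [90, 80, 70]})
age_df = pd.DataFrame({'Sno': [2, 3, 4], 'Age': [25, 35, 45]})

# indicator=True labels each row 'left_only', 'right_only' or 'both'.
outer = marks_df.merge(age_df, on='Sno', how='outer', indicator=True)
print(outer)
```

Rows marked `left_only` or `right_only` are the ones that receive NaN, which is also why `Marks` and `Age` are printed as floats in the outer join.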