
Aiml Lab04&5 - Output

The document analyzes Titanic passenger data from three CSV files using Python libraries like Pandas and NumPy. It loads and inspects the data, checks column names and data types, identifies missing values, and calculates basic statistics.

Uploaded by

darshil shah

2/26/24, 10:20 AM Titanic

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
x=pd.read_csv("test.csv")
y=pd.read_csv("train.csv")
z=pd.read_csv("gender_submission.csv")

In [3]:
y.head()

Out[3]:
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  E
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN

In [4]:
x.head()

Out[4]:
   PassengerId  Pclass  Name                                          Sex     Age   SibSp  Parch  Ticket   Fare     Cabin  Embarked
0  892          3       Kelly, Mr. James                              male    34.5  0      0      330911   7.8292   NaN    Q
1  893          3       Wilkes, Mrs. James (Ellen Needs)              female  47.0  1      0      363272   7.0000   NaN    S
2  894          2       Myles, Mr. Thomas Francis                     male    62.0  0      0      240276   9.6875   NaN    Q
3  895          3       Wirz, Mr. Albert                              male    27.0  0      0      315154   8.6625   NaN    S
4  896          3       Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female  22.0  1      1      3101298  12.2875  NaN    S

In [5]:
y.describe()

Out[5]:
       PassengerId  Survived    Pclass      Age         SibSp       Parch       Fare
count  891.000000   891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean   446.000000   0.383838    2.308642    29.699118   0.523008    0.381594    32.204208
std    257.353842   0.486592    0.836071    14.526497   1.102743    0.806057    49.693429
min    1.000000     0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%    223.500000   0.000000    2.000000    20.125000   0.000000    0.000000    7.910400
50%    446.000000   0.000000    3.000000    28.000000   0.000000    0.000000    14.454200
75%    668.500000   1.000000    3.000000    38.000000   1.000000    0.000000    31.000000
max    891.000000   1.000000    3.000000    80.000000   8.000000    6.000000    512.329200

In [6]:
y.columns

Out[6]: Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
               'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
              dtype='object')

In [7]:
y.dtypes
y.isnull().sum()

Out[7]: PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
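
The missing-value counts above are easier to compare as a share of the rows. A minimal sketch of that calculation, assuming a small hypothetical frame (`toy`) in place of `train.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train.csv (not the real data).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

missing_count = toy.isnull().sum()   # number of missing values per column
missing_share = toy.isnull().mean()  # fraction of rows missing per column

summary = pd.DataFrame({"missing": missing_count, "share": missing_share})
print(summary.sort_values("share", ascending=False))
```

Applied to the counts in Out[7] and the 891 rows reported by `describe()`, the same ratio gives roughly 19.9% missing for Age (177/891) and 77.1% for Cabin (687/891).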

In [8]:
y.info  # no parentheses: this shows the bound method's repr (below), not the info() summary

Out[8]: <bound method DataFrame.info of PassengerId Survived Pclass \


0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
.. ... ... ...
886 887 0 2
887 888 1 1
888 889 0 3
889 890 1 1
890 891 0 3

Name Sex Age SibSp \


0 Braund, Mr. Owen Harris male 22.0 1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1
2 Heikkinen, Miss. Laina female 26.0 0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
4 Allen, Mr. William Henry male 35.0 0
.. ... ... ... ...
886 Montvila, Rev. Juozas male 27.0 0
887 Graham, Miss. Margaret Edith female 19.0 0
888 Johnston, Miss. Catherine Helen "Carrie" female NaN 1
889 Behr, Mr. Karl Howell male 26.0 0
890 Dooley, Mr. Patrick male 32.0 0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
.. ... ... ... ... ...
886 0 211536 13.0000 NaN S
887 0 112053 30.0000 B42 S
888 2 W./C. 6607 23.4500 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q

[891 rows x 12 columns]>
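
Out[8] is the repr of the bound method, because `y.info` was referenced without being called. A minimal sketch of the difference, using a hypothetical two-column frame (`toy`) rather than the Titanic data:

```python
import io

import pandas as pd

# Hypothetical toy frame; column names mirror the notebook, values are made up.
toy = pd.DataFrame({"Age": [22.0, None, 26.0], "Sex": ["male", "female", "female"]})

# Without parentheses, `toy.info` is only a reference to the method.
assert callable(toy.info)

# Calling it writes the per-column summary; `buf` captures the text
# so it can be inspected instead of going straight to stdout.
buffer = io.StringIO()
toy.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```

In a notebook, `y.info()` on its own prints the column names, non-null counts, and dtypes directly.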

In [9]:
y.Survived.value_counts()

Out[9]: 0 549
1 342
Name: Survived, dtype: int64

In [10]:
ax=y.Survived.value_counts().plot(kind='bar')
ax.set_xlabel('Survived or not ')
ax.set_ylabel('Passenger Count ')

Out[10]: Text(0, 0.5, 'Passenger Count ')

In [11]:
ax= y.Pclass.value_counts().sort_index().plot(kind='bar',title='')
ax.set_xlabel('Pclass')
ax.set_ylabel('Passenger Count')

Out[11]: Text(0, 0.5, 'Passenger Count')

In [12]:
y[['Pclass','Survived']].groupby('Pclass').count()
y[['Pclass','Survived']].groupby('Pclass').sum()
ax=y[['Pclass','Survived']].groupby('Pclass').mean().Survived.plot(kind='bar')

In [13]:
ax.set_xlabel('Pclass')
ax.set_ylabel('survival Probability')

Out[13]: Text(3.200000000000003, 0.5, 'survival Probability')
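
The plot above relies on the fact that the mean of a 0/1 `Survived` column within each group is that group's survival rate, that is, `sum / count`. A minimal sketch on a hypothetical eight-row frame (`toy`), not the real data:

```python
import pandas as pd

# Hypothetical toy data: two passengers in classes 1 and 2, four in class 3.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 1, 0],
})

# count = passengers per class, sum = survivors per class,
# mean = sum / count = survival rate per class.
by_class = toy.groupby("Pclass")["Survived"].agg(["count", "sum", "mean"])
print(by_class)
```

Computing the three aggregates in one call keeps the counts next to the rates, which helps when a group is small.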

In [14]:
ax=y.Sex.value_counts().sort_index().plot(kind='bar')
ax.set_xlabel('Sex')
ax.set_ylabel('Passenger Count')

ax=y[['Embarked','Survived']].groupby('Embarked').mean().Survived.plot(kind='bar')


In [15]:
ax=y[['Sex','Survived']].groupby('Sex').mean().Survived.plot(kind='bar')
ax.set_xlabel('Sex')
ax.set_ylabel('Survival Probability')

Out[15]: Text(0, 0.5, 'Survival Probability')

In [16]:
ax=y.Embarked.value_counts().sort_index().plot(kind='bar')
ax.set_xlabel('Embarked')
ax.set_ylabel('Passenger Count')

Out[16]: Text(0, 0.5, 'Passenger Count')


In [17]:
ax=y.SibSp.value_counts().sort_index().plot(kind='bar')
ax.set_xlabel('SibSp')
ax.set_ylabel('Passenger Count')

Out[17]: Text(0, 0.5, 'Passenger Count')

In [18]:
ax=y[['SibSp','Survived']].groupby('SibSp').mean().Survived.plot(kind='bar')
ax.set_xlabel('SibSp')
ax.set_ylabel('Survival Probability')

Out[18]: Text(0, 0.5, 'Survival Probability')


In [19]:
ax=y.Parch.value_counts().sort_index().plot(kind='bar')
ax.set_xlabel('Parch')
ax.set_ylabel('Passenger Count')

Out[19]: Text(0, 0.5, 'Passenger Count')

In [20]:
ax=y[['Parch','Survived']].groupby('Parch').mean().Survived.plot(kind='bar')
ax.set_xlabel('Parch')
ax.set_ylabel('Survival Probability')

Out[20]: Text(0, 0.5, 'Survival Probability')

In [21]:
sns.catplot(x='Pclass',col='Embarked',data=y,kind='count')

Out[21]: <seaborn.axisgrid.FacetGrid at 0x1eeddcb92e0>

In [22]:
sns.catplot(x='Sex',col='Pclass',data=y,kind='count')

Out[22]: <seaborn.axisgrid.FacetGrid at 0x1eede0e84f0>

In [23]:
sns.catplot(x='Sex',col='Embarked',data=y,kind='count')

Out[23]: <seaborn.axisgrid.FacetGrid at 0x1eeddb86be0>

In [24]:
y.drop(822,axis=0,inplace=True)  # remove the row with index label 822, leaving 890 rows

In [25]:
y['Familysize']=y['SibSp']+y['Parch']+1
y.head()

Out[25]:
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  E
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN

In [26]:
y=y.drop(columns=['Ticket','PassengerId','Cabin'])
y.head()
y['Sex']=y['Sex'].map({'male':0,'female':1})
y['Embarked']=y['Embarked'].map({'C':0,'Q':1,'S':2})
y.head()

Out[26]:
   Survived  Pclass  Name                                               Sex  Age   SibSp  Parch  Fare     Embarked  Familysize
0  0         3       Braund, Mr. Owen Harris                            0    22.0  1      0      7.2500   2.0       2
1  1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  1    38.0  1      0      71.2833  0.0       2
2  1         3       Heikkinen, Miss. Laina                             1    26.0  0      0      7.9250   2.0       1
3  1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       1    35.0  1      0      53.1000  2.0       2
4  0         3       Allen, Mr. William Henry                           0    35.0  0      0      8.0500   2.0       1
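
Embarked appears as `2.0` and `0.0` in Out[26] rather than as integers. That is consistent with how `Series.map` treats values it cannot look up: missing entries stay `NaN`, which forces a float column. A minimal sketch on a hypothetical four-value series:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing port, mirroring the two NaNs in Embarked.
embarked = pd.Series(["S", "C", np.nan, "Q"])

# Same mapping as the notebook; NaN has no key, so it stays NaN
# and the result is float64 rather than int64.
codes = embarked.map({"C": 0, "Q": 1, "S": 2})
print(codes)

# Filling the gap (the notebook later uses 2, the code for 'S')
# makes an integer cast possible.
filled = codes.fillna(2).astype(int)
print(filled.tolist())
```

Casting back to `int` after `fillna` is optional for the model, but it keeps the encoded column visually consistent with `Sex` and `Pclass`.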

In [27]:
y['Title']=y.Name.str.extract(r'([A-Za-z]+)\.',expand=False)

In [28]:
y=y.drop(columns="Name")
y.Title.unique()
y.head()

Out[28]: Survived Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

0 0 3 0 22.0 1 0 7.2500 2.0 2 Mr

1 1 1 1 38.0 1 0 71.2833 0.0 2 Mrs

2 1 3 1 26.0 0 0 7.9250 2.0 1 Miss

3 1 1 1 35.0 1 0 53.1000 2.0 2 Mrs

4 0 3 0 35.0 0 0 8.0500 2.0 1 Mr
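
The Title column comes from a regular expression that captures the first run of letters followed by a period. A minimal sketch on four names taken from the outputs above; the `rare` list here is a hypothetical stand-in, since the notebook's own list of rare titles is cut off in this export:

```python
import pandas as pd

# Names as they appear in the notebook's outputs.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "Montvila, Rev. Juozas",
])

# Raw string so that `\.` is a literal period in the regex;
# expand=False returns a Series instead of a one-column DataFrame.
titles = names.str.extract(r"([A-Za-z]+)\.", expand=False)
print(titles.tolist())

# Hypothetical grouping list (assumption, not the notebook's full list).
rare = ["Rev"]
grouped = titles.replace(rare, "Others")
print(grouped.tolist())
```

Passing a list to `replace` collapses all listed titles in one call, which is the same pattern the notebook uses before mapping titles to integers.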

In [29]:
y.Title.value_counts().plot(kind='bar')

Out[29]: <AxesSubplot:>

In [30]:
y['Title']=y['Title'].replace(['Dr','Rev','Col', 'Major', 'Countess', 'Sir', 'Johnkh

In [31]:
y['Title']=y['Title'].replace('Ms', 'Miss')


In [32]:
y['Title']=y['Title'].replace('Mlle', 'Miss')

In [33]:
y['Title']=y['Title'].replace('Mme', 'Mrs')

In [34]:
y['Title']=y['Title'].replace('Master', 'Mr')
y.head()

Out[34]: Survived Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

0 0 3 0 22.0 1 0 7.2500 2.0 2 Mr

1 1 1 1 38.0 1 0 71.2833 0.0 2 Mrs

2 1 3 1 26.0 0 0 7.9250 2.0 1 Miss

3 1 1 1 35.0 1 0 53.1000 2.0 2 Mrs

4 0 3 0 35.0 0 0 8.0500 2.0 1 Mr

In [35]:
ax=y.Title.value_counts().sort_index().plot(kind='bar')
ax.set_xlabel('Title')
ax.set_ylabel('Passenger Count')

Out[35]: Text(0, 0.5, 'Passenger Count')

In [36]:
ax=y[['Title','Survived']].groupby('Title').mean().Survived.plot(kind='bar')
ax.set_xlabel('Title')
ax.set_ylabel('Survival Probability')

Out[36]: Text(0, 0.5, 'Survival Probability')


In [37]:
y['Title']=y['Title'].map({'Master':0,'Miss':1,'Mr':2,'Mrs':3,'Others':4})
corr_matrix=y.corr()

In [38]:
import matplotlib.pyplot as plt
plt.figure(figsize=(9,8))
sns.heatmap(data=corr_matrix, cmap='BrBG', annot=True, linewidths=0.2)

Out[38]: <AxesSubplot:>


In [39]:
y.isnull().sum()
y['Embarked']=y['Embarked'].fillna(2)
y.head()

Out[39]: Survived Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

0 0 3 0 22.0 1 0 7.2500 2.0 2 2

1 1 1 1 38.0 1 0 71.2833 0.0 2 3

2 1 3 1 26.0 0 0 7.9250 2.0 1 1

3 1 1 1 35.0 1 0 53.1000 2.0 2 3

4 0 3 0 35.0 0 0 8.0500 2.0 1 2

In [40]:
age_median_train=y.Age.median()
y.Age=y.Age.fillna(age_median_train)
print(age_median_train)

28.0
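
The cell above fills missing ages with the training median. The notebook later fills the test file with that file's own median (27.0); a common alternative is to reuse the training median for both, so that no statistic is computed from the test data. A minimal sketch of that alternative on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical age columns; the values are made up for illustration.
train_age = pd.Series([22.0, 38.0, np.nan, 35.0, 26.0])
test_age = pd.Series([34.5, np.nan, 62.0])

# median() skips NaN by default, so it is computed over the 4 known ages.
age_median = train_age.median()

# Apply the same training statistic to both splits.
train_filled = train_age.fillna(age_median)
test_filled = test_age.fillna(age_median)
print(age_median, test_filled.tolist())
```

Which convention to use is a design choice; with medians this close (28.0 against 27.0) the effect on the model is likely to be small.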

In [41]:
y.isnull().sum()
y.head()

Out[41]: Survived Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

0 0 3 0 22.0 1 0 7.2500 2.0 2 2

1 1 1 1 38.0 1 0 71.2833 0.0 2 3

2 1 3 1 26.0 0 0 7.9250 2.0 1 1

3 1 1 1 35.0 1 0 53.1000 2.0 2 3

4 0 3 0 35.0 0 0 8.0500 2.0 1 2

In [42]:
from sklearn.utils import shuffle
y=shuffle(y)
y.head()

Out[42]: Survived Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

761 0 3 0 41.0 0 0 7.1250 2.0 1 2

555 0 1 0 62.0 0 0 26.5500 2.0 1 2

351 0 1 0 28.0 0 0 35.0000 2.0 1 2

438 0 1 0 64.0 1 4 263.0000 2.0 6 2

172 1 3 1 1.0 1 1 11.1333 2.0 3 1

In [43]:
x_train=y.drop(columns='Survived')
y_train=y[['Survived']]
x_train.shape

Out[43]: (890, 9)


In [44]:
y=y.drop(columns='Sex')
x_train.head()

Out[44]: Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

761 3 0 41.0 0 0 7.1250 2.0 1 2

555 1 0 62.0 0 0 26.5500 2.0 1 2

351 1 0 28.0 0 0 35.0000 2.0 1 2

438 1 0 64.0 1 4 263.0000 2.0 6 2

172 3 1 1.0 1 1 11.1333 2.0 3 1

In [ ]:

In [45]:
from sklearn.linear_model import LogisticRegression
y_train.head()

Out[45]: Survived

761 0

555 0

351 0

438 0

172 1

In [46]:
y_train.isnull()

Out[46]: Survived

761 False

555 False

351 False

438 False

172 False

... ...

129 False

117 False

692 False

419 False

360 False

890 rows × 1 columns

In [47]:
from sklearn.model_selection import train_test_split

In [48]:
x_training, x_valid, y_training, y_valid= train_test_split(x_train, y_train, test_si
logreg_clf=LogisticRegression()

In [49]:
logreg_clf.fit(x_training, y_training.values.ravel())  # pass y as a 1-D array, as scikit-learn expects

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Out[49]: LogisticRegression()
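
The ConvergenceWarning above suggests two remedies: scale the features or raise `max_iter`. A minimal sketch that applies both through a scikit-learn pipeline, assuming synthetic data (`X`, `labels`) in place of the Titanic features; the value `max_iter=1000` is an illustrative choice, not a setting from the notebook:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: two features on different scales (assumption).
X = np.column_stack([rng.normal(30, 14, 200), rng.normal(32, 50, 200)])
labels = (X[:, 0] + 0.1 * X[:, 1] > 33).astype(int)  # 1-D target array

# StandardScaler centres and scales each feature before the solver runs;
# max_iter raises the lbfgs iteration limit from its default of 100.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, labels)

train_accuracy = model.score(X, labels)
print(train_accuracy)
```

Wrapping the scaler in the pipeline means the same scaling learned on the training split is reused automatically when `predict` is called on validation or test rows.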

In [50]:
prediction=logreg_clf.predict(x_valid)

In [51]:
from sklearn.metrics import accuracy_score
accuracy_score(y_valid,prediction)

Out[51]: 0.7640449438202247

In [52]:
from sklearn.metrics import confusion_matrix
confusion=confusion_matrix(y_valid,prediction,labels=[1,0])

In [53]:
print(confusion)

[[42 22]
[20 94]]
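
The accuracy in Out[51] and the class-1 figures in the classification report can be recovered from this matrix by hand. Because `labels=[1,0]` was passed, the first row and column refer to class 1 (survived). A short worked check:

```python
# Confusion matrix from the output above; rows are true classes,
# columns are predicted classes, both ordered [1, 0].
tp, fn = 42, 22  # true class 1: predicted 1, predicted 0
fp, tn = 20, 94  # true class 0: predicted 1, predicted 0

accuracy = (tp + tn) / (tp + fn + fp + tn)   # correct / all 178 rows
precision_1 = tp / (tp + fp)                 # of predicted survivors, share correct
recall_1 = tp / (tp + fn)                    # of actual survivors, share found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(accuracy, round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
```

The row sums (64 and 114) match the `support` column of the report, which is a quick way to confirm the row and column orientation.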

In [54]:
from sklearn.metrics import classification_report
report=classification_report(y_valid,prediction)
print(report)

              precision    recall  f1-score   support

           0       0.81      0.82      0.82       114
           1       0.68      0.66      0.67        64

    accuracy                           0.76       178
   macro avg       0.74      0.74      0.74       178
weighted avg       0.76      0.76      0.76       178

In [55]:
x['Familysize']=x['SibSp']+x['Parch']+1
x.head()

Out[55]:
   PassengerId  Pclass  Name                                          Sex     Age   SibSp  Parch  Ticket   Fare     Cabin  Embarked  F
0  892          3       Kelly, Mr. James                              male    34.5  0      0      330911   7.8292   NaN    Q
1  893          3       Wilkes, Mrs. James (Ellen Needs)              female  47.0  1      0      363272   7.0000   NaN    S
2  894          2       Myles, Mr. Thomas Francis                     male    62.0  0      0      240276   9.6875   NaN    Q
3  895          3       Wirz, Mr. Albert                              male    27.0  0      0      315154   8.6625   NaN    S
4  896          3       Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female  22.0  1      1      3101298  12.2875  NaN    S

In [56]:
x=x.drop(columns=['Ticket','PassengerId','Cabin'])
x.head()
x['Sex']=x['Sex'].map({'male':0,'female':1})
x['Embarked']=x['Embarked'].map({'C':0,'Q':1,'S':2})
x.head()

Out[56]:
   Pclass  Name                                          Sex  Age   SibSp  Parch  Fare     Embarked  Familysize
0  3       Kelly, Mr. James                              0    34.5  0      0      7.8292   1         1
1  3       Wilkes, Mrs. James (Ellen Needs)              1    47.0  1      0      7.0000   2         2
2  2       Myles, Mr. Thomas Francis                     0    62.0  0      0      9.6875   1         1
3  3       Wirz, Mr. Albert                              0    27.0  0      0      8.6625   2         1
4  3       Hirvonen, Mrs. Alexander (Helga E Lindqvist)  1    22.0  1      1      12.2875  2         3

In [57]:
x['Title']=x.Name.str.extract(r'([A-Za-z]+)\.',expand=False)

In [58]:
x=x.drop(columns="Name")
x.Title.unique()
x.head()

Out[58]: Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

0 3 0 34.5 0 0 7.8292 1 1 Mr

1 3 1 47.0 1 0 7.0000 2 2 Mrs

2 2 0 62.0 0 0 9.6875 1 1 Mr

3 3 0 27.0 0 0 8.6625 2 1 Mr

4 3 1 22.0 1 1 12.2875 2 3 Mrs


In [59]:
x['Title']=x['Title'].replace(['Dr','Rev','Col', 'Major', 'Countess', 'Sir', 'Johnkh
x['Title']=x['Title'].replace('Ms', 'Miss')
x['Title']=x['Title'].replace('Mlle', 'Miss')
x['Title']=x['Title'].replace('Mme', 'Mrs')
x['Title']=x['Title'].replace('Master', 'Mr')
x.head()

Out[59]: Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

0 3 0 34.5 0 0 7.8292 1 1 Mr

1 3 1 47.0 1 0 7.0000 2 2 Mrs

2 2 0 62.0 0 0 9.6875 1 1 Mr

3 3 0 27.0 0 0 8.6625 2 1 Mr

4 3 1 22.0 1 1 12.2875 2 3 Mrs

In [60]:
x['Title']=x['Title'].map({'Master':0,'Miss':1,'Mr':2,'Mrs':3,'Others':4})
corr_matrix=x.corr()

In [61]:
x.isnull().sum()
x['Embarked']=x['Embarked'].fillna(2)
x.head()

Out[61]: Pclass Sex Age SibSp Parch Fare Embarked Familysize Title

0 3 0 34.5 0 0 7.8292 1 1 2.0

1 3 1 47.0 1 0 7.0000 2 2 3.0

2 2 0 62.0 0 0 9.6875 1 1 2.0

3 3 0 27.0 0 0 8.6625 2 1 2.0

4 3 1 22.0 1 1 12.2875 2 3 3.0

In [62]:
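z_train=x
# gender_submission.csv is Kaggle's sample submission (only female passengers predicted to survive), not ground-truth labels
w_train=z[['Survived']]
z_train.shape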

Out[62]: (418, 9)

In [63]:
age_median_train=x.Age.median()
x.Age=x.Age.fillna(age_median_train)
print(age_median_train)

27.0

In [64]:
T_median_train=y.Title.median()
x.Title=x.Title.fillna(T_median_train)
print(T_median_train)

2.0


In [65]:
f_median_train=x.Fare.median()
x.Fare=x.Fare.fillna(f_median_train)
print(f_median_train)

14.4542

In [66]:
prediction1=logreg_clf.predict(z_train)

In [68]:
accuracy_score(w_train,prediction1)

Out[68]: 0.9425837320574163
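
This score compares the model's predictions with `gender_submission.csv`, which is Kaggle's example submission: it predicts survival for every female passenger and death for every male passenger. The figure is therefore agreement with that sex-only baseline rather than accuracy against true outcomes. A minimal sketch of the distinction, assuming a hypothetical five-row test frame and made-up model predictions:

```python
import pandas as pd

# Hypothetical test rows; only the Sex column matters for the baseline.
toy_test = pd.DataFrame({"Sex": ["male", "female", "female", "male", "male"]})

# The sample-submission rule: 1 for female, 0 for male.
baseline = (toy_test["Sex"] == "female").astype(int)

# Made-up model output for illustration (assumption).
model_pred = pd.Series([0, 1, 1, 1, 0])

# Fraction of rows where the model agrees with the baseline.
agreement = (model_pred == baseline).mean()
print(baseline.tolist(), agreement)
```

A high agreement score mainly indicates that the model leans heavily on `Sex`; the validation accuracy from the held-out split (Out[51]) is the more informative estimate of real performance.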

In [69]:
confusion=confusion_matrix(w_train,prediction1,labels=[1,0])

In [70]:
from sklearn.metrics import classification_report
report=classification_report(w_train,prediction1)
print(report)

              precision    recall  f1-score   support

           0       0.97      0.94      0.95       266
           1       0.91      0.94      0.92       152

    accuracy                           0.94       418
   macro avg       0.94      0.94      0.94       418
weighted avg       0.94      0.94      0.94       418

In [ ]:
