Making End-to-End Project Without Pipeline - Jupyter Notebook

The document shows the steps taken to build a machine learning model to predict Titanic passengers' chances of survival without using pipelines. It loads and cleans the Titanic dataset, applies feature engineering techniques like imputation and one-hot encoding, trains a decision tree classifier on the preprocessed data, and evaluates its performance on the test set.

Uploaded by

satyamk86770

3/18/24, 12:17 AM Making_end_to_end_project_without_pipeline - Jupyter Notebook

In [26]: import numpy as np


import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

In [27]: df=pd.read_csv("titanic.csv")

In [28]: df.head()

Out[28]:
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S

In [29]: df.drop(columns=['PassengerId','Name','Ticket','Cabin'], inplace=True)

localhost:8888/notebooks/100DaysMLCourse/Making_end_to_end_project_without_pipeline.ipynb 1/5

In [30]: df.head()

Out[30]:
Survived Pclass Sex Age SibSp Parch Fare Embarked

0 0 3 male 22.0 1 0 7.2500 S

1 1 1 female 38.0 1 0 71.2833 C

2 1 3 female 26.0 0 0 7.9250 S

3 1 1 female 35.0 1 0 53.1000 S

4 0 3 male 35.0 0 0 8.0500 S

In [31]: x_train, x_test, y_train, y_test = train_test_split(
    df.drop(columns=['Survived']),
    df['Survived'],
    test_size=0.2,
    random_state=42,
)
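A quick way to sanity-check a split like this is to look at the shapes and indices it returns. The sketch below uses a small made-up frame (not the Titanic file) with the same `test_size` and `random_state`; the names `toy`, `x_tr`, `x_te`, `y_tr`, `y_te` and all values are placeholders for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Titanic frame: 10 rows, two features, one target.
toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2, 2, 3, 1, 3],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 13.00, 26.00, 7.75, 30.00, 8.66],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
})

X = toy.drop(columns=["Survived"])
y = toy["Survived"]

# Same arguments as the notebook cell: 20% of the rows go to the test split.
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

print(x_tr.shape, x_te.shape)  # (8, 2) (2, 2)

# Features and labels stay aligned row by row after the shuffle.
print(list(x_tr.index) == list(y_tr.index))  # True
```

With 891 rows, the same 20% rule gives the 179 test rows reported later in the notebook.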

In [32]: df.isnull().sum()

Out[32]: Survived 0
Pclass 0
Sex 0
Age 177
SibSp 0
Parch 0
Fare 0
Embarked 2
dtype: int64


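In [85]: si_age = SimpleImputer()  # default strategy is 'mean'
si_embarked = SimpleImputer(strategy='most_frequent')

# learn the fill values from the training split only
x_train_age = si_age.fit_transform(x_train[['Age']])
x_train_embarked = si_embarked.fit_transform(x_train[['Embarked']])

# reuse the fitted imputers on the test split: transform, not fit_transform,
# so that no statistics are learned from the test data
x_test_age = si_age.transform(x_test[['Age']])
x_test_embarked = si_embarked.transform(x_test[['Embarked']])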
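One common way to keep test data out of the imputation statistics is to call `fit_transform` on the training split and plain `transform` on the test split. The sketch below illustrates this on two tiny made-up frames; the values and the names `train`, `test`, `train_age`, `test_age` and so on are assumptions for illustration, not data from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up values for illustration only.
train = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0],
    "Embarked": np.array(["S", "S", np.nan], dtype=object),
})
test = pd.DataFrame({
    "Age": [np.nan, 60.0],
    "Embarked": np.array([np.nan, "C"], dtype=object),
})

si_age = SimpleImputer()  # default strategy is "mean"
si_embarked = SimpleImputer(strategy="most_frequent")

# fit_transform learns the fill values from the training rows ...
train_age = si_age.fit_transform(train[["Age"]])
train_emb = si_embarked.fit_transform(train[["Embarked"]])

# ... and transform reuses them on the test rows without refitting.
test_age = si_age.transform(test[["Age"]])
test_emb = si_embarked.transform(test[["Embarked"]])

print(si_age.statistics_)  # mean of the training ages: [30.]
print(test_age.ravel())    # missing test age filled with 30.0
print(test_emb.ravel())    # missing test port filled with "S"
```

Had the imputer been refitted on `test`, the missing test age would have been filled with 60.0, the mean of the test rows, instead of the training mean.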

In [86]: # apply one-hot encoding to Sex and Embarked because they are categorical data
# note: scikit-learn 1.2 renamed `sparse` to `sparse_output`; `sparse` is removed from 1.4 on

ohe_sex = OneHotEncoder(sparse=False, handle_unknown='ignore')
ohe_embarked = OneHotEncoder(sparse=False, handle_unknown='ignore')

# fit the encoders on the training split only
x_train_sex = ohe_sex.fit_transform(x_train[['Sex']])
x_train_embarked = ohe_embarked.fit_transform(x_train_embarked)

# encode the test split with the categories learned from the training split
x_test_sex = ohe_sex.transform(x_test[['Sex']])
x_test_embarked = ohe_embarked.transform(x_test_embarked)

C:\ProgramData\anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
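To see what `handle_unknown='ignore'` does, it can help to encode a category the encoder never saw during fitting. The sketch below is a minimal illustration on made-up data; the port code `"X"` is hypothetical, and the example uses the encoder's default sparse output with `.toarray()` so that it does not depend on the `sparse` / `sparse_output` argument name.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up rows; "X" is a hypothetical port that never appears in the training data.
train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})
test = pd.DataFrame({"Embarked": ["C", "X"]})

ohe = OneHotEncoder(handle_unknown="ignore")

# Categories are learned from the training rows only.
train_enc = ohe.fit_transform(train[["Embarked"]]).toarray()

# The unseen category becomes an all-zero row instead of raising an error.
test_enc = ohe.transform(test[["Embarked"]]).toarray()

print(ohe.categories_[0])  # ['C' 'Q' 'S'], sorted alphabetically
print(test_enc)            # [[1. 0. 0.], [0. 0. 0.]]
```

The same alphabetical ordering applies to `Sex`, so in the notebook's encoded output the first column stands for `female` and the second for `male`.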


In [113]: x_train_sex

Out[113]: array([[0., 1.],
       [0., 1.],
       [0., 1.],
       ...,
       [0., 1.],
       [1., 0.],
       [0., 1.]])

In [88]: x_train_rem = x_train.drop(columns=['Sex','Age','Embarked'])


x_test_rem = x_test.drop(columns=['Sex','Age','Embarked'])

In [95]: x_train_transformed=np.concatenate((x_train_rem,x_train_age,x_train_sex,x_train_embarked),axis=1)
x_test_transformed=np.concatenate((x_test_rem,x_test_age,x_test_sex,x_test_embarked),axis=1)

In [98]: x_test_transformed.shape

Out[98]: (179, 10)
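The ten columns are consistent with 4 untouched columns (`Pclass`, `SibSp`, `Parch`, `Fare`), 1 imputed `Age` column, 2 one-hot columns for `Sex` and 3 for `Embarked`. Because `np.concatenate` returns a bare array, the column meaning depends entirely on the order of the pieces. The sketch below shows one way to keep track of it on two made-up rows; the label names such as `Sex_female` are hypothetical and not produced by the notebook.

```python
import numpy as np
import pandas as pd

# Two made-up rows standing in for the pieces built in the notebook.
rem = pd.DataFrame({"Pclass": [3, 1], "SibSp": [1, 0], "Parch": [0, 0], "Fare": [7.25, 71.28]})
age = np.array([[22.0], [38.0]])
sex = np.array([[0.0, 1.0], [1.0, 0.0]])                 # columns: female, male
embarked = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])  # columns: C, Q, S

# Same call pattern as the notebook: pieces are glued side by side (axis=1).
stacked = np.concatenate((rem, age, sex, embarked), axis=1)

# Hypothetical labels, written in the same order as the concatenation.
columns = list(rem.columns) + [
    "Age", "Sex_female", "Sex_male", "Embarked_C", "Embarked_Q", "Embarked_S",
]
labelled = pd.DataFrame(stacked, columns=columns)

print(stacked.shape)  # (2, 10)
print(labelled["Age"].tolist())
```

Any new row passed to the classifier later has to be assembled in exactly this column order.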

In [102]: clf=DecisionTreeClassifier()
clf.fit(x_train_transformed,y_train)

Out[102]: ▾ DecisionTreeClassifier
DecisionTreeClassifier()

In [103]: y_pred = clf.predict(x_test_transformed)


In [109]: from sklearn.metrics import accuracy_score


accuracy_score(y_test,y_pred)*100

Out[109]: 74.86033519553072
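`accuracy_score` is the fraction of test rows whose prediction matches the label, so 74.86% corresponds to 134 correct predictions out of 179. Since the tree is created without a `random_state`, the exact figure may differ slightly between runs. The sketch below, on made-up labels, shows the same computation by hand together with a confusion matrix, which is an addition here and not part of the notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels and predictions for illustration only.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_hat = np.array([0, 1, 0, 0, 1, 1, 0, 1])

acc = accuracy_score(y_true, y_hat)
manual = (y_true == y_hat).mean()  # same quantity computed directly

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)

print(acc, manual)  # 0.75 0.75
print(cm)           # [[3 1], [1 3]]
```

The confusion matrix separates the two kinds of mistakes (survivors predicted as non-survivors and the reverse), which a single accuracy figure hides.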

In [110]: import pickle

In [112]: # save the fitted encoders and the classifier; the with-block closes each file after writing
for name, obj in [('ohe_sex', ohe_sex), ('ohe_embarked', ohe_embarked), ('clf', clf)]:
    with open(f'models/{name}.pkl', 'wb') as f:
        pickle.dump(obj, f)
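The saved objects are only useful if they can be loaded again and applied to new rows in the same way. The sketch below is a minimal round trip under stated assumptions: it fits a tiny made-up model with two features (`Fare` and `Sex`), writes it to a temporary `models` directory, reloads it and predicts for one hypothetical passenger. Note that the notebook pickles the encoders and the classifier but not the imputers, so new rows would need `Age` and `Embarked` filled in some other way.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up training set, only so there is something to pickle.
train = pd.DataFrame({
    "Fare": [7.25, 71.28, 8.05, 53.10],
    "Sex": ["male", "female", "male", "female"],
})
y = [0, 1, 0, 1]

ohe_sex = OneHotEncoder(handle_unknown="ignore")
sex_enc = ohe_sex.fit_transform(train[["Sex"]]).toarray()
X = np.concatenate((train[["Fare"]], sex_enc), axis=1)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp) / "models"
    models.mkdir()

    # Save each fitted object to its own file.
    for name, obj in {"ohe_sex": ohe_sex, "clf": clf}.items():
        with open(models / f"{name}.pkl", "wb") as f:
            pickle.dump(obj, f)

    # Later, for example in another script: load the fitted objects back.
    with open(models / "ohe_sex.pkl", "rb") as f:
        ohe_loaded = pickle.load(f)
    with open(models / "clf.pkl", "rb") as f:
        clf_loaded = pickle.load(f)

# Hypothetical new passenger, preprocessed in the same column order as training.
new = pd.DataFrame({"Fare": [60.0], "Sex": ["female"]})
new_X = np.concatenate(
    (new[["Fare"]], ohe_loaded.transform(new[["Sex"]]).toarray()), axis=1
)
prediction = clf_loaded.predict(new_X)
print(prediction)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.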
