
Making End-to-End Project Without Pipeline - Jupyter Notebook

The document shows the steps taken to build a machine learning model to predict Titanic passengers' chances of survival without using pipelines. It loads and cleans the Titanic dataset, applies feature engineering techniques like imputation and one-hot encoding, trains a decision tree classifier on the preprocessed data, and evaluates its performance on the test set.

Uploaded by

satyamk86770

3/18/24, 12:17 AM Making_end_to_end_project_without_pipeline - Jupyter Notebook

In [26]: import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

In [27]: df=pd.read_csv("titanic.csv")

In [28]: df.head()

Out[28]:
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S

In [29]: df.drop(columns=['PassengerId','Name','Ticket','Cabin'], inplace=True)

localhost:8888/notebooks/100DaysMLCourse/Making_end_to_end_project_without_pipeline.ipynb 1/5

In [30]: df.head()

Out[30]:
   Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
0  0         3       male    22.0  1      0      7.2500   S
1  1         1       female  38.0  1      0      71.2833  C
2  1         3       female  26.0  0      0      7.9250   S
3  1         1       female  35.0  1      0      53.1000  S
4  0         3       male    35.0  0      0      8.0500   S

In [31]: x_train,x_test,y_train,y_test = train_test_split(df.drop(columns=['Survived']),df['Survived'],test_size=0.2,random_state=42)
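A minimal sketch of what this split produces, run on a hypothetical 10-row toy frame rather than the Titanic data (the names `toy`, `X_tr`, `X_te`, `y_tr`, `y_te` are illustrative only). With `test_size=0.2`, two rows go to the test set and the features and labels stay aligned by index.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame: one feature column and a binary target.
toy = pd.DataFrame({"feature": range(10), "target": [0, 1] * 5})

# Same call pattern as the notebook: features first, labels second.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy.drop(columns=["target"]), toy["target"], test_size=0.2, random_state=42
)

sizes = (len(X_tr), len(X_te))  # 80% / 20% of 10 rows
aligned = list(X_tr.index) == list(y_tr.index)  # rows and labels share an index
```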

In [32]: df.isnull().sum()

Out[32]: Survived 0
Pclass 0
Sex 0
Age 177
SibSp 0
Parch 0
Fare 0
Embarked 2
dtype: int64
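The counts above can also be read as a share of rows per column. The following is a small sketch on a hypothetical toy frame (not the Titanic data) that reports both the count and the percentage of missing cells; the names `toy` and `summary` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic columns.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

missing_count = toy.isnull().sum()          # missing cells per column
missing_share = toy.isnull().mean() * 100   # same, as a percentage of rows
summary = pd.DataFrame({"count": missing_count, "percent": missing_share})
```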


In [85]: si_age = SimpleImputer()  # default strategy: mean
si_embarked = SimpleImputer(strategy='most_frequent')

x_train_age = si_age.fit_transform(x_train[['Age']])
x_train_embarked = si_embarked.fit_transform(x_train[['Embarked']])

# Reuse the statistics learned from the training data; do not refit on the test set.
x_test_age = si_age.transform(x_test[['Age']])
x_test_embarked = si_embarked.transform(x_test[['Embarked']])
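As an illustration of how a fitted imputer carries its learned statistic over to other data, here is a minimal sketch on hypothetical toy values (the frames `train` and `test` are made up). The imputer learns the mean from `train` only and applies that same value to the gap in `test`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one missing age in each split.
train = pd.DataFrame({"Age": [20.0, 30.0, np.nan, 40.0]})
test = pd.DataFrame({"Age": [np.nan, 80.0]})

imputer = SimpleImputer()  # default strategy is 'mean'
train_age = imputer.fit_transform(train[["Age"]])  # learns the mean of 20, 30, 40
test_age = imputer.transform(test[["Age"]])        # fills with the training mean

learned_mean = imputer.statistics_[0]
```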

In [86]: # Apply one-hot encoding to Sex and Embarked because they are categorical data.
# The original run passed sparse=False, which raised this FutureWarning on every fit:
# `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4.

ohe_sex = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
ohe_embarked = OneHotEncoder(sparse_output=False, handle_unknown='ignore')

x_train_sex = ohe_sex.fit_transform(x_train[['Sex']])
x_train_embarked = ohe_embarked.fit_transform(x_train_embarked)

# Reuse the encoders fitted on the training data; do not refit on the test set.
x_test_sex = ohe_sex.transform(x_test[['Sex']])
x_test_embarked = ohe_embarked.transform(x_test_embarked)
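A short sketch of what `handle_unknown='ignore'` does, using hypothetical toy values rather than the notebook's data. The category `"X"` is invented and never appears during fitting, so it is encoded as an all-zero row. The sketch leaves the sparse setting at its default and calls `.toarray()` so that it does not depend on the `sparse` / `sparse_output` rename.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; "X" is a category the encoder never sees while fitting.
train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})
new = pd.DataFrame({"Embarked": ["C", "X"]})

ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["Embarked"]])

# Columns follow the sorted categories learned during fit.
categories = list(ohe.categories_[0])
encoded = ohe.transform(new[["Embarked"]]).toarray()
```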


In [113]: x_train_sex

Out[113]: array([[0., 1.],
       [0., 1.],
       [0., 1.],
       ...,
       [0., 1.],
       [1., 0.],
       [0., 1.]])

In [88]: x_train_rem = x_train.drop(columns=['Sex','Age','Embarked'])
x_test_rem = x_test.drop(columns=['Sex','Age','Embarked'])

In [95]: x_train_transformed=np.concatenate((x_train_rem,x_train_age,x_train_sex,x_train_embarked),axis=1)
x_test_transformed=np.concatenate((x_test_rem,x_test_age,x_test_sex,x_test_embarked),axis=1)

In [98]: x_test_transformed.shape

Out[98]: (179, 10)

In [102]: clf=DecisionTreeClassifier()
clf.fit(x_train_transformed,y_train)

Out[102]: ▾ DecisionTreeClassifier
DecisionTreeClassifier()

In [103]: y_pred = clf.predict(x_test_transformed)


In [109]: from sklearn.metrics import accuracy_score

accuracy_score(y_test,y_pred)*100

Out[109]: 74.86033519553072
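Accuracy alone does not show which class the errors fall on. As a hedged illustration, the sketch below computes accuracy together with a confusion matrix on hypothetical labels (`y_true` and `y_hat` are made up and are not the notebook's `y_test` and `y_pred`).

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 0 = did not survive, 1 = survived.
y_true = [0, 1, 1, 0, 1, 0]
y_hat = [0, 1, 0, 0, 1, 1]

acc = accuracy_score(y_true, y_hat) * 100  # percentage of matching labels
cm = confusion_matrix(y_true, y_hat)       # rows: true class, columns: predicted class
```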

In [110]: import pickle

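In [112]: # Save the fitted encoders and the classifier; the 'models' directory must already exist.
# si_age and si_embarked are not saved here, so inputs with missing values would need them as well.
for name, obj in [('ohe_sex', ohe_sex), ('ohe_embarked', ohe_embarked), ('clf', clf)]:
    with open(f'models/{name}.pkl', 'wb') as f:
        pickle.dump(obj, f)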
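A minimal sketch of the save-and-reload round trip, assuming a tiny hypothetical classifier and a temporary directory instead of the notebook's `models/` folder. It checks that the restored object predicts the same labels as the original.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy problem: one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Write the model to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "clf.pkl"
    with open(path, "wb") as fh:
        pickle.dump(clf, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

same = bool((restored.predict(X) == clf.predict(X)).all())
```

Pickled scikit-learn objects are generally expected to be loaded with the same library version that wrote them.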
