Titanic: Logistic Regression Project

This document summarizes a logistic regression project to predict Titanic passenger survival. It covers exploratory data analysis, handling of missing data, feature engineering, conversion of categorical features, building a logistic regression model, and evaluating model performance with confusion matrices and classification reports. Group members contributed data visualization, data cleaning, model building, and analysis of the results.

TITANIC
Logistic Regression Project

GROUP 4
Aryan Panicker - BS21DMU012
Anh Viet Doan - BS21DON043
Bang Nguyen - BS21DON020
Duy Le Duc - BS21DON032
Geethanjali Dhanish - BS21DON044
INTRODUCTION
o Given: titanic_train.csv

o To predict: whether a passenger Survived or Not

o According to the data given:
  Not Survived = 0 [negative]
  Survived = 1 [positive]

TOPICS TO BE COVERED

o Dataframe Creation
o Exploratory Data Analysis and Data Visualization
o Handling Missing Data and Values
o Feature Engineering
o Categorical Features
o Logistic Regression Model – Prediction and Evaluation
o Confusion Matrix and Classification Report
INITIAL STEPS (DATAFRAME CREATION)

LIBRARIES IMPORTED
o import numpy as np
o import pandas as pd
o import matplotlib.pyplot as plt
  %matplotlib inline
o import seaborn as sns

READ CSV FILE AND DISPLAY DATAFRAME
o titanic = pd.read_csv('titanic_train.csv')
o titanic.head()
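
The same step as a single runnable snippet (a minimal sketch; the info() call is an extra inspection not shown on the slide):

    # Load the training data and take a first look.
    import pandas as pd

    titanic = pd.read_csv('titanic_train.csv')   # assumes the CSV is in the working directory
    print(titanic.head())    # first five rows
    titanic.info()           # dtypes and non-null counts (Age, Cabin, Embarked contain nulls)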
EXPLORATORY DATA ANALYSIS
DATA VISUALIZATION
o sns.countplot(titanic['Survived'])

o sns.countplot(titanic['Pclass'])

o sns.countplot(x='Survived', hue='Sex', data=titanic)

o sns.countplot(x='Survived', hue='Pclass', data=titanic)

DATA VISUALIZATION
o sns.countplot(titanic.Parch)
  plt.title("Number of Children/Parents Aboard")
  plt.xlabel("Children/Parents Aboard")

o sns.countplot(titanic.SibSp)
  plt.title("Number of Siblings/Spouses Aboard")
  plt.xlabel("Siblings/Spouses Aboard")

o plt.hist(titanic['Age'])
  plt.xlabel("Age")
  plt.ylabel("Number of persons")
  plt.title('Passenger Ages on Titanic')
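
The same plots as a runnable sketch with explicit keyword arguments (in seaborn 0.12+ the first positional argument of countplot is the DataFrame, so the x=/data= form is safer; assumes the titanic frame loaded above):

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.countplot(x='Survived', data=titanic)                 # survival counts
    plt.show()

    sns.countplot(x='Survived', hue='Pclass', data=titanic)   # survival split by passenger class
    plt.show()

    titanic['Age'].plot.hist(bins=30)    # pandas drops the missing ages automatically
    plt.xlabel("Age")
    plt.title("Passenger Ages on Titanic")
    plt.show()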
HANDLING MISSING DATA
o titanic.isnull().sum()

DATA CLEANING
o null_1 = titanic['Age'][titanic['Pclass'] == 1].isnull()
o null_2 = titanic['Age'][titanic['Pclass'] == 2].isnull()
o null_3 = titanic['Age'][titanic['Pclass'] == 3].isnull()

o pc1 = titanic['Age'][titanic['Pclass'] == 1].mean(skipna = True)
o pc2 = titanic['Age'][titanic['Pclass'] == 2].mean(skipna = True)
o pc3 = titanic['Age'][titanic['Pclass'] == 3].mean(skipna = True)

o titanic['Age'].fillna(titanic.groupby('Pclass')['Age'].transform('mean'), inplace = True)

o titanic_new = titanic.drop('Cabin', axis = 1)

o titanic_new.dropna(subset=['Embarked'], inplace = True)

o titanic_new.drop(['PassengerId','Name','Ticket'], axis = 1, inplace = True)
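
The cleaning step can also be written with plain assignments (a sketch of the same logic; newer pandas versions warn when fillna(..., inplace=True) is chained off a single column):

    # Impute missing ages with the mean age of the passenger's class,
    # then drop the columns and rows the model does not use.
    titanic['Age'] = titanic['Age'].fillna(
        titanic.groupby('Pclass')['Age'].transform('mean'))

    titanic_new = titanic.drop(columns=['Cabin', 'PassengerId', 'Name', 'Ticket'])
    titanic_new = titanic_new.dropna(subset=['Embarked'])   # drops the 2 rows with no embarkation port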
HANDLING MISSING DATA
DATA CLEANING – OUTPUTS
o [Output shown after missing-values handling]
o [Output shown after additional data cleaning]
FEATURE ENGINEERING
o titanic_new['Title'] = titanic['Name'].apply(lambda x: x[x.find(', ')+2 : x.find('.')])
titanic_new['Title'].value_counts()
o titanic.dropna(subset=['Cabin'], inplace = True)
titanic_new['Cabin_Letter'] = titanic['Cabin'].astype(str).str[0]
titanic_new['Cabin_Letter'].value_counts()
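
The same two features can be extracted with pandas string methods, without mutating the original titanic frame (a sketch; the regex is an alternative to the find-based slicing above):

    # Title: the word between ', ' and the first '.' in Name, e.g. 'Mr', 'Mrs', 'Master'.
    titanic_new['Title'] = titanic['Name'].str.extract(r',\s*([^.]+)\.', expand=False)

    # Cabin letter: first character of the cabin code; passengers with no recorded cabin
    # stay NaN and end up with all-zero dummy columns after get_dummies.
    titanic_new['Cabin_Letter'] = titanic['Cabin'].str[0]

    titanic_new['Title'].value_counts()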
CONVERT CATEGORICAL FEATURES (WITHOUT FEATURE ENGINEERING)

o titanic_new = pd.get_dummies(titanic_new,columns = ['Sex','Embarked'])
CONVERT CATEGORICAL FEATURES (WITH FEATURE ENGINEERING)

o titanic_new = pd.get_dummies(titanic_new,columns = ['Sex','Embarked','Title','Cabin_Letter'])
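
A hedged variation of the same call: drop_first=True is a standard pandas option (not used on the slide) that removes one redundant dummy per category:

    titanic_new = pd.get_dummies(
        titanic_new,
        columns=['Sex', 'Embarked', 'Title', 'Cabin_Letter'],
        drop_first=True)   # optional; avoids perfectly collinear dummy columns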
LOGISTIC REGRESSION MODEL
STEP 1 : SPLITTING THE DATA
o from sklearn.model_selection import train_test_split

o X = titanic_new.drop('Survived', axis = 1)
y = titanic_new['Survived']
o X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.7, random_state = 24)
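
An optional variation (not on the slide): stratify=y keeps the Survived class proportions the same in the train and test splits:

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.7, random_state=24, stratify=y)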
LOGISTIC REGRESSION MODEL
STEP 2 : BUILDING THE MODEL
o from sklearn.linear_model import LogisticRegression

o model = LogisticRegression(solver = 'lbfgs', max_iter=900)

o model.fit(X_train, y_train)
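
Fit plus a quick accuracy check (model.score is sklearn's built-in mean-accuracy helper; an extra sanity check, not on the slide):

    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression(solver='lbfgs', max_iter=900)
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))   # mean accuracy on the held-out split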
PREDICTION AND EVALUATION
o y_pred = model.predict(X_test)

o y_pred

o model.predict_proba(X_test)
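
predict_proba returns one column per class; predict() corresponds (up to ties) to thresholding the positive-class probability at 0.5, as this sketch checks:

    import numpy as np

    proba = model.predict_proba(X_test)       # [:, 0] = P(Not Survived), [:, 1] = P(Survived)
    y_pred_manual = (proba[:, 1] > 0.5).astype(int)
    print(np.array_equal(y_pred_manual, model.predict(X_test)))   # expected: True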
ANALYSIS – CONFUSION MATRIX AND CLASSIFICATION
REPORT (WITHOUT FEATURE ENGINEERING)
o from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report
o print(confusion_matrix(y_test, y_pred))

o print(classification_report(y_test, y_pred))

o Inference: (Confusion Matrix)
  True Negative : 156 ; False Positive : 13
  False Negative : 30 ; True Positive : 68

o Inference: (Classification Report)
  Precision = TP / (TP + FP) = 68/81 = 0.840
  Recall = TP / (TP + FN) = 68/98 = 0.694
  Giving precision and recall equal weight,
  F1-score = 2*P*R / (P + R) = 2*68 / (2*68 + 13 + 30) = 136/179 = 0.760
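
The hand calculation can be checked directly from the confusion-matrix counts (a sketch; sklearn's precision_score, recall_score and f1_score on y_test/y_pred give the same numbers):

    tn, fp, fn, tp = 156, 13, 30, 68                       # counts from the matrix above
    precision = tp / (tp + fp)                             # 68/81  ~ 0.840
    recall = tp / (tp + fn)                                # 68/98  ~ 0.694
    f1 = 2 * precision * recall / (precision + recall)     # ~ 0.760
    print(round(precision, 3), round(recall, 3), round(f1, 3))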
ANALYSIS – CONFUSION MATRIX AND CLASSIFICATION
REPORT (WITH FEATURE ENGINEERING)
o from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report
o print(confusion_matrix(y_test, y_pred))

o print(classification_report(y_test, y_pred))

o Inference: (Confusion Matrix)
  True Negative : 154 ; False Positive : 15
  False Negative : 24 ; True Positive : 74

o Inference: (Classification Report)
  Precision = TP / (TP + FP) = 74/89 = 0.831
  Recall = TP / (TP + FN) = 74/98 = 0.755
  Giving precision and recall equal weight,
  F1-score = 2*P*R / (P + R) = 2*74 / (2*74 + 15 + 24) = 148/187 = 0.791

o Compared with the model without feature engineering, recall rises from 0.694 to 0.755 and F1-score from 0.760 to 0.791, at a small cost in precision (0.840 to 0.831).
MEMBERS CONTRIBUTION
S.No  Name                 Contribution                                  Slide Nos.
1.    Aryan Panicker       Exploratory Data Analysis,                    4 - 6
                           Data Visualization
2.    Anh Viet Doan        Missing Data Handling, Data Cleaning          7 - 9
3.    Bang Nguyen          Building Logistic Regression Model,           13 - 15
                           Model Prediction and Evaluation
4.    Duy Le Duc           Feature Engineering, Categorical Data         10 - 12
5.    Geethanjali Dhanish  Introduction, Analysis of Confusion Matrix    2 - 3, 16 - 17
                           and Classification Report
