Titanic: Logistic Regression Project
GROUP 4
Aryan Panicker - BS21DMU012
Anh Viet Doan - BS21DON043
Bang Nguyen - BS21DON020
Duy Le Duc - BS21DON032
Geethanjali Dhanish – BS21DON044
INTRODUCTION
o GIVEN: titanic_train.csv
TOPICS TO BE COVERED
o Dataframe Creation
o Exploratory Data Analysis and Data Visualization
o Handling Missing Data / Values
o Categorical Features
o Logistic Regression Model – Prediction and Evaluation
o Confusion Matrix and Classification Report
o Feature Engineering
INITIAL STEPS (DATAFRAME CREATION)
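The slide shows the resulting dataframe as a screenshot; a minimal loading sketch, assuming the given file titanic_train.csv is in the working directory and the dataframe is named titanic as in the later slides:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Read the training data (file name from the introduction slide)
titanic = pd.read_csv('titanic_train.csv')

# First look at the data
titanic.head()
titanic.info()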
DATA VISUALIZATION
o sns.countplot(x='Pclass', data=titanic)
o sns.countplot(x='Survived', hue='Pclass', data=titanic)
o sns.countplot(x='Parch', data=titanic)
plt.title("Number of Children/Parents Aboard")
plt.xlabel("Children/Parents Aboard")
o sns.countplot(x='SibSp', data=titanic)
plt.title("Number of Siblings/Spouses Aboard")
plt.xlabel("Siblings/Spouses Aboard")
o plt.hist(titanic['Age'])
plt.xlabel("Age")
plt.ylabel("Number of persons")
plt.title('Passenger Ages on Titanic')
HANDLING MISSING DATA
o titanic.isnull().sum()
o null_1 = titanic['Age'][titanic['Pclass'] == 1].isnull()
o null_2 = titanic['Age'][titanic['Pclass'] == 2].isnull()
o null_3 = titanic['Age'][titanic['Pclass'] == 3].isnull()
o pc1 = titanic['Age'][titanic['Pclass'] == 1].mean(skipna = True)
o pc2 = titanic['Age'][titanic['Pclass'] == 2].mean(skipna = True)
o pc3 = titanic['Age'][titanic['Pclass'] == 3].mean(skipna = True)
o titanic['Age'].fillna(titanic.groupby('Pclass')['Age'].transform('mean'), inplace = True)
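The per-class means pc1, pc2, pc3 computed above are exactly the values the groupby/transform fill inserts for each passenger class; a small sketch of the same fill written explicitly with those means (equivalent to the one-liner above, shown only to make the role of pc1–pc3 and null_1–null_3 clear):

# Fill missing ages class by class using the means computed earlier
titanic.loc[null_1[null_1].index, 'Age'] = pc1   # 1st-class passengers
titanic.loc[null_2[null_2].index, 'Age'] = pc2   # 2nd-class passengers
titanic.loc[null_3[null_3].index, 'Age'] = pc3   # 3rd-class passengers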
DATA CLEANING
o titanic_new = titanic.drop('Cabin', axis = 1)
o titanic_new.dropna(subset=['Embarked'], inplace = True)
o titanic_new.drop(['PassengerId','Name','Ticket'], axis = 1, inplace = True)
HANDLING MISSING DATA / DATA CLEANING – OUTPUTS
(Output screenshots: after missing-values handling; after additional data cleaning)
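Assuming the screenshots show the remaining null counts at each stage, a minimal check that reproduces them with the dataframes defined above:

# Null counts after the Age fill, and after the additional cleaning
print(titanic.isnull().sum())
print(titanic_new.isnull().sum())
print(titanic_new.shape)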
FEATURE ENGINEERING
o titanic_new['Title'] = titanic['Name'].apply(lambda x: x[x.find(', ')+2 : x.find('.')])
titanic_new['Title'].value_counts()
o titanic.dropna(subset=['Cabin'], inplace = True)
titanic_new['Cabin_Letter'] = titanic['Cabin'].astype(str).str[0]
titanic_new['Cabin_Letter'].value_counts()
CONVERT CATEGORICAL FEATURES (WITHOUT FEATURE ENGINEERING)
o titanic_new = pd.get_dummies(titanic_new,columns = ['Sex','Embarked'])
CONVERT CATEGORICAL FEATURES (WITH FEATURE ENGINEERING)
o titanic_new = pd.get_dummies(titanic_new,columns = ['Sex','Embarked','Title','Cabin_Letter'])
LOGISTIC REGRESSION MODEL
STEP 1 : SPLITTING THE DATA
o from sklearn.model_selection import train_test_split
o X = titanic_new.drop('Survived', axis = 1)
y = titanic_new['Survived']
o X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.7, random_state = 24)
LOGISTIC REGRESSION MODEL
STEP 2 : BUILDING THE MODEL
o from sklearn.linear_model import LogisticRegression
o model = LogisticRegression(solver = 'lbfgs', max_iter=900)
o model.fit(X_train, y_train)
PREDICTION AND EVALUATION
o y_pred = model.predict(X_test)
o y_pred
o model.predict_proba(X_test)
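Beyond the raw predictions and class probabilities, a quick accuracy check is a natural first evaluation step; a minimal sketch using scikit-learn's accuracy_score (not shown on the slide):

from sklearn.metrics import accuracy_score

# Fraction of test passengers whose survival is predicted correctly
print(accuracy_score(y_test, y_pred))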
ANALYSIS – CONFUSION MATRIX AND CLASSIFICATION
REPORT (WITHOUT FEATURE ENGINEERING)
o from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report
o print(confusion_matrix(y_test, y_pred))
o print(classification_report(y_test, y_pred))
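ConfusionMatrixDisplay is imported above but not shown in use; a minimal sketch of plotting the matrix from the predictions already computed (assumes scikit-learn 1.0 or later, where from_predictions is available):

# Plot the confusion matrix for the test split
disp = ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()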