Python Experiment

The document outlines a Python script for analyzing the Titanic dataset using machine learning techniques. It includes data preprocessing steps such as handling missing values and encoding categorical features, followed by training and evaluating three models: Naive Bayes, Decision Tree, and K-Nearest Neighbors. The accuracies of the models are reported, with Decision Tree achieving the highest accuracy of 0.804.

Uploaded by

Shubham Maurya
Copyright
© All Rights Reserved

# Install the dependencies first (shell command, not Python):
# pip install pandas numpy scikit-learn matplotlib seaborn

import pandas as pd

import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import LabelEncoder

# Load Titanic dataset from seaborn

df = sns.load_dataset('titanic')

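# Drop 'deck' (mostly missing) and columns that duplicate others:
# embark_town (embarked), alive (survived, the target), class (pclass), who and adult_male (sex, age)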

df.drop(['deck', 'embark_town', 'alive', 'class', 'who', 'adult_male'], axis=1, inplace=True)

# Fill missing values (assign the result back; inplace=True on a selected column is deprecated in recent pandas)

df['age'] = df['age'].fillna(df['age'].median())

df['embarked'] = df['embarked'].fillna(df['embarked'].mode()[0])

# Encode categorical features

label_enc = LabelEncoder()

df['sex'] = label_enc.fit_transform(df['sex'])

df['embarked'] = label_enc.fit_transform(df['embarked'])

# Define features and target

X = df.drop('survived', axis=1)

y = df['survived']

# Train-test split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
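
The script reuses a single LabelEncoder for both columns, so after the second fit it only remembers the classes of 'embarked'. A minimal sketch of keeping one encoder per column, so each integer code can be mapped back to its label, might look like the following. The toy rows and the `encoders` dict are illustrative assumptions, not part of the original script.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the two Titanic columns (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "Q", "S"],
})

# One encoder per column, kept in a dict so every mapping stays available
encoders = {}
for col in ["sex", "embarked"]:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# LabelEncoder numbers the classes in sorted order
sex_mapping = {str(label): int(code) for code, label in enumerate(encoders["sex"].classes_)}
embarked_mapping = {str(label): int(code) for code, label in enumerate(encoders["embarked"].classes_)}
print(sex_mapping)
print(embarked_mapping)
```

The integer codes for 'embarked' imply an order (C < Q < S) that has no real meaning. Decision trees are largely unaffected by this, but distance-based models such as KNN treat the codes as magnitudes, so one-hot encoding may be worth trying there.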


from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import accuracy_score, classification_report

# Naive Bayes

nb_model = GaussianNB()

nb_model.fit(X_train, y_train)

nb_preds = nb_model.predict(X_test)

print("Naive Bayes Accuracy:", accuracy_score(y_test, nb_preds))

print("Naive Bayes Report:\n", classification_report(y_test, nb_preds))

from sklearn.tree import DecisionTreeClassifier

# Decision Tree: scikit-learn's CART-based counterpart to Weka's J48 (C4.5), not an exact equivalent

j48_model = DecisionTreeClassifier(random_state=42)

j48_model.fit(X_train, y_train)

j48_preds = j48_model.predict(X_test)

print("J48 Accuracy:", accuracy_score(y_test, j48_preds))

print("J48 Report:\n", classification_report(y_test, j48_preds))

from sklearn.neighbors import KNeighborsClassifier

# KNN Classifier

knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)

knn_preds = knn_model.predict(X_test)

print("KNN Accuracy:", accuracy_score(y_test, knn_preds))

print("KNN Report:\n", classification_report(y_test, knn_preds))
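
KNN classifies by distance, so features on a large scale (fare, age) can outweigh small ones (sex, pclass) when the inputs are left unscaled, as they are above. A minimal sketch of standardizing inside a Pipeline follows. It uses synthetic data from make_classification as a stand-in, and the data, the inflated column, and all variable names are assumptions. The effect on the Titanic data may differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the Titanic set)
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=42)
X_demo[:, 0] *= 1000  # inflate one feature's scale, loosely like fare next to sex

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The scaler is fit on the training split only, then applied to the test split
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(Xd_train, yd_train)

# Baseline: the same classifier on the raw, unscaled features
raw_knn = KNeighborsClassifier(n_neighbors=5).fit(Xd_train, yd_train)

print("KNN without scaling:", raw_knn.score(Xd_test, yd_test))
print("KNN with scaling:", scaled_knn.score(Xd_test, yd_test))
```

In the script itself, the same pipeline could be fit on X_train and y_train in place of the bare KNeighborsClassifier.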

print("Summary of Accuracies:")

print("Naive Bayes:", accuracy_score(y_test, nb_preds))

print("J48 Decision Tree:", accuracy_score(y_test, j48_preds))

print("KNN:", accuracy_score(y_test, knn_preds))

Naive Bayes Accuracy: 0.765

J48 Accuracy: 0.804

KNN Accuracy: 0.787
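
These accuracies come from a single 80/20 split, which leaves roughly 180 test rows, so gaps of a few points between models may not be stable. One way to check is k-fold cross-validation. The sketch below runs it on synthetic stand-in data, and the data and the `models` and `cv_results` names are assumptions. In the script, X and y would take the place of X_cv and y_cv.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the Titanic set)
X_cv, y_cv = make_classification(n_samples=600, n_features=8, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

cv_results = {}
for name, model in models.items():
    # With an integer cv and a classifier, scikit-learn uses stratified folds
    scores = cross_val_score(model, X_cv, y_cv, cv=5, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the fold-to-fold spread is as large as the gap between two models, the ranking from a single split should be read with caution.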
