ML Report-1 (uploaded by atharvalitake24)
PUNE INSTITUTE OF COMPUTER TECHNOLOGY, DHANKAWADI, PUNE-43

Mini Project Report – Machine Learning

‘Survival Prediction on the Titanic Using Machine Learning Techniques’

Submitted By
Name: Atharva Litake
Roll no: 41143 Class: BE-1
Name: Mihir Deshpande
Roll no: 41150 Class: BE-1

Under the guidance of
Prof. RAJNI JADHAV

DEPARTMENT OF COMPUTER ENGINEERING

Academic Year 2024-25

Contents
1. TITLE

2. PROBLEM DEFINITION

3. LEARNING OBJECTIVES

4. LEARNING OUTCOMES

5. ABSTRACT

6. TECHNICAL DETAILS ABOUT THE PROJECT

7. GLIMPSE OF THE PROJECT

8. CONCLUSION
1. TITLE:
Survival Prediction on the Titanic Using Machine Learning Techniques

2. PROBLEM DEFINITION:

The Titanic shipwreck is one of the most infamous maritime disasters
in history. Factors such as age, gender, and socio-economic class may
have influenced whether its passengers survived. This project aims to
build a machine learning model that predicts whether a passenger
survived the disaster from the available features, such as name, age,
gender, and socio-economic status. The challenge is to analyze the data
and construct a model that makes accurate predictions.
The problem statement calls for a machine learning model that predicts
which kinds of passengers survived the Titanic shipwreck, using
passenger data (e.g., name, age, gender, and socio-economic class). The
training portion of the Titanic dataset contains 891 passengers, with
features such as PassengerId, Name, Age, Sex, Pclass (ticket class),
SibSp (number of siblings/spouses aboard), Parch (number of
parents/children aboard), Ticket, Fare, Cabin, Embarked (port of
embarkation), and the target variable Survived, which is 1 if the
passenger survived and 0 otherwise. The dataset is widely used for
classification tasks because it combines numerical and categorical
features drawn from a real event.
In this project, we use the Titanic dataset to predict passenger
survival with a machine learning model. Key steps include data
preprocessing, where we handle missing values, encode categorical
variables (e.g., Sex, Embarked), and transform skewed features such as
Fare. We then conduct Exploratory Data Analysis (EDA) using Seaborn
and Matplotlib to visualize survival trends. A machine learning model
is trained on the data and evaluated using metrics such as accuracy,
precision, and F1-score, with cross-validation used to check that the
results hold across different data splits. The goal is to predict
passenger survival and assess the model's accuracy.
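The evaluation described above can be sketched with Scikit-learn's `cross_validate`, which scores a model on several folds with several metrics at once. This is a minimal illustration, not the project's code: the feature frame is randomly generated stand-in data (not the Titanic file), and the Random Forest with 5 folds is an assumed choice.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Randomly generated stand-in for the preprocessed Titanic features (NOT the real data).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),           # ticket class 1-3
    "Sex": rng.integers(0, 2, n),              # 0 = female, 1 = male after encoding
    "Age": rng.uniform(1, 80, n),
    "Fare": np.log1p(rng.uniform(0, 250, n)),  # log-transformed fare
})
# Synthetic target: survival tied to Sex and Pclass, for illustration only.
y = ((X["Sex"] == 0) | (X["Pclass"] == 1)).astype(int)

# 5-fold cross-validation (stratified by default for classifiers),
# scoring each fold with the three metrics named in the text.
scores = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X, y, cv=5, scoring=["accuracy", "precision", "f1"],
)
for metric in ("accuracy", "precision", "f1"):
    fold_values = scores[f"test_{metric}"]
    print(f"{metric:9s} mean={fold_values.mean():.3f} std={fold_values.std():.3f}")
```

A small spread of the per-fold scores is what "reliability" means in practice here: the model's performance does not depend heavily on which rows happened to be held out.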
3. LEARNING OBJECTIVES:

● Understand how to preprocess and clean real-world datasets (handling
missing values, encoding categorical data).
● Learn the fundamentals of classification algorithms in machine learning.
● Explore feature engineering and its impact on model performance.
● Implement and evaluate different machine learning models for
classification tasks.
● Develop skills to interpret the output of models and extract meaningful
insights.
● Gain experience in using data visualization techniques to understand data
distributions and relationships between variables.
● Understand how to split datasets into training and testing sets for model
validation.
● Learn to tune hyperparameters to improve model performance.
● Explore techniques for handling imbalanced datasets, if survival rates are
skewed.
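The last two objectives can be sketched together with Scikit-learn, as a minimal illustration under stated assumptions: a stratified split keeps the survivor ratio the same in the training and testing parts, and `GridSearchCV` tries combinations of `max_depth` and `class_weight` (where `"balanced"` reweights classes inversely to their frequency). The data is synthetic and skewed toward the negative class (roughly 60/40, similar in spirit to the Titanic training set, where 342 of 891 passengers survived); the grid values and the F1 scoring choice are assumptions, not the project's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately skewed data standing in for the encoded Titanic features.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0.3).astype(int)

# stratify=y keeps the class ratio (almost) identical in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Grid search over tree depth and class weighting, scored with F1,
# which is more informative than accuracy when classes are unbalanced.
param_grid = {"max_depth": [3, 5, None], "class_weight": [None, "balanced"]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="f1",
)
search.fit(X_train, y_train)

print("positive rate train/test:", round(y_train.mean(), 3), round(y_test.mean(), 3))
print("best params:", search.best_params_)
print("held-out F1:", round(search.score(X_test, y_test), 3))
```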

4. LEARNING OUTCOMES:

● Data Preprocessing Proficiency: Ability to clean and preprocess real-world
datasets, including handling missing values, encoding categorical variables,
and scaling features.
● Data Visualization Skills: Proficiency in using tools like Matplotlib and
Seaborn to visualize and interpret data trends, distributions, and
correlations.
● Feature Engineering: Understanding how to select and engineer features to
improve model performance and accuracy.
● Model Building: Experience in implementing and training classification
models using machine learning frameworks such as Scikit-learn.
● Model Evaluation: Ability to evaluate models using various performance
metrics like accuracy, precision, recall, F1-score, and confusion matrices.
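The evaluation metrics listed above can be checked by hand on a tiny example using `sklearn.metrics`. The six labels below are made up for illustration: they contain 2 true positives, 2 true negatives, 1 false positive, and 1 false negative, so the confusion matrix is [[2, 1], [1, 2]] and accuracy, precision, recall, and F1 all equal 2/3.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up labels for six passengers: 1 = survived, 0 = did not survive.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

# Rows are actual classes (0, 1); columns are predicted classes (0, 1).
cm = confusion_matrix(y_true, y_pred)
print(cm)

print("accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / all
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1       :", f1_score(y_true, y_pred))         # harmonic mean of the two
```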
5. ABSTRACT:

This project predicts the survival of Titanic passengers using machine
learning techniques on the well-known Titanic dataset. The dataset
includes features such as age, gender, socio-economic class, fare, and
port of embarkation, providing a rich foundation for analysis.
The project begins with data loading and exploratory data analysis
(EDA) to understand the structure of the dataset and the relationships
between variables. Visualizations such as histograms and bar plots
reveal how survival rates differ across demographic groups.
Following EDA, the dataset is cleaned by removing non-essential
features and filling missing values in the 'Age' and 'Embarked'
columns.
Feature engineering encodes the categorical variables and applies a log
transformation to 'Fare' to reduce the skew of its distribution. Several
classification algorithms, including Decision Trees, Random Forests, and
Logistic Regression, are trained on one portion of the labelled data and
evaluated for accuracy on the held-out portion. The same preprocessing
steps are then applied to the separate test set so that it mirrors the
structure of the training data, and the final predictions are compiled
into a submission file for evaluation.
The final model achieves an accuracy of 73.8%, a moderately effective
result given the available features. This project illustrates the
importance of data preprocessing, feature engineering, and model
evaluation in machine learning, and provides insights into the factors
that influenced survival during the Titanic disaster.
6. TECHNICAL DETAILS ABOUT THE PROJECT:

The project aims to predict the survival of passengers from the Titanic disaster
using machine learning techniques on the Titanic dataset, which includes
features such as age, gender, class, and fare.

1. Libraries Used

● Pandas: For data manipulation and analysis.
● NumPy: For numerical operations.
● Seaborn & Matplotlib: For data visualization and exploratory data analysis
(EDA).
● Scikit-learn: For machine learning model implementation and evaluation.
● LightGBM: For gradient boosting.
● XGBoost: For extreme gradient boosting.

2. Dataset Description

● Train Data: Contains information on 891 passengers, including features such
as PassengerId, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin,
Embarked, and the target variable Survived.
● Test Data: Contains the same features but lacks the Survived column, which
is what we aim to predict.

3. Data Loading

import pandas as pd

# Labelled training data and unlabelled test data
train_data = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")
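Before any analysis it can help to confirm that each file has the columns the later steps rely on. The helper below, `load_titanic`, is hypothetical (it is not part of the report or of pandas): it wraps `pd.read_csv` and raises an error early if a column is missing. The usage reads a made-up two-row sample from memory instead of `data/train.csv`.

```python
import io

import pandas as pd

# Columns the report relies on; "Survived" exists only in the training file.
FEATURE_COLUMNS = ["PassengerId", "Pclass", "Name", "Sex", "Age",
                   "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]


def load_titanic(source, has_target=True):
    """Hypothetical helper: read a Titanic CSV and fail early on missing columns."""
    df = pd.read_csv(source)
    expected = FEATURE_COLUMNS + (["Survived"] if has_target else [])
    missing = [c for c in expected if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Made-up sample rows; quoted names may contain commas, and an empty Cabin becomes NaN.
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Doe, Mr. John",male,22,1,0,T1,7.25,,S\n'
    '2,1,1,"Roe, Mrs. Jane",female,38,1,0,T2,71.28,C85,C\n'
)
train_sample = load_titanic(sample)
print(train_sample.shape)
```

For the real files the calls would be `load_titanic("data/train.csv")` and `load_titanic("data/test.csv", has_target=False)`.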

4. Exploratory Data Analysis (EDA)

● Info Summary: Used train_data.info() to understand the structure of the
dataset.
● Numerical Features Visualization: Plotted histograms of the numerical
features to observe their distributions.
● Survival Rate Analysis: Used bar plots to visualize survival rates by
passenger class and other categorical features.
● Pivot Tables: Created pivot tables to analyse the relationship between
survival and other features (e.g., Sex, Pclass, Embarked).
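Because `Survived` is a 0/1 column, the survival-rate bar plots and pivot tables above both reduce to group-wise means. A minimal pandas sketch on a made-up eight-row frame (not the real data); in the project, `train_data` would be the frame loaded earlier. Plotting is left out to keep the example self-contained; seaborn's `barplot` with `x="Pclass", y="Survived"` draws the same per-class means as bars.

```python
import pandas as pd

# Tiny made-up frame using Titanic column names (replace with the real train_data).
train_data = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 2, 2, 3],
    "Sex":      ["male", "female", "female", "female",
                 "male", "male", "female", "male"],
    "Embarked": ["S", "C", "S", "S", "Q", "S", "C", "S"],
})

train_data.info()  # dtypes and non-null counts per column

# Survival rate per passenger class: the mean of a 0/1 column is a proportion.
rates = train_data.groupby("Pclass")["Survived"].mean()
print(rates)

# Pivot table: survival rate broken down by Sex (rows) and Pclass (columns).
pivot = pd.pivot_table(train_data, index="Sex", columns="Pclass",
                       values="Survived", aggfunc="mean")
print(pivot)
```

In this toy frame the rates come out as 1.0, 0.5, and 0.25 for classes 1, 2, and 3; the pivot table shows NaN where a Sex/Pclass combination has no passengers.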
5. Data Cleaning

● Handling Missing Values:
  o Dropped non-essential columns: PassengerId, Cabin, Name, Ticket.
  o Filled missing values in Age with the mean age and in Embarked with
    the mode.
● Final Missing Values Check: Confirmed that no null values remain in the
dataset.
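The cleaning steps above map directly onto pandas calls. A minimal sketch on four made-up rows with the raw Titanic columns; the column list and fill rules follow the text, while the sample values are invented. `mode()` returns a Series because there can be ties, so `[0]` picks the first mode.

```python
import numpy as np
import pandas as pd

# Made-up rows with the raw Titanic columns, including gaps in Age and Embarked.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Ticket": ["t1", "t2", "t3", "t4"],
    "Fare": [7.25, 71.28, 7.92, 8.05],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
})

# Drop the non-essential columns listed above.
df = df.drop(columns=["PassengerId", "Cabin", "Name", "Ticket"])

# Age -> mean of the known ages; Embarked -> most frequent port.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Final check: every column should now report zero missing values.
print(df.isnull().sum())
```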

6. Feature Engineering

● Fare Transformation: Applied a log transformation to the Fare column to
reduce the right skew of its distribution.
● Label Encoding: Transformed the categorical variables Sex and Embarked into
numerical format using LabelEncoder.
● Data Preparation for Modeling: Separated the dataset into features X and
the target variable y.
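A minimal sketch of these feature-engineering steps, on made-up rows. The report does not say which log function was used; this sketch assumes `np.log1p` (log(1 + x)) because some Titanic fares are recorded as 0, which a plain log would turn into -inf. `LabelEncoder` assigns codes in sorted order, so female→0, male→1 and C→0, Q→1, S→2. Keeping the fitted encoders in a dictionary (an illustrative choice) lets the same mapping be reused on the test set.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up cleaned rows (no missing values), including one zero fare.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 0.0, 8.05],
    "Embarked": ["S", "C", "S", "Q"],
})

# Assumed log variant: log1p keeps zero fares finite (log1p(0) == 0).
df["Fare"] = np.log1p(df["Fare"])

# One encoder per categorical column, stored for reuse on the test data.
encoders = {}
for col in ["Sex", "Embarked"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Features X and target y.
X = df.drop(columns="Survived")
y = df["Survived"]
print(X)
```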

7. Train-Test Split

Divided the data into training and testing sets with a 75-25 split.
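The 75-25 split corresponds to `test_size=0.25` in Scikit-learn's `train_test_split`; fixing `random_state` (the value 42 is an arbitrary assumption) makes the split reproducible. A minimal sketch on 20 toy rows. Scikit-learn rounds the test share up, so on the 891-row training file this setting would give 668 training and 223 testing rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)  # 20 toy rows with 2 features
y = np.array([0, 1] * 10)         # toy 0/1 labels

# 75% of rows for training, 25% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)
```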

8. Model Selection and Evaluation

● Model Training and Evaluation Function: Created a function that trains a
model, predicts outcomes, and evaluates accuracy.
● Model Testing: Trained multiple classifiers, including Decision Tree, Random
Forest, and Logistic Regression, and compared their accuracy.
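The train-and-evaluate function and the model comparison can be sketched as follows. `train_and_evaluate` is a hypothetical name (the report does not give the function's name or signature), the hyperparameters are assumptions, and `make_classification` generates synthetic stand-in data in place of the encoded Titanic features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded Titanic features (7 columns, binary target).
X, y = make_classification(n_samples=400, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)


def train_and_evaluate(model):
    """Hypothetical helper: fit on the training split, return held-out accuracy."""
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
results = {name: train_and_evaluate(model) for name, model in models.items()}

# Print the models from most to least accurate.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

Comparing all models on the same fixed split keeps the comparison fair; the cross-validation sketch earlier in the report gives a less split-dependent estimate.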

9. Predicting on Test Data

● Preprocessing Test Data: Applied the same preprocessing steps to the test
dataset, including filling missing values and label encoding, and then used
the trained model to predict survival on the test set.
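A common pitfall at this step is recomputing fill values or refitting encoders on the test file, which can silently change the mapping. The sketch below, under stated assumptions, learns every statistic and encoder from the training data only and reuses them on the test data. The helper names `fit_preprocessing` and `transform`, the toy rows, the Logistic Regression model, and filling a missing test Fare with the training median are all illustrative assumptions, not the project's code. Note that `LabelEncoder.transform` raises a `ValueError` for a category it did not see during fitting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

FEATURES = ["Pclass", "Sex", "Age", "Fare", "Embarked"]


def fit_preprocessing(train):
    """Hypothetical helper: learn fill values and encoders from training data only."""
    stats = {
        "age_mean": train["Age"].mean(),
        "fare_median": train["Fare"].median(),
        "embarked_mode": train["Embarked"].mode()[0],
    }
    stats["encoders"] = {
        "Sex": LabelEncoder().fit(train["Sex"]),
        "Embarked": LabelEncoder().fit(train["Embarked"].fillna(stats["embarked_mode"])),
    }
    return stats


def transform(df, stats):
    """Hypothetical helper: apply the stored training statistics to any split."""
    out = df[FEATURES].copy()
    out["Age"] = out["Age"].fillna(stats["age_mean"])
    out["Fare"] = np.log1p(out["Fare"].fillna(stats["fare_median"]))
    out["Embarked"] = out["Embarked"].fillna(stats["embarked_mode"])
    for col, encoder in stats["encoders"].items():
        out[col] = encoder.transform(out[col])  # reuse the training-set mapping
    return out


# Made-up rows standing in for train.csv and test.csv.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 13.0, 51.86, 21.07],
    "Embarked": ["S", "C", "S", np.nan, "S", "Q"],
})
test = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "male"],
    "Age": [34.5, np.nan, 62.0],
    "Fare": [7.83, np.nan, 9.69],
    "Embarked": ["Q", "S", "C"],
})

stats = fit_preprocessing(train)
model = LogisticRegression(max_iter=1000).fit(transform(train, stats), train["Survived"])

# One row per test passenger, in the PassengerId/Survived layout of a submission file.
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": model.predict(transform(test, stats)),
})
print(submission)
# To write the file: submission.to_csv("submission.csv", index=False)
```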

10. Model Performance

● Final Model Accuracy: The model achieved an accuracy of 73.8%, indicating
moderately effective prediction of survival based on the available features.
7. GLIMPSE OF THE PROJECT & DEPLOYMENT:

8. CONCLUSION:

The project demonstrates how machine learning can be applied to historical data to
predict survival outcomes in a real-world disaster scenario. By preprocessing the data,
selecting relevant features, and choosing the best-performing model, we obtained
reasonably accurate predictions of Titanic passengers' survival (73.8% accuracy).
The project also shows that careful data handling, feature engineering, and model
evaluation are central to building machine learning solutions. It serves as a
practical example of applying these techniques to historical datasets, paving the way
for further exploration of data science and its applications in real-world scenarios.
