ML Mini Project - Docx New (A)
A Project Report
On
“Predicting Survival of Titanic Passengers Using Machine Learning”
SUBMITTED BY
Guided by
Prof. Mahesh Korade
2024-25
INSTITUTE OF ENGINEERING, NASHIK
Certificate
This is to certify that
Aditya Sanjay Salve (A-44) has completed the necessary Mini Project work and
prepared the report on
“Predicting Survival of Titanic Passengers Using Machine Learning”
in a satisfactory manner in fulfilment of the requirements for the award
of the degree of Bachelor of Computer Engineering in the academic year
2024-25
Date:
Place: Nashik
Acknowledgement
We take this opportunity to thank all those who contributed to the successful
completion of this project work. We would like to express our sincere thanks to our project
guide, Prof. Mahesh Korade, who encouraged us to work on this project and
guided us whenever required.
We would also like to express our gratitude to our H.O.D., Dr. P. M. Yawalkar, for
giving us the opportunity to undertake this project work at Met Institute of Engineering,
Nashik.
We are extremely grateful to our Principal, Dr. V. P. Wani, for his constant inspiration
and keen interest in making the project and presentation flawless.
Last but not least, we would like to thank our teaching staff,
workshop staff, friends, and family members for their timely co-operation and help.
By,
Sanket Khandu Sadgir (A-43)
Aditya Sanjay Salve (A-44)
Vaibhav Savliram Bodke (A-09)
Sahil Pathania (A-34)
Course Objective:
• To understand the need for Machine learning
• To explore various data pre-processing methods.
• To study and understand classification methods
• To understand the need for multi-class classifiers.
• To learn the working of clustering algorithms
• To learn fundamental neural network algorithms.
Course Outcome:
On completion of the course, the student will be able to:
CO1: Identify the needs and challenges of machine learning for real time
applications.
CO2: Apply various data pre-processing techniques to simplify and speed up
machine learning algorithms.
CO3: Select and apply appropriately supervised machine learning algorithms for
real time applications.
CO4: Implement variants of multi-class classifier and measure its performance.
CO5: Compare and contrast different clustering algorithms.
CO6: Design a neural network for solving engineering problems.
Abstract
The Titanic shipwreck in 1912 remains one of history's most notorious maritime
disasters, leading to the loss of over 1,500 lives. The event sparked significant interest in
understanding the dynamics that influenced survival during the catastrophe. Extensive
datasets containing passenger information have been compiled, enabling researchers to
analyze the contributing factors to survival using modern data science techniques. This
project aims to build a predictive machine learning model to determine the likelihood of
a passenger's survival based on various attributes, including age, gender, class, and fare.
By applying machine learning methods such as logistic regression, random
forests, and support vector machines,
we developed a predictive model that sheds light on which factors were most influential
in determining survival outcomes. The results reveal that the random forest algorithm
provided the most accurate predictions, highlighting socio-economic class and gender as
pivotal factors affecting survival rates.
Contents
01 Introduction
02 Objectives
03 Requirements
04 Implementation
05 Future Scope
06 Screenshots
07 Conclusion
1. Introduction
This section introduces the Titanic Survival Prediction project, outlining its context,
objectives, significance, and scope.
1. Project Title: "Predicting Survival of Titanic Passengers Using Machine Learning"
2. Background:
The sinking of the RMS Titanic on April 15, 1912, remains one of the most infamous
maritime disasters in history, leading to the loss of over 1,500 lives. This tragedy not
only highlighted the limitations of early 20th-century safety measures but also raised
questions about the factors influencing survival during the catastrophe. The Titanic
dataset, which includes comprehensive records of passengers and their characteristics,
provides an excellent foundation for analyzing these factors. This project aims to apply
machine learning techniques to predict the survival of passengers based on attributes
such as age, gender, socio-economic class, and more, shedding light on the complex
interplay of these variables during a crisis.
3. Project Objectives:
o Analyze historical passenger data from the Titanic disaster to identify trends and
patterns influencing survival.
o Develop predictive models utilizing machine learning algorithms to forecast the
likelihood of survival for passengers based on their characteristics.
o Evaluate and compare the performance of various machine learning models,
including Logistic Regression, Random Forest, and Support Vector Machines.
o Provide insights into the key factors affecting survival outcomes during the
Titanic disaster.
4. Significance of the Project: Understanding the dynamics of survival during historical
disasters like the Titanic tragedy contributes to the broader fields of data science and
machine learning. This project not only provides valuable insights into human behavior
in emergencies but also enhances our understanding of predictive modeling techniques.
The findings will be beneficial for historians, researchers, and data scientists, enabling
them to apply these methodologies to similar situations or datasets.
5. Scope of the Project: This project involves a comprehensive analysis of the Titanic
dataset, including data preprocessing, exploratory data analysis, and the development
of predictive models. Various machine learning techniques will be employed to forecast
survival probabilities, focusing on the identification of significant features that
influence outcomes. While the study primarily centers on historical analysis, the
methodologies and insights gained could be applied to other contexts involving
survival prediction.
This report will elucidate the design and development of the Titanic Survival Prediction
model, detailing the methodologies, algorithms, and techniques employed in the analysis.
Additionally, the report will cover the evaluation of model performance, highlighting key
findings and implications of the results. The project report is organized to provide a
thorough understanding of the Titanic Survival Prediction project and its contributions to
the fields of data science and historical analysis.
2. Objectives
The primary objectives of this project are as follows:
• Model Development: To create a robust machine learning model that predicts the
survival of Titanic passengers based on their attributes, enabling users to understand
the likelihood of survival under similar conditions.
• Key Factor Identification: To identify and analyze the critical features contributing to
survival, such as age, gender, socio-economic class, and family relationships, providing
insights into the demographics of survival.
• Algorithm Comparison: To implement and compare the performance of various
machine learning algorithms—specifically Logistic Regression, Random Forest, and
Support Vector Machines—to ascertain the most effective approach for this
classification problem.
• Data Preprocessing: To conduct a thorough analysis of the dataset, addressing issues
such as missing values, data imbalances, and encoding of categorical variables to
prepare the data for machine learning algorithms.
• Model Evaluation: To rigorously evaluate the model’s performance using a range of
metrics (accuracy, precision, recall, F1-score), ensuring the reliability and validity of
predictions made by the model.
3. Requirements
3.1 Hardware Requirements
To effectively run the machine learning algorithms and handle the dataset, the following
hardware specifications are recommended:
• Processor: Intel Core i5 (7th generation) or higher, ensuring sufficient computational
power for processing the dataset and training models.
• RAM: A minimum of 128 MB, though 8 GB or more is preferable for optimal
performance during model training and evaluation.
• Hard Disk: At least 20 GB of free space to accommodate the dataset, software tools,
and any generated outputs (models, graphs, etc.).
3.2 Software Requirements
The software environment for this project includes:
• Platform: Jupyter Notebook, which offers an interactive coding environment
conducive to data analysis and visualization.
• Programming Language: Python, chosen for its extensive libraries and community
support in data science and machine learning.
• Libraries and Tools:
o Pandas: Essential for data manipulation and analysis, enabling efficient
handling of datasets.
o NumPy: Provides support for numerical operations and mathematical functions
essential for data analysis.
o Scikit-learn: A powerful library that provides simple and efficient tools for
machine learning and statistical modeling.
o Matplotlib and Seaborn: Libraries used for data visualization, helping to create
informative plots to analyze and present findings effectively.
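Before running the notebook, it can help to confirm that the libraries listed above are installed. The following is a minimal sketch, not part of the original project code; the helper name check_environment is hypothetical, and the package names are the usual import names (scikit-learn is imported as sklearn).

```python
from importlib import import_module
from importlib.util import find_spec

# Import names for the libraries listed above (scikit-learn is imported as "sklearn").
REQUIRED = ["pandas", "numpy", "sklearn", "matplotlib", "seaborn"]


def check_environment(packages=REQUIRED):
    """Return {package: version string, or None if the package is not installed}."""
    report = {}
    for name in packages:
        if find_spec(name) is None:
            # Package cannot be found on the import path.
            report[name] = None
            continue
        module = import_module(name)
        # Most packages expose __version__; fall back to "unknown" otherwise.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as not installed can be added with pip or conda before opening Jupyter Notebook.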
4. Implementation
The implementation of the Titanic survival prediction project involves several stages, from
data preprocessing to model training and evaluation.
4.1 Data Preprocessing
• Handling Missing Values: The dataset contains missing values, particularly in the Age
and Cabin columns. To address this:
o The Cabin column, which has a significant number of missing
entries, is dropped to simplify the model.
• Encoding Categorical Variables: Categorical variables, such as Sex and Embarked,
are converted into numerical values using techniques like Label Encoding. This
transformation enables machine learning algorithms to interpret these features
effectively.
• Feature Selection: A thorough analysis is conducted to select the most relevant
features for predicting survival. The selected features include:
o Pclass: Passenger class (1st, 2nd, 3rd)
o Sex: Gender of the passenger
o Age: Age of the passenger
o SibSp: Number of siblings/spouses aboard
o Parch: Number of parents/children aboard
o Fare: Ticket fare paid
o Embarked: Port of embarkation
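The preprocessing steps above can be sketched with pandas as follows. This is an illustrative sketch rather than the project's actual notebook code. The report does not say how missing Age or Embarked values are filled, so median and most-frequent-value imputation are assumptions here. The function name preprocess, the sample rows, and the file name train.csv are likewise hypothetical.

```python
import pandas as pd

# Features selected in the report.
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a numeric feature table built from raw Titanic-style records."""
    out = df.copy()

    # Cabin is mostly missing, so it is dropped (as described above).
    out = out.drop(columns=["Cabin"], errors="ignore")

    # ASSUMPTION: fill missing Age with the median and missing Embarked
    # with the most frequent port; the report does not specify either.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])

    # Label encoding: each category becomes an integer code, assigned in
    # sorted order of the categories present (e.g. female=0, male=1).
    for col in ["Sex", "Embarked"]:
        out[col] = out[col].astype("category").cat.codes

    return out[FEATURES]


# Tiny made-up sample in place of the real CSV (e.g. pd.read_csv("train.csv")).
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 30.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})
features = preprocess(sample)
print(features)
```

Label encoding imposes an arbitrary order on Embarked; one-hot encoding (for example pandas.get_dummies) is a common alternative when that order could mislead a linear model.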
4.2 Model Training
• Logistic Regression: A fundamental model for binary classification, logistic
regression predicts the probability of survival based on a linear combination of the
input features.
• Random Forest: This ensemble method enhances prediction accuracy by
constructing multiple decision trees during training and outputting the mode of their
predictions. Random Forest is known for its robustness against overfitting.
• Support Vector Machines (SVM): This classifier works by finding the optimal
hyperplane that separates the classes in a high-dimensional space, making it
particularly effective in complex datasets.
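The three models above could be trained with scikit-learn roughly as shown below. This is a sketch under stated assumptions: the data is a synthetic stand-in generated with make_classification (seven features, to mirror the selected feature set), not the Titanic dataset, and the hyperparameters and the 80/20 split are illustrative choices that the report does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the seven preprocessed Titanic features.
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=0
)

# ASSUMPTION: hold out 20% of the rows for testing, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# ASSUMPTION: hyperparameters are illustrative, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # score() reports plain accuracy on the held-out rows.
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

On the real data, Logistic Regression and SVM usually benefit from feature scaling (for example StandardScaler), whereas Random Forest does not depend on the scale of its inputs.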
4.3 Model Evaluation
• Metrics: Various performance metrics, including accuracy, precision, recall, and
F1-score, are employed to evaluate the models. Cross-validation techniques are
utilized to ensure that the models generalize well to unseen data.
• Results: The performance of the models is compared, revealing that Random
Forest outperforms the others in accuracy, precision, and recall, establishing it as
the most suitable model for this dataset.
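The evaluation procedure described above can be sketched with scikit-learn's cross_validate, which reports several metrics per fold. The data here is again synthetic, so the printed numbers do not reproduce the project's results. The helper name evaluate, the choice of five folds, and the hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed Titanic features and labels.
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=0
)

# Metrics named in the report, using scikit-learn's scorer names.
SCORING = ["accuracy", "precision", "recall", "f1"]


def evaluate(model, X, y, folds=5):
    """Return the mean of each metric over k-fold cross-validation."""
    # For classifiers, an integer cv uses stratified folds.
    results = cross_validate(model, X, y, cv=folds, scoring=SCORING)
    return {metric: results[f"test_{metric}"].mean() for metric in SCORING}


# ASSUMPTION: illustrative hyperparameters, not the project's settings.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

scores = {name: evaluate(model, X, y) for name, model in candidates.items()}
for name, metrics in scores.items():
    summary = ", ".join(f"{m}={v:.3f}" for m, v in metrics.items())
    print(f"{name}: {summary}")
```

Averaging each metric across folds gives a steadier basis for comparing models than a single train/test split.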
5. Future Scope