
Business Analytics Presentation: Titanic Survival Analysis and Prediction

The document is a presentation on analyzing the Titanic dataset using machine learning algorithms such as decision trees and k-nearest neighbors (KNN). It first describes exploratory data analysis, including loading the data, checking for missing values, data visualization and feature selection. It then discusses building a decision tree model with a train-test split and calculating its accuracy, and covers K-fold cross validation with decision trees. Finally, it explains how the KNN algorithm works and shows the steps to find the optimal K and predict survival on the test data.

Business Analytics Presentation
Titanic Survival Analysis and Prediction

PREPARED BY:
DIVYANSH SINGH - 20BM63030
PRANAV KUMAR - 20BM63064
RUMANI CHAKRABORTY - 20BM63076
SHUMI MITRA - 20BM63086
TALATI SAURABH RASHMIKANT - 20BM63096

TEAM NUMBER - 04
TEAM NAME - MAVERICKS
Exploratory Data Analysis
An approach to analyzing datasets to summarize their main characteristics, often with visual methods.

Process Adopted
1. Loading & getting detailed statistics of the dataset
2. Checking for missing data
3. Visualization
4. Filling the missing data
5. Critically analyzing the essential data
6. Appending the modified fields
7. Generating the final table
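Steps 1 and 2 of this process can be sketched with pandas as below. This is a minimal illustration, not the team's code: the CSV rows are hypothetical toy rows in the Kaggle Titanic column layout, and a real run would read the actual file (e.g. a `train.csv`, file name assumed) with `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical toy rows in Titanic-style columns; a real analysis would use
# pd.read_csv("train.csv") instead (file name assumed).
CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin
1,0,3,male,22,7.25,
2,1,1,female,38,71.28,C85
3,1,3,female,26,7.93,
4,1,1,female,35,53.10,C123
5,0,3,male,35,8.05,
6,0,3,male,,8.46,
"""

# Step 1: load the data and get summary statistics.
df = pd.read_csv(io.StringIO(CSV))
print(df.head())
print(df.describe())

# Step 2: count missing values per column and express them as a percentage.
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(1)
print(pd.DataFrame({"missing": missing, "percent": missing_pct}))
```

The percentage column is what makes the later decisions (fill Age, drop Cabin) easy to justify at a glance.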

Initial table after loading the dataset

Result of checking for missing data

• Since the dataset contains too much data to inspect row by row, visualization is a better tool for analysis
• About 20% of the values in the Age column are null, while most values in the Cabin column are null, too many to fill reliably
• Visualizing and filling the missing values in the Age column
• Dropping the Cabin column
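The slides do not say how the Age gaps were filled. A common approach, assumed here, is to visualize Age per passenger class and then impute each missing Age with the median Age of that class. The sketch below applies that assumed rule to hypothetical toy data and then drops Cabin.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with Titanic-style columns (not the real dataset).
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Age": [38.0, np.nan, 30.0, 24.0, 22.0, np.nan, 26.0],
    "Cabin": ["C85", np.nan, np.nan, np.nan, np.nan, np.nan, "G6"],
    "Survived": [1, 1, 0, 1, 0, 0, 1],
})

# Fill each missing Age with the median Age of the passenger's class
# (assumed imputation rule; medians are computed ignoring NaN).
class_median = df.groupby("Pclass")["Age"].transform("median")
df["Age"] = df["Age"].fillna(class_median)

# Drop Cabin: it has too many nulls to impute reliably.
df = df.drop(columns=["Cabin"])
print(df)
```

Per-class medians are a design choice: Age varies with passenger class, so they are usually a closer guess than one global median.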

Visualization
Final Table Obtained Upon Correlating

Decision Tree
[with train-test split]
A decision tree is one of the most popular tools for classification and prediction.

Understanding Decision Tree
1. LOADING THE LIBRARIES
2. FEATURE & TARGET SELECTION
3. SPLITTING THE DATASET INTO TRAINING SET AND TEST SET
4. CREATING & TRAINING DECISION TREE CLASSIFIER OBJECT
5. PREDICTING THE RESPONSE FOR TEST DATASET
6. PRINTING THE ACCURACY
7. VISUALIZING THE DECISION TREE

A decision tree is a flowchart-like tree structure, where:
1. Each internal node denotes a feature/attribute
2. Each branch represents a decision rule
3. Each leaf node (terminal node) represents the outcome.
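The seven steps above map onto scikit-learn roughly as follows. This is a sketch under stated assumptions, not the team's notebook: the data is synthetic and Titanic-shaped (the label follows a hypothetical toy rule with 10% noise), the `Sex` encoding, the 70/30 split and `max_depth=4` are assumed choices, and the tree is "visualized" as text with `export_text` rather than plotted.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical synthetic data shaped like the cleaned Titanic table.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),  # 0 = male, 1 = female (assumed encoding)
    "Age": rng.uniform(1, 80, n).round(),
})
# Toy labelling rule plus 10% label noise, so the tree has something to learn.
survived = ((df["Sex"] == 1) | (df["Age"] < 10)) & (df["Pclass"] < 3)
noise = rng.random(n) < 0.1
df["Survived"] = (survived ^ noise).astype(int)

# Step 2: feature and target selection.
X = df[["Pclass", "Sex", "Age"]]
y = df["Survived"]

# Step 3: split into training and test sets (70/30 is an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Step 4: create and train the decision tree classifier.
clf = DecisionTreeClassifier(max_depth=4, random_state=1)
clf.fit(X_train, y_train)

# Step 5: predict the response for the test set.
y_pred = clf.predict(X_test)

# Step 6: print the accuracy.
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)

# Step 7: visualize the tree as text (sklearn.tree.plot_tree draws it instead).
print(export_text(clf, feature_names=list(X.columns)))
```

Limiting `max_depth` keeps the printed tree readable and reduces overfitting; an unrestricted tree would memorize the noisy labels.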

K-Fold Cross Validation
• Partition: Partition the dataset into k equal-sized partitions
• Select: Select 1 partition as the validation data
• Use: Use the remaining k-1 partitions as training data
• Train: Train the model and determine the accuracy
• Repeat: Repeat the process k times, selecting a different partition each time
• Average: Average the accuracy results
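The procedure above can be written out explicitly with scikit-learn's `KFold`, so that each slide step is visible in the loop. The data is hypothetical and synthetic, and the decision tree settings (`max_depth=4`) are assumed; this is an illustration, not the deck's code.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier


def kfold_accuracy(X, y, k=5, seed=0):
    """Manual K-fold cross validation with a decision tree."""
    # Partition: split the row indices into k (shuffled) folds.
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    # Select / Use / Repeat: each fold serves once as validation data.
    for train_idx, val_idx in kf.split(X):
        model = DecisionTreeClassifier(max_depth=4, random_state=seed)
        model.fit(X[train_idx], y[train_idx])  # Train on the k-1 folds
        acc = np.mean(model.predict(X[val_idx]) == y[val_idx])
        scores.append(float(acc))
    # Average: mean accuracy over the k folds.
    return float(np.mean(scores)), scores


# Hypothetical synthetic features: Pclass, Sex (0/1), Age.
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(1, 4, 300),
                     rng.integers(0, 2, 300),
                     rng.uniform(1, 80, 300)])
y = ((X[:, 1] == 1) & (X[:, 0] < 3)).astype(int)

mean_acc, fold_scores = kfold_accuracy(X, y, k=5)
print(fold_scores, mean_acc)
```

`sklearn.model_selection.cross_val_score(clf, X, y, cv=5)` does the same in one call, though for classifiers it uses stratified folds by default, so its per-fold numbers can differ slightly.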


Result
Accuracy: 0.7574626865671642
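KNN Implementation
KNN (k-nearest neighbors) is a non-parametric, lazy learning algorithm: it makes no assumptions about the underlying data distribution, and it has no explicit training phase, instead storing the training data and doing its work at prediction time.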

Understanding KNN
• K is the number of nearest neighbors; the choice of K is the core deciding factor in the prediction
• KNN has the following basic steps:
  1. Calculate distance
  2. Find closest neighbors
  3. Vote for labels
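The three basic steps can be sketched from scratch in NumPy. Euclidean distance and simple majority voting are assumed here (they are the usual defaults, but the slides do not state the metric), and the toy points are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest neighbours."""
    preds = []
    for x in X_query:
        # 1. Calculate distance (Euclidean) from x to every training point.
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # 2. Find the indices of the k closest neighbours.
        nearest = np.argsort(dists)[:k]
        # 3. Vote: the most frequent label wins (ties go to the smaller label).
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


# Hypothetical toy data: label 0 clusters near (0, 0), label 1 near (5, 5).
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.5, 0.5], [5.5, 5.5]]), k=3))
# prints [0 1]
```

Because the method relies on raw distances, features on large scales (such as Age or Fare) dominate features like Pclass unless the features are standardized first.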

1. Finding accuracy for a selected number of neighbours
2. Plotting accuracy corresponding to each value of K in KNN
3. Loading & displaying the test dataset
4. Copying the test dataset & analysing it to get the relevant columns
5. Predicting survival values
6. Displaying the final submission
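These six steps can be sketched with scikit-learn as below. Everything here is an assumption for illustration: the train/test frames are synthetic, K is scored by 5-fold cross-validated accuracy over 1 to 25, the features are standardized before KNN, and the submission uses a Kaggle-style `PassengerId`/`Survived` layout.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_frame(n, start_id):
    """Hypothetical synthetic passengers with Titanic-style columns."""
    return pd.DataFrame({
        "PassengerId": np.arange(start_id, start_id + n),
        "Pclass": rng.integers(1, 4, n),
        "Sex": rng.integers(0, 2, n),
        "Age": rng.uniform(1, 80, n).round(),
    })


train = make_frame(400, 1)
train["Survived"] = ((train["Sex"] == 1) & (train["Pclass"] < 3)).astype(int)
test = make_frame(100, 401)
features = ["Pclass", "Sex", "Age"]


def knn_model(k):
    # Scale features so Age does not dominate the distance (assumed choice).
    return make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))


# 1. Accuracy (5-fold CV mean) for each candidate number of neighbours.
k_values = range(1, 26)
scores = [cross_val_score(knn_model(k), train[features], train["Survived"],
                          cv=5).mean() for k in k_values]
# 2. These (K, accuracy) pairs are what gets plotted; take the best K.
best_k = k_values[int(np.argmax(scores))]

# 3-4. Copy the test set and keep only the relevant columns.
X_test = test[features].copy()

# 5. Fit on all training data with the chosen K and predict survival.
model = knn_model(best_k).fit(train[features], train["Survived"])

# 6. Final submission table: one row per test passenger.
submission = pd.DataFrame({"PassengerId": test["PassengerId"],
                           "Survived": model.predict(X_test)})
print("best K:", best_k)
print(submission.head())
```

Choosing K on cross-validated training accuracy, rather than on the test set, keeps the test predictions an honest estimate.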

Results Obtained

THANK YOU!!!
