Tutorial 10

This tutorial discusses using logistic regression and naïve Bayes classification to predict survival on the Titanic using passenger data. It shows how to load the Titanic dataset, perform logistic regression and naïve Bayes classification on it, and compare the ROC curves of the two classifiers visually.

Uploaded by

Low Jia Hui

Tutorial 10

DSA1101
Introduction to Data Science
November 9, 2018

Exercise 1. Logistic regression in R


In tutorial 7, we looked at the CSV dataset “Titanic.csv”, which provides information on the
fate of passengers on the fatal maiden voyage of the ocean liner Titanic and includes the
variables economic status (class), sex, age and survival. We trained a naïve Bayes classifier
on this dataset to predict survival. This week, we will use logistic regression to predict
survival, and compare the performance of the two classifiers visually using ROC curves.

(a) Load the dataset “Titanic.csv” which has been posted under the folder for Tutorial 7.

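# stringsAsFactors = TRUE reads text columns as factors so that glm() can use them
# (this was the read.csv default before R 4.0)
Titanic_dataset = read.csv("Titanic.csv", stringsAsFactors = TRUE)
dim(Titanic_dataset)   # number of rows and columns
head(Titanic_dataset)  # first few rows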

(b) Perform logistic regression of ‘Survived’ on all the feature variables.


Survival_logistic <- glm(Survived ~ .,
                         data = Titanic_dataset,
                         family = binomial(link = "logit"))
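
To see what the fitted model in (b) says, the coefficients can be inspected and converted to odds ratios. The sketch below is only illustrative: since “Titanic.csv” is not reproduced here, it assumes R's built-in `Titanic` table (expanded to one row per passenger) as a stand-in, and the names `passengers`, `fit` and `new_passenger` are hypothetical. The actual CSV may have different columns or coding.

```r
# Assumption: R's built-in Titanic table stands in for "Titanic.csv".
# Expand the contingency table to one row per passenger.
df <- as.data.frame(Titanic)
passengers <- df[rep(seq_len(nrow(df)), df$Freq),
                 c("Class", "Sex", "Age", "Survived")]

# Same model as in part (b), fitted on the stand-in data
fit <- glm(Survived ~ ., data = passengers,
           family = binomial(link = "logit"))

# Coefficient estimates are on the log-odds scale
print(summary(fit)$coefficients)

# exp() turns log-odds coefficients into odds ratios
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 2))

# Predicted survival probability for one hypothetical passenger
new_passenger <- data.frame(
  Class = factor("3rd", levels = levels(passengers$Class)),
  Sex   = factor("Female", levels = levels(passengers$Sex)),
  Age   = factor("Adult", levels = levels(passengers$Age))
)
p_new <- predict(fit, newdata = new_passenger, type = "response")
print(p_new)
```

An odds ratio above 1 means the level is associated with higher odds of survival than the reference level of that variable, holding the other variables fixed.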

(c) Perform naïve Bayes classification of ‘Survived’ based on all the feature variables.

library(e1071)
Survival_Nbayes <- naiveBayes(Survived ~ .,
                              data = Titanic_dataset)
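
For intuition about what a naïve Bayes classifier estimates, the calculation can be sketched by hand in base R: class priors, per-feature conditional probabilities, and a posterior obtained by multiplying them under the independence assumption. This is a rough illustration, not the internals of `naiveBayes`; it assumes R's built-in `Titanic` table as a stand-in for “Titanic.csv”, applies no smoothing, and the helper `cond_table` is a hypothetical name.

```r
# Assumption: R's built-in Titanic table stands in for "Titanic.csv".
df <- as.data.frame(Titanic)
passengers <- df[rep(seq_len(nrow(df)), df$Freq),
                 c("Class", "Sex", "Age", "Survived")]

# Prior probabilities P(Survived)
prior <- prop.table(table(passengers$Survived))

# Conditional probabilities P(feature value | Survived):
# rows are survival classes, and each row sums to 1
cond_table <- function(feature) {
  prop.table(table(passengers$Survived, passengers[[feature]]), margin = 1)
}
p_class <- cond_table("Class")
p_sex   <- cond_table("Sex")
p_age   <- cond_table("Age")

# Posterior for a first-class adult female: multiply the prior by each
# conditional probability, then normalise so the two classes sum to 1
unnormalised <- prior * p_class[, "1st"] * p_sex[, "Female"] * p_age[, "Adult"]
posterior <- unnormalised / sum(unnormalised)
print(posterior)
```

The predicted class is the one with the larger posterior probability.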

(d) Observe and compare the ROC curves for the two classifiers.
library(ROCR)
# ROC curve for logistic regression, using fitted probabilities on the training data
pred = predict(Survival_logistic, type = "response")
predObj = prediction(pred, Titanic_dataset$Survived)
rocObj = performance(predObj, measure = "tpr", x.measure = "fpr")
plot(rocObj)

# ROC curve for naive Bayes; column 2 of the raw output is the posterior
# probability of the second class level
nb_prediction <- predict(Survival_Nbayes, Titanic_dataset, type = "raw")
score <- nb_prediction[, 2]
pred_nb <- prediction(score, Titanic_dataset$Survived)
roc_nb = performance(pred_nb, measure = "tpr", x.measure = "fpr")
plot(roc_nb, add = TRUE, col = 2)

legend("bottomright", c("logistic regression", "naive Bayes"),
       col = c("black", "red"), lty = 1)
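
A visual comparison of two ROC curves can be backed up with a single number, the area under the curve (AUC). With ROCR this is available via `performance(predObj, measure = "auc")`. The sketch below instead computes it in base R from the rank-sum (Mann-Whitney) identity, so it runs without extra packages. It is only an illustration: `auc_rank` is a hypothetical helper, and R's built-in `Titanic` table is assumed as a stand-in for “Titanic.csv”.

```r
# AUC via the rank-sum identity: the probability that a randomly chosen
# positive case receives a higher score than a randomly chosen negative case
auc_rank <- function(score, is_positive) {
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  ranks <- rank(score)  # tied scores receive average ranks
  (sum(ranks[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Assumption: R's built-in Titanic table stands in for "Titanic.csv".
df <- as.data.frame(Titanic)
passengers <- df[rep(seq_len(nrow(df)), df$Freq),
                 c("Class", "Sex", "Age", "Survived")]

# Logistic regression as in part (b), with fitted probabilities as scores
fit <- glm(Survived ~ ., data = passengers,
           family = binomial(link = "logit"))
p_hat <- predict(fit, type = "response")

auc_logistic <- auc_rank(p_hat, passengers$Survived == "Yes")
print(auc_logistic)
```

The same helper can be applied to the naïve Bayes scores; an AUC of 0.5 corresponds to random guessing and 1 to perfect ranking, and the classifier with the larger AUC has the ROC curve that encloses more area.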
