Ps Project

This document loads and prepares a Titanic dataset for analysis and modeling. It performs data cleaning and preprocessing steps like removing NA values, converting variables to factors, and discretizing the age variable. It also includes several data visualization and exploration steps like correlation plots, bar plots of survival counts by sex and class, and density/bar plots of age. Finally, it fits a prediction model, makes predictions on a test set, and calculates the accuracy.

Uploaded by Sowmya
Copyright © All Rights Reserved

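# Load the training data; empty strings are read as NA.
# stringsAsFactors = TRUE keeps Sex as a factor, as in the str() output below (R >= 4.0 defaults to FALSE).
data.frame = read.csv(".../path_to_/train.csv", na.strings = "", stringsAsFactors = TRUE)

# One-time install of the packages used below:
# install.packages(c("psych", "Amelia", "dplyr", "GGally", "ggplot2"))
install.packages("psych")
library(psych)
View(data.frame)

# Map where values are missing
library(Amelia)
missmap(data.frame, col = c("black", "grey"))

# Keep the columns used in the analysis, then drop rows with missing values
library(dplyr)
data.frame = select(data.frame, Survived, Pclass, Age, Sex, SibSp, Parch)
data.frame = na.omit(data.frame)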
Checking the structure with str(). This output still shows all 891 rows and an NA in Age, so it was produced before na.omit() was applied:

> str(data.frame)
'data.frame': 891 obs. of 6 variables:
$ Survived: int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
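Before relying on na.omit(), it can help to count the missing values per column and see how many rows will be lost. A minimal base-R sketch, assuming a small made-up data frame (df and its values are hypothetical, not the Titanic data):

```r
# Hypothetical toy data using some of the Titanic column names
df <- data.frame(
  Survived = c(0, 1, 1, 0, 1),
  Age      = c(22, NA, 26, NA, 35),
  Sex      = c("male", "female", "female", "male", NA)
)

# Number of NA values in each column (a numeric counterpart to the missmap() plot)
na_per_column <- colSums(is.na(df))
print(na_per_column)

# na.omit() drops every row that has an NA in any column
complete <- na.omit(df)
cat("Rows before:", nrow(df), " rows after:", nrow(complete), "\n")
```

Because na.omit() works row-wise, one missing Age is enough to discard a passenger's whole record.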

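Survived and Pclass are stored as integers. To convert them into categorical variables (factors), use the factor() function. Pclass is made an ordered factor, with third class as the lowest level and first class as the highest.
data.frame$Survived = factor(data.frame$Survived)
data.frame$Pclass = factor(data.frame$Pclass, ordered = TRUE, levels = c(3, 2, 1))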
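The levels argument of an ordered factor defines the ranking, so levels = c(3, 2, 1) makes first class compare as greater than third class. A short sketch on a hypothetical vector of classes (pclass and pclass_f are illustrative names):

```r
# Hypothetical passenger classes (1 = first, 3 = third)
pclass <- c(3, 1, 3, 2)

# Ordered factor with levels listed from lowest to highest rank
pclass_f <- factor(pclass, ordered = TRUE, levels = c(3, 2, 1))

print(levels(pclass_f))           # the order of the levels is the ranking
print(pclass_f[2] > pclass_f[1])  # first class ranks above third class
```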
Data visualization
Correlation plot

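# ggcorr() only uses numeric columns; factor columns (Survived, Pclass, Sex) are dropped with a warning
library(GGally)
ggcorr(data.frame,
       nbreaks = 6,
       label = TRUE,
       label_size = 3,
       color = "grey50")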
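The values in the correlation plot are a correlation matrix of the numeric columns. A base-R sketch of that computation, assuming ggcorr()'s default of pairwise Pearson correlation; the values below are hypothetical:

```r
# Hypothetical numeric columns with the Titanic column names
df <- data.frame(
  Age   = c(22, 38, 26, 35, NA, 54),
  SibSp = c(1, 1, 0, 1, 0, 0),
  Parch = c(0, 0, 0, 0, 0, 1)
)

# Keep numeric columns only; correlations need numeric input
num <- df[sapply(df, is.numeric)]

# Pearson correlations, each pair using the rows where both values are present
cm <- cor(num, use = "pairwise.complete.obs", method = "pearson")
print(round(cm, 2))
```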
Survived count

library(ggplot2)
ggplot(data.frame, aes(x = Survived)) +
  geom_bar(width = 0.5, fill = "coral") +
  # after_stat() replaces stat(), which is deprecated since ggplot2 3.4.0
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  theme_classic()
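The bar heights that geom_bar() draws are per-level counts, which table() also returns. A quick check using the first ten Survived values from the str() output above:

```r
# First ten Survived values from the str() output (0 = died, 1 = survived)
survived <- factor(c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1))

# Count of each level, the same numbers the bar labels show
counts <- table(survived)
print(counts)
```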

Survived count by Sex

ggplot(data.frame, aes(x = Survived, fill = Sex)) +
  geom_bar(position = position_dodge()) +
  geom_text(stat = "count",
            aes(label = after_stat(count)),
            position = position_dodge(width = 1), vjust = -0.5) +
  theme_classic()
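The dodged bars show a two-way count of Survived by Sex. A sketch of the same cross-tabulation with base R, plus the survival rate within each sex; survived and sex here are hypothetical vectors:

```r
# Hypothetical outcomes and sexes for eight passengers
survived <- c(0, 1, 1, 0, 0, 1, 0, 1)
sex <- c("male", "female", "female", "male", "female", "male", "male", "female")

# Counts by outcome (rows) and sex (columns), as in the dodged bar plot
tab <- table(Survived = survived, Sex = sex)
print(tab)

# Share of each outcome within each sex (each column sums to 1)
rate_by_sex <- prop.table(tab, margin = 2)
print(round(rate_by_sex, 2))
```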

Survival by Pclass

ggplot(data.frame, aes(x = Survived, fill = Pclass)) +
  geom_bar(position = position_dodge()) +
  geom_text(stat = "count",
            aes(label = after_stat(count)),
            position = position_dodge(width = 1),
            vjust = -0.5) +
  theme_classic()
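When Survived is coded 0/1, the mean within each class is that class's survival rate, which summarizes the Pclass plot in one number per class. A sketch with hypothetical values (survived, pclass and rate are illustrative names):

```r
# Hypothetical outcomes (0/1) and classes for nine passengers
survived <- c(1, 0, 1, 0, 0, 1, 0, 0, 1)
pclass <- factor(c(1, 3, 2, 3, 3, 1, 2, 3, 1), ordered = TRUE, levels = c(3, 2, 1))

# Mean of a 0/1 outcome per class = proportion who survived in that class
rate <- tapply(survived, pclass, mean)
print(rate)
```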
Age Density

ggplot(data.frame, aes(x = Age)) +
  geom_density(fill = "coral")
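geom_density() draws a kernel density estimate. Assuming its defaults (a Gaussian kernel with the "nrd0" bandwidth, which are also the defaults of base R's density()), the curve can be reproduced directly; the estimated density should enclose an area close to 1. The ages below are the non-missing values from the str() output above:

```r
# Ages from the str() output, with the NA removed
age <- c(22, 38, 26, 35, 35, 54, 2, 27, 14)

# Gaussian kernel density estimate evaluated on an evenly spaced grid
d <- density(age, kernel = "gaussian")

# Riemann-sum approximation of the area under the curve
area <- sum(d$y) * (d$x[2] - d$x[1])
print(round(area, 3))
```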
Survival by Age

# Discretize age into bands to plot survival
data.frame$Discretized.age = cut(data.frame$Age,
                                 c(0, 10, 20, 30, 40, 50, 60, 70, 80, 100))

# Plot discretized age
ggplot(data.frame, aes(x = Discretized.age, fill = Survived)) +
  geom_bar(position = position_dodge()) +
  geom_text(stat = "count", aes(label = after_stat(count)),
            position = position_dodge(width = 1), vjust = -0.5) +
  theme_classic()

# Drop the helper column after plotting
data.frame$Discretized.age = NULL
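By default cut() builds intervals that are open on the left and closed on the right, so an age of exactly 10 falls in (0,10] and an age of exactly 0 would become NA. A small sketch with a few sample ages (age and groups are illustrative names):

```r
breaks <- c(0, 10, 20, 30, 40, 50, 60, 70, 80, 100)
age <- c(0.42, 10, 22, 80, 2)

# Each age is assigned to the right-closed interval that contains it
groups <- cut(age, breaks)
print(data.frame(age, groups))
```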

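# Assumes fit (the fitted model), test (the test set) and type (the prediction type) were defined when the model was fitted
predicted = predict(fit, test, type = type)

# Confusion matrix: rows are the actual classes, columns the predicted classes
table_mat = table(test$Survived, predicted)
dt_accuracy = sum(diag(table_mat)) / sum(table_mat)
paste("The accuracy is : ", dt_accuracy)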

   predicted
      0   1
  0 113  19    (TN) (FP)
  1  18  65    (FN) (TP)
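The sum(diag(table_mat)) step assumes a square matrix whose rows and columns list the same classes. If predictions contain only one class, table() drops the empty column and diag() picks the wrong cells. A sketch with hypothetical vectors (actual and predicted are made up, not the document's test set) that shows the problem and avoids it by giving both vectors the same factor levels:

```r
# Hypothetical labels and a model that predicts "1" for every passenger
actual    <- c(0, 1, 1, 0, 1, 1)
predicted <- c(1, 1, 1, 1, 1, 1)

# Without shared levels the table is 2 x 1, so diag() returns only the [0, 1] cell
bad <- table(actual, predicted)
wrong <- sum(diag(bad)) / sum(bad)

# With shared levels the table is 2 x 2 and the diagonal holds the correct predictions
lv <- c(0, 1)
table_mat <- table(factor(actual, levels = lv), factor(predicted, levels = lv))
accuracy <- sum(diag(table_mat)) / sum(table_mat)
print(c(wrong = wrong, accuracy = accuracy))
```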

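The accuracy is calculated as (TP + TN)/(TP + TN + FP + FN). I got an accuracy of 81.11%, although the counts in the confusion matrix above give (65 + 113)/(65 + 113 + 19 + 18) = 178/215 ≈ 82.79%.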
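The arithmetic can be checked by rebuilding the confusion matrix from the printed counts. This sketch only re-enters those four numbers; it does not rerun the model:

```r
# Counts from the printed confusion matrix (rows = actual, columns = predicted)
table_mat <- matrix(c(113, 18, 19, 65), nrow = 2,
                    dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))
TN <- table_mat["0", "0"]; FP <- table_mat["0", "1"]
FN <- table_mat["1", "0"]; TP <- table_mat["1", "1"]

# Accuracy = correct predictions / all predictions
accuracy <- (TP + TN) / (TP + TN + FP + FN)
print(round(100 * accuracy, 2))
```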
