Titanic Data Preprocessing

Eman

2024-06-16
{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring
HTML, PDF, and MS Word documents. For more details on using R Markdown see
http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as
well as the output of any embedded R code chunks within the document. You can embed an
R code chunk like this:
{r cars}
summary(cars)

Including Plots
You can also embed plots, for example:
{r pressure, echo=FALSE}
plot(pressure)

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of
the R code that generated the plot.

library(dplyr)
library(tidyr)
library(caret)
library(knitr)
library(rmarkdown)

#Loading the dataset
titanic_data <- read.csv("titanic.csv")

#Inspect the first few rows and structure
head(titanic_data)
str(titanic_data)

#Handling missing values: count the NA entries in each column
missing_values <- sapply(titanic_data, function(x) sum(is.na(x)))
missing_values
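
One caveat on the count above: `is.na()` only sees values that `read.csv()` parsed as `NA`. In a character column such as Embarked, a blank field is typically read as the empty string `""`, so it would not be counted and would not be imputed later. The following is a minimal sketch, assuming blank fields should be treated as missing. The inline CSV and the names `toy_csv`, `default_read` and `strict_read` are hypothetical stand-ins, not the contents of titanic.csv.

```r
# Hypothetical three-row sample; the second row has blank Age and Embarked fields
toy_csv <- "Age,Embarked\n22,S\n,\n30,C"

# Default parsing: the blank numeric field becomes NA, the blank string stays ""
default_read <- read.csv(text = toy_csv)
sapply(default_read, function(x) sum(is.na(x)))

# Treat empty strings as missing too, via the na.strings argument of read.csv()
strict_read <- read.csv(text = toy_csv, na.strings = c("", "NA"))
sapply(strict_read, function(x) sum(is.na(x)))
```

If the real file behaves the same way, passing `na.strings = c("", "NA")` when loading it would let the Embarked imputation step find the blank entries.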

Impute missing values for Age and Embarked


titanic_data$Age[is.na(titanic_data$Age)] <- median(titanic_data$Age, na.rm = TRUE)
most_frequent_embarked <- as.character(names(sort(table(titanic_data$Embarked), decreasing = TRUE)[1]))
titanic_data$Embarked[is.na(titanic_data$Embarked)] <- most_frequent_embarked

Drop Cabin column
titanic_data <- titanic_data %>% select(-Cabin)

Encode Sex as a numeric indicator


titanic_data$Sex <- as.factor(titanic_data$Sex)
titanic_data <- titanic_data %>% mutate(Sex = as.numeric(Sex == "female"))

Convert Embarked to factor


titanic_data$Embarked <- as.factor(titanic_data$Embarked)

Check the structure again


str(titanic_data)

#Feature Engineering
titanic_data$FamilySize <- titanic_data$SibSp + titanic_data$Parch + 1
titanic_data$IsAlone <- ifelse(titanic_data$FamilySize == 1, 1, 0)
titanic_data <- titanic_data %>% select(-PassengerId, -Name, -Ticket)
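
To make the two derived features concrete, here is a small sketch on a hypothetical three-passenger data frame (`toy`, with made-up SibSp and Parch values, not rows from the dataset). FamilySize adds siblings/spouses, parents/children and the passenger, and IsAlone flags a family size of one.

```r
# Hypothetical passengers: one travelling alone, one with a spouse, one in a large family
toy <- data.frame(SibSp = c(0, 1, 3), Parch = c(0, 0, 2))

# Same formulas as in the feature engineering step
toy$FamilySize <- toy$SibSp + toy$Parch + 1
toy$IsAlone <- ifelse(toy$FamilySize == 1, 1, 0)

toy
```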

#Splitting the Dataset
set.seed(123) # For reproducibility
train_index <- createDataPartition(titanic_data$Survived, p = 0.8, list = FALSE)
train_data <- titanic_data[train_index, ]
test_data <- titanic_data[-train_index, ]

#Output the dimensions of the training and testing sets
cat("Training set dimensions:", dim(train_data), "\n")
cat("Testing set dimensions:", dim(test_data), "\n")
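
After an 80/20 split it can be worth checking that the survival rate is similar in both parts. The sketch below illustrates the idea of sampling within each class using base R only. It is a stand-in for illustration, not caret's implementation of `createDataPartition()`, and the data frame `toy` and the names `idx_by_class`, `train_idx`, `toy_train` and `toy_test` are hypothetical.

```r
# Hypothetical outcome: 6 survivors and 14 non-survivors
set.seed(123)
toy <- data.frame(id = 1:20, Survived = rep(c(1, 0), times = c(6, 14)))

# Group row indices by class, then sample 80% (rounded down) within each class
idx_by_class <- split(seq_len(nrow(toy)), toy$Survived)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), floor(0.8 * length(i)))]
}))

toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

# Compare class proportions in the two parts
prop.table(table(toy_train$Survived))
prop.table(table(toy_test$Survived))
```

The same `prop.table(table(...))` check can be applied to `train_data$Survived` and `test_data$Survived` from the split above.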
