R Project Data Cleaning Notes

The document outlines a data cleaning process in R, including importing data, counting occurrences of specific variables, and replacing values based on conditions. It demonstrates the use of the dplyr package for data manipulation, such as mutating and selecting columns. The final dataset is organized by renaming and rearranging columns for clarity.

Data Cleaning

#Data cleaning in R
setwd("E:\\R programme\\R.Directory")
#Read in Data
library(readr)
#Import the raw data into Rscript
df<-read_csv("Data_Cleaning.csv")
df
View(df)
#Apply the count function from dplyr
library(dplyr)
#count function without pipeline
count(df,Facility)
#With pipe
df %>% count(Facility)
#count JobLevel
count(df,JobLevel)
#Replace a specific value for a specific case
str(df)
newdf1 <- df %>%
  mutate(Facility = replace(Facility, match("EP1202", EmpID), "Beaverton"))
newdf1 <- newdf1 %>%
  mutate(Facility = replace(Facility, match("EP1207", EmpID), "Beaverton"))
newdf1 <- newdf1 %>%
  mutate(JobLevel = replace(JobLevel, match("EP1210", EmpID), 1))
#Replace the numeric value with NA in joblevel variable column
newdf1 <- newdf1 %>%
  mutate(JobLevel = replace(JobLevel, match("EP1203", EmpID), NA))
#Replace a specific value for all cases with a particular value (OnboardingCompleted)
newdf1 <- newdf1 %>%
  mutate(OnboardingCompleted = ifelse(OnboardingCompleted == "No", "yes", OnboardingCompleted))
newdf1 <- newdf1 %>%
  mutate(OnboardingCompleted = ifelse(OnboardingCompleted == "yes", 1, OnboardingCompleted))
#Change 1 back to yes
newdf1$OnboardingCompleted[newdf1$OnboardingCompleted == 1] <- "yes"
#Change the name of variable
newdf1$HireDate<-newdf1$StartDate
#Delete old column
newdf1$StartDate<-NULL
#arrange the column in the previous position.
newdf1<-select(newdf1,EmpID,Facility,JobLevel,HireDate,everything())
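
The steps above can also be written more compactly. The following is a minimal sketch, not the author's method: it uses a small made-up data frame (`toy`, with invented values) in place of Data_Cleaning.csv, which is not available here. It swaps match() for a logical condition, since match() returns only the first matching position, and it uses dplyr's rename() and relocate() instead of copying and deleting the column.

```r
library(dplyr)

# Hypothetical toy data standing in for Data_Cleaning.csv (all values are made up)
toy <- data.frame(
  EmpID = c("EP1201", "EP1202", "EP1203"),
  Facility = c("Portland", "Beavrton", "Portland"),
  JobLevel = c(1, 2, 11),
  OnboardingCompleted = c("yes", "No", "yes"),
  StartDate = c("2020-01-06", "2020-02-03", "2020-03-02"),
  stringsAsFactors = FALSE
)

cleaned <- toy %>%
  mutate(
    # A logical condition updates every row where it is TRUE
    Facility = replace(Facility, EmpID == "EP1202", "Beaverton"),
    JobLevel = replace(JobLevel, EmpID == "EP1203", NA),
    # if_else() requires compatible types, so the column stays character
    OnboardingCompleted = if_else(OnboardingCompleted == "No", "yes", OnboardingCompleted)
  ) %>%
  # rename(new = old) changes the name in place
  rename(HireDate = StartDate) %>%
  # relocate() moves the column so it sits right after JobLevel
  relocate(HireDate, .after = JobLevel)

print(cleaned)
```

With rename() there is no old column left to delete, and relocate() states the target position directly instead of listing the leading columns in select().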
