Data Handling in R Programming notes
R is an open-source programming language widely used for statistical computing and
data analysis. Data frames are generic R data objects used to store tabular data.
A data frame can be thought of as a matrix in which each column may hold a
different data type. An R data frame has three principal components: the data, the rows,
and the columns.
R – Data Frames
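The code that produced the output below is not included in these notes. The following is a minimal sketch that builds a data frame of the same shape with data.frame(). The variable name friend_data is an illustrative assumption.

```r
# Build a data frame from two equal-length vectors; each vector becomes a column.
# friend_data is an illustrative name, not taken from the original notes.
friend_data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni")
)
print(friend_data)
```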
Output:
  friend_id friend_name
1         1      Sachin
2         2      Sourav
3         3      Dravid
4         4      Sehwag
5         5       Dhoni
The str() function displays the structure of an R data frame.
It can display the internal structure even of large nested lists, and it gives a one-line
summary for basic R objects, telling the user about the object and its constituents.
R
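# R program to get the structure of the data frame
# (friend_data is an illustrative name; the original code is not shown)
friend_data <- data.frame(friend_id = 1:5,
                          friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"))
str(friend_data)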
In R, various operations can be performed on a data frame, such as accessing rows and
columns, selecting a subset of the data frame, editing the data frame, and deleting rows
and columns.
In this example, a data frame called data is first created with three
columns: friend_id, friend_name, and location. To remove the row whose friend_id equals 3,
the subset() function is called with the condition friend_id != 3, which keeps every row
except that one.
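A minimal sketch of these steps follows. The original code and its location values are not shown, so the locations below are hypothetical placeholders.

```r
# Data frame with three columns; the location values are hypothetical placeholders.
data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("City1", "City2", "City3", "City4", "City5")
)

# Keep only the rows whose friend_id is not 3; subset() evaluates the
# condition using the data frame's own column names.
data <- subset(data, friend_id != 3)
print(data)
```

subset() keeps the original row names, so the remaining rows are labelled 1, 2, 4 and 5.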
To remove the location column, the select() function from the dplyr package is called
with -location. The minus sign (-) indicates that the location column should be dropped.
The resulting data frame data has only two columns: friend_id and friend_name.
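The notes use select() from the dplyr package. The sketch below instead uses base R's subset() with its select argument, which drops a column the same way without an extra package; the dplyr form appears as a comment. The location values are hypothetical placeholders.

```r
# Illustrative data frame (location values are placeholders).
data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("City1", "City2", "City3", "City4", "City5")
)

# Base R: the select argument of subset() accepts column names, and a
# leading minus drops that column.
data <- subset(data, select = -location)

# Equivalent dplyr call, if the dplyr package is installed:
# data <- dplyr::select(data, -location)
print(data)
```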
cat("\nDataframe 2:\n")
print(df2)
Importing Files in R
The following examples import a few basic file types to show the general approach to
importing data files:
Importing Text Files
In R, text files can be read using the read.table() function.
Syntax:
read.table(filename, header = FALSE, sep = "")
Parameters:
header indicates whether the file contains a header row
sep specifies the delimiter used in the file
To see all the arguments of read.table(), run the following command in R:
help("read.table")
Example:
Suppose a text file is present in the current working directory with the content shown
below, and its data is to be imported using R:
100 A a
200 B b
300 C c
400 D d
500 E e
600 F f
Values are separated by white spaces.
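# Check current working directory
getwd()
# "TextFile.txt" is a placeholder name; the original file name is not given
data <- read.table("TextFile.txt", header = FALSE, sep = "")
print(data)
print(class(data))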
Output:
[1] "C:/Users/xyz/Documents"
   V1 V2 V3
1 100  A  a
2 200  B  b
3 300  C  c
4 400  D  d
5 500  E  e
6 600  F  f
[1] "data.frame"
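A self-contained sketch of the same import follows. It writes the sample lines to a temporary file, a stand-in for the file in the working directory whose name is not given, and reads them back with read.table(). The temporary path is an assumption of the sketch.

```r
# Write the sample lines to a temporary file, then read them back.
path <- tempfile(fileext = ".txt")
writeLines(c("100 A a", "200 B b", "300 C c",
             "400 D d", "500 E e", "600 F f"), path)

# header = FALSE: the first line is data; sep = "": any run of white space
# separates fields. R names the columns V1, V2, V3.
data <- read.table(path, header = FALSE, sep = "")
print(data)
print(class(data))
```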
Importing CSV Files
In R, CSV files can be read using the read.csv() function.
Syntax:
read.csv(filename, header = TRUE, sep = ",")
Parameters:
header indicates whether the file contains a header row (TRUE by default)
sep specifies the delimiter used in the file ("," by default)
To see all the arguments of read.csv(), run the following command in R:
help("read.csv")
Example:
Suppose a CSV file without a header row is present in the current working directory,
holding the five rows shown in the output below:
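# Check current working directory
getwd()
# "CSVFile.csv" is a placeholder name; the original file name is not given
data <- read.csv("CSVFile.csv", header = FALSE)
print(data)
print(class(data))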
Output:
[1] "C:/Users/xyz/Documents"
   V1 V2 V3
1 100 AB ab
2 200 CD cd
3 300 EF ef
4 400 GH gh
5 500 IJ ij
[1] "data.frame"
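A self-contained sketch follows. It assumes the CSV file holds the five rows from the output as comma-separated values with no header line, and uses a temporary file in place of the file in the working directory.

```r
# A temporary CSV file with the five rows from the example and no header line.
path <- tempfile(fileext = ".csv")
writeLines(c("100,AB,ab", "200,CD,cd", "300,EF,ef",
             "400,GH,gh", "500,IJ,ij"), path)

# read.csv() defaults to header = TRUE and sep = ","; header = FALSE is needed
# here because the first line holds data, so R names the columns V1, V2, V3.
data <- read.csv(path, header = FALSE)
print(data)
print(class(data))
```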
Importing Excel File
To read and import Excel files, the read.xlsx() function from the “xlsx” package can be
used. To read “.xls” Excel files, the read.xls() function from the “gdata” package can
be used.
Syntax:
read.xlsx(filename, sheetIndex)
OR
read.xlsx(filename, sheetName)
Parameters:
sheetIndex specifies the number of the sheet
sheetName specifies the name of the sheet
To see all the arguments of read.xlsx(), run the following command in R:
help("read.xlsx")
Example:
Suppose an xlsx file is present in the current working directory and its first sheet
holds the rows shown in the output below:
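install.packages("xlsx")
library(xlsx)
# Check current working directory
getwd()
# "ExcelFile.xlsx" is a placeholder name; the original file name is not given
data <- read.xlsx("ExcelFile.xlsx",
                  sheetIndex = 1,
                  header = FALSE)
print(data)
print(class(data))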
Output:
[1] "C:/Users/xyz/Documents"
    X1  X2  X3
1 1000 ABC abc
2 2000 DEF def
3 3000 GHI ghi
4 4000 JKL jkl
5 5000 MNO mno
[1] "data.frame"
Exporting files in R
Below are some methods to export the data to a file in R:
Using console
The cat() function in R outputs objects to the console. It can also redirect the output
to a particular file through its file argument.
Syntax:
cat(..., file)
Parameter:
file specifies the name of the file to which the output is redirected
To see all the arguments of cat(), run the following command in R:
help("cat")
Example:
str <- "World"
cat("Hello,", str, file = "catExample.txt")  # placeholder file name
Output:
The above code creates a new file and redirects the output of cat() to it. After the code
runs, the file contains:
Hello, World
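A self-contained sketch of the same idea follows. It writes to a temporary file instead of the working directory and reads the file back to confirm its contents. Using sep = "" and a trailing newline are choices made for this sketch; cat() also accepts append = TRUE to add to an existing file instead of overwriting it.

```r
str <- "World"
path <- tempfile(fileext = ".txt")

# sep = "" stops cat() from inserting spaces between the pieces; the trailing
# "\n" ends the line so readLines() sees a complete last line.
cat("Hello, ", str, "\n", file = path, sep = "")

# Read the file back to confirm what cat() wrote.
contents <- readLines(path)
print(contents)
```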
Using sink() function:
The sink() function redirects all subsequent console output, such as the output of cat()
and print(), to the given file until sink() is called again with no arguments.
Syntax:
sink(filename)   # begins redirecting output to the file
# ... code whose output should go to the file ...
sink()           # stops redirecting
To see all the arguments of sink(), run the following command in R:
help("sink")
Example:
x <- c(1, 3, 4, 5, 10)  # illustrative values consistent with the output below
sink("SinkExample.txt")
print(mean(x))
print(class(x))
print(median(x))
sink()
Output:
The above code creates a new file and redirects the output to it. After the code runs,
the file contains:
[1] 4.6
[1] "numeric"
[1] 4
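A self-contained sketch of the sink() example follows. It assumes the same illustrative vector x, writes to a temporary file instead of SinkExample.txt, and reads the file back after sink() is closed.

```r
x <- c(1, 3, 4, 5, 10)  # illustrative values: mean 4.6, median 4
path <- tempfile(fileext = ".txt")

sink(path)          # start sending console output to the file
print(mean(x))
print(class(x))
print(median(x))
sink()              # stop redirecting; output goes to the console again

contents <- readLines(path)
print(contents)
```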
Working with CSV files in R Programming
R CSV Files
R CSV files are text files in which the values of each row are separated by a
delimiter, such as a comma or a tab. In this article, we use the following sample
CSV file.
Getting and Setting the Working Directory with R CSV Files
R
print(getwd())
setwd("/web/com")
print(getwd())
Output:
[1] "C:/Users/GFG19565/Documents"
[1] "C:/Users/GFG19565/Documents"
The getwd() function returns the current working directory, and the setwd() function
sets a new working directory; the directory passed to setwd() must already exist.
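A minimal sketch that runs on any machine follows. It uses R's per-session temporary directory (tempdir()) in place of a fixed path such as "/web/com", which may not exist, and restores the original directory afterwards.

```r
old_wd <- getwd()        # remember the current working directory
setwd(tempdir())         # switch to R's per-session temporary directory
new_wd <- getwd()
print(new_wd)
setwd(old_wd)            # restore the original working directory
print(getwd())
```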
Input as R CSV Files
id,name,department,salary,projects
1,A,IT,60754,4
2,B,Tech,59640,2
3,C,Marketing,69040,8
4,D,Marketing,65043,5
5,E,Tech,59943,2
6,F,IT,65000,5
7,G,HR,69000,7
Save this content in a text editor such as Notepad under the name sample.csv so that it
can be read into R.
csv_data <- read.csv(file = "sample.csv")
print(csv_data)
print(ncol(csv_data))
print(nrow(csv_data))
Output:
  id name department salary projects
1  1    A         IT  60754        4
2  2    B       Tech  59640        2
3  3    C  Marketing  69040        8
4  4    D  Marketing  65043        5
5  5    E       Tech  59943        2
6  6    F         IT  65000        5
7  7    G         HR  69000        7
[1] 5
[1] 7
read.csv() reads the file whose path is passed to it. Its header argument is TRUE by
default, so the first line supplies the column names. The header line is not counted
as a row; therefore this CSV has 7 rows and 5 columns.
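A self-contained sketch follows. It recreates sample.csv in a temporary file, without spaces after the commas so that the column names come out clean, and checks its dimensions. The temporary path is an assumption of the sketch.

```r
# Recreate sample.csv in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,department,salary,projects",
             "1,A,IT,60754,4", "2,B,Tech,59640,2", "3,C,Marketing,69040,8",
             "4,D,Marketing,65043,5", "5,E,Tech,59943,2", "6,F,IT,65000,5",
             "7,G,HR,69000,7"), path)

# header = TRUE is the default, so the first line becomes the column names
csv_data <- read.csv(path)
print(csv_data)
print(ncol(csv_data))
print(nrow(csv_data))
```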
min_pro <- min(csv_data$projects)
print(min_pro)
Output:
[1] 2
Aggregate functions such as min(), max(), and sum() can be applied to the CSV data. Here,
min() is applied to the projects column, which is selected with the $ operator, and it
returns the minimum number of projects, 2.
R
# Selecting 'name' and 'salary' columns for employees with salary greater than 60000
result <- subset(csv_data, salary > 60000, select = c(name, salary))
print(result)
Output:
  name salary
1    A  60754
3    C  69040
4    D  65043
6    F  65000
7    G  69000
subset() returns a data frame containing only the rows and columns that satisfy the
conditions passed as its arguments: here, the 'name' and 'salary' columns for employees
with a salary greater than 60000.
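# Average salary per department (tapply() is one way to produce this output)
result <- tapply(csv_data$salary, csv_data$department, mean)
print(result)
Output:
       HR        IT Marketing      Tech
  69000.0   62877.0   67041.5   59791.5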
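A self-contained sketch of the three operations above (minimum, filtered subset, per-department average) follows, with the sample data built inline so that it runs without sample.csv. Using tapply() for the per-department mean is an assumption, since the notes do not show the code behind that output.

```r
# Sample data from sample.csv, built inline so the sketch runs on its own.
csv_data <- data.frame(
  id = 1:7,
  name = c("A", "B", "C", "D", "E", "F", "G"),
  department = c("IT", "Tech", "Marketing", "Marketing", "Tech", "IT", "HR"),
  salary = c(60754, 59640, 69040, 65043, 59943, 65000, 69000),
  projects = c(4, 2, 8, 5, 2, 5, 7)
)

# Aggregate over one column selected with $
min_pro <- min(csv_data$projects)

# Rows with salary > 60000, keeping only the name and salary columns
high_paid <- subset(csv_data, salary > 60000, select = c(name, salary))

# Mean salary per department; tapply() groups salary by department
avg_salary <- tapply(csv_data$salary, csv_data$department, mean)

print(min_pro)
print(high_paid)
print(avg_salary)
```

tapply() sorts the groups alphabetically, which is why HR comes first.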
Reading Files:
The two Excel files, Sample_data1.xlsx and Sample_data2.xlsx, are read from
the working directory.
R
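install.packages("readxl")
library(readxl)
# Read the first sheet of each workbook into a data frame (tibble)
Data1 <- read_excel("Sample_data1.xlsx")
Data2 <- read_excel("Sample_data2.xlsx")
head(Data1)
head(Data2)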
The Excel files are loaded into the variables Data1 and Data2 as data frames
(read_excel() returns tibbles), and head() prints the first six rows of each
dataset.
Modifying Files
The Data1 and Data2 datasets, read from Sample_data1.xlsx and Sample_data2.xlsx, are modified.
R
Data1$Pclass <- 0
Data2$Embarked <- "S"
head(Data1)
head(Data2)
Every value of the Pclass column (variable) of the Data1 dataset is set to 0, and every
value of the Embarked column of Data2 is set to "S".
R
# A negative index drops that column
Data1 <- Data1[, -2]
Data2 <- Data2[, -3]
Data1
Data2
A negative column index (the minus sign) deletes columns or attributes from the dataset.
Column 2 is deleted from the Data1 dataset and column 3 is deleted from the Data2 dataset.
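The modification and deletion steps above can be sketched with plain data frames. Data1 and Data2 here are small hypothetical stand-ins, since the spreadsheets' contents are not shown. Every column name other than Pclass and Embarked, and every value, is made up for illustration.

```r
# Hypothetical stand-ins for Data1 and Data2; the real spreadsheets are not shown.
Data1 <- data.frame(PassengerId = 1:3, Name = c("P1", "P2", "P3"), Pclass = c(1, 3, 2))
Data2 <- data.frame(PassengerId = 1:3, Embarked = c("C", "Q", "S"), Fare = c(10, 20, 30))

# Assigning a single value to a column recycles it into every row.
Data1$Pclass <- 0
Data2$Embarked <- "S"

# A negative index drops that column: column 2 of Data1, column 3 of Data2.
Data1 <- Data1[, -2]
Data2 <- Data2[, -3]

print(Data1)
print(Data2)
```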
Merging Files
The two Excel datasets Data1 and Data2 are merged using the merge() function, which is
part of base R and comes pre-installed.
R
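# Merging Files (by default, merge() joins on the columns the datasets have in common)
Data3 <- merge(Data1, Data2)
head(Data3)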
Data1 and Data2 are merged with each other and the resultant file is stored in
the Data3 variable.
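The following small sketch shows how merge() behaves, using hypothetical data frames that share a PassengerId key column. The names and values are illustrative assumptions, not the spreadsheets' actual contents.

```r
# Hypothetical data frames that share a PassengerId column.
Data1 <- data.frame(PassengerId = 1:4, Pclass = c(1, 3, 2, 3))
Data2 <- data.frame(PassengerId = 2:5, Embarked = c("S", "C", "Q", "S"))

# By default merge() joins on all columns with the same name and keeps only
# rows whose key appears in both data frames (an inner join).
Data3 <- merge(Data1, Data2)

# all = TRUE keeps unmatched rows from both sides, filling gaps with NA.
Data3_full <- merge(Data1, Data2, all = TRUE)

print(Data3)
print(Data3_full)
```

The arguments by, all.x, and all.y control which columns form the key and which unmatched rows are kept.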
Creating new columns
New columns or features can be easily created in Data1 and Data2 datasets.
R
Data1$Num <- 0
head(Data1)
head(Data2)
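The following minimal sketch shows two common ways to add a column, using a hypothetical stand-in for Data1: assigning a constant, and computing the column from an existing one. The Fare and FareDoubled columns are illustrative assumptions.

```r
# Hypothetical stand-in for Data1.
Data1 <- data.frame(PassengerId = 1:3, Fare = c(10, 20, 30))

# A constant is recycled into every row of the new column.
Data1$Num <- 0

# A new column can also be computed from existing ones, element by element.
Data1$FareDoubled <- Data1$Fare * 2

print(Data1)
```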
Writing Files
After performing all operations, Data1 and Data2 are written into new files
using the write_xlsx() function from the writexl package.
R
install.packages("writexl")
# Loading package
library(writexl)
# Writing Data1
write_xlsx(Data1, "New_Data1.xlsx")
# Writing Data2
write_xlsx(Data2, "New_Data2.xlsx")
The Data1 dataset is written to the New_Data1.xlsx file and the Data2 dataset to the
New_Data2.xlsx file. Both files are saved in the current working directory.