Data Handling in R Programming notes

R programing files for the present generation

Uploaded by

nain09364
Copyright
© © All Rights Reserved

R – Data Frames


R is an open-source programming language that is widely used as statistical software and as a
data analysis tool. Data frames are generic R data objects used to store tabular data.
A data frame can be thought of as a matrix in which each column may hold a different data
type. An R data frame is made up of three principal components: the data, the rows, and the
columns.

R Data Frames Structure


A data frame is structured as shown in the figure below.
The data is presented in tabular form, which makes it easier to operate on and understand.

[Figure: structure of an R data frame]

Create Dataframe in R Programming Language


To create an R data frame, use the data.frame() function and pass each of the vectors you
have created as an argument.
R
# R program to create dataframe

# creating a data frame


friend.data <- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)

Output:
friend_id friend_name
1 1 Sachin
2 2 Sourav
3 3 Dravid
4 4 Sehwag
5 5 Dhoni
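Because each column of a data frame is its own vector, different columns can hold different types. The sketch below is a minimal illustration of that point; the has_replied column is a hypothetical addition made up for this example and is not part of the original data.

```r
# Build a data frame whose columns have different types.
# The has_replied column is hypothetical and only added for illustration.
friend.data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  has_replied = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Report the class of every column
col_types <- sapply(friend.data, class)
print(col_types)

# A matrix, by contrast, coerces all values to one common type
m <- as.matrix(friend.data)
print(class(m[1, 1]))
```

Converting the same data to a matrix turns every value into a character string, which is why tabular data with mixed types is normally kept in a data frame.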

Get the Structure of the R Data Frame

The structure of an R data frame can be inspected with the str() function.
str() can display even the internal structure of large nested lists. For basic R objects it
gives one-line output that tells the user about the object and its constituents.
R
# R program to get the
# structure of the data frame

# creating a data frame


friend.data <- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)
# using str(); it prints its result directly, so print() is not needed
str(friend.data)
Output:

Summary of Data in the R data frame


The statistical summary and nature of the data in an R data frame can be obtained by
applying the summary() function.
summary() is a generic function that produces summaries of many kinds of objects, including
the results of model-fitting functions. It invokes a particular method depending on the class
of its first argument.
R
# R program to get the
# summary of the data frame

# creating a data frame


friend.data <- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)
# using summary()
print(summary(friend.data))
Output:

Extract Data from Data Frame in R


Extracting data from an R data frame means accessing its rows or columns. A specific column
can be extracted from an R data frame by its column name.
R
# R program to extract
# data from the data frame
# creating a data frame
friend.data <- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)

# Extracting friend_name column


result <- data.frame(friend.data$friend_name)
print(result)
Output:

Expand Data Frame in R Language


A data frame in R can be expanded by adding new columns and rows to the already existing R
data frame.
R
# R program to expand
# the data frame

# creating a data frame


friend.data <- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)
# Expanding data frame
friend.data$location <- c("Kolkata", "Delhi",
"Bangalore", "Hyderabad",
"Chennai")
resultant <- friend.data
# print the modified data frame
print(resultant)
Output:
friend_id friend_name location
1 1 Sachin Kolkata
2 2 Sourav Delhi
3 3 Dravid Bangalore
4 4 Sehwag Hyderabad
5 5 Dhoni Chennai

In R, one can perform various operations on a data frame, such as accessing rows and
columns, selecting a subset of the data frame, editing data frames, and deleting rows and
columns.

Access Items in R Data Frame


Columns of a data frame can be selected with single brackets [ ], double brackets [[ ]], or the
$ operator. Single brackets return a data frame, while [[ ]] and $ return the column itself as a
vector.
R
# creating a data frame
friend.data <- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)

# Access Items using []


friend.data[1]

# Access Items using [[]]


friend.data[['friend_name']]

# Access Items using $


friend.data$friend_id
Output:
Number of Rows and Columns
The dim() function returns the number of rows and columns in a data frame.
R
# creating a data frame
friend.data <- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)

# find out the number of rows and columns


dim(friend.data)
Output:
[1] 5 2
Add Rows and Columns in R Data Frame
You can easily add rows and columns to an R data frame. Insertion expands
the existing data frame without the need to create a new one.
Let’s look at how to add rows and columns to a data frame with an example:
Add Rows in R Data Frame
To add rows to a data frame, you can use the built-in function rbind().
The following example demonstrates how rbind() works on an R data frame.
R
# Creating a dataframe representing products in a store
Products <- data.frame(
Product_ID = c(101, 102, 103),
Product_Name = c("T-Shirt", "Jeans", "Shoes"),
Price = c(15.99, 29.99, 49.99),
Stock = c(50, 30, 25)
)

# Print the existing dataframe


cat("Existing dataframe (Products):\n")
print(Products)

# Adding a new row for a new product


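# A plain c(...) vector would coerce every value to character,
# so the new row is built as a one-row data frame with matching columns
New_Product <- data.frame(
Product_ID = 104,
Product_Name = "Sunglasses",
Price = 39.99,
Stock = 40
)
Products <- rbind(Products, New_Product)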

# Print the updated dataframe after adding the new product


cat("\nUpdated dataframe after adding a new product:\n")
print(Products)
Output:
Add Columns in R Data Frame
To add columns to a data frame, you can use the built-in function cbind().
The following example demonstrates how cbind() works on an R data frame.
R
# Existing dataframe representing products in a store
Products <- data.frame(
Product_ID = c(101, 102, 103),
Product_Name = c("T-Shirt", "Jeans", "Shoes"),
Price = c(15.99, 29.99, 49.99),
Stock = c(50, 30, 25)
)

# Print the existing dataframe


cat("Existing dataframe (Products):\n")
print(Products)

# Adding a new column for 'Discount' to the dataframe


Discount <- c(5, 10, 8) # New column values for discount
Products <- cbind(Products, Discount)

# Rename the added column


colnames(Products)[ncol(Products)] <- "Discount" # Renaming the last column

# Print the updated dataframe after adding the new column


cat("\nUpdated dataframe after adding a new column 'Discount':\n")
print(Products)
Output:

Remove Rows and Columns


Rows and columns can also be removed from an existing R data frame.
Remove Row in R DataFrame
R
# subset() is part of base R, so no package needs to be loaded
# Create a data frame
data <- data.frame(
friend_id = c(1, 2, 3, 4, 5),
friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)

data

# Remove a row with friend_id = 3


data <- subset(data, friend_id != 3)

data
Output:
In the above code, we first created a data frame called data with three
columns: friend_id, friend_name, and location. To remove a row with friend_id equal to 3,
we used the subset() function and specified the condition friend_id != 3. This removed the
row with friend_id equal to 3.

Remove Column in R DataFrame


R
library(dplyr)
# Create a data frame
data <- data.frame(
friend_id = c(1, 2, 3, 4, 5),
friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)
data

# Remove the 'location' column


data <- select(data, -location)

data
Output:
To remove the location column, we used the select() function from the dplyr package and
specified -location. The minus sign indicates that the location column should be dropped. The
resulting data frame has only two columns: friend_id and friend_name.
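The same removals can also be done without any extra package. The sketch below is one possible base R approach using logical indexing; the variable names rows_kept and cols_kept are chosen only for this illustration.

```r
# Recreate the example data frame
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)

# Drop the row with friend_id equal to 3 using logical indexing
rows_kept <- data[data$friend_id != 3, ]

# Drop the 'location' column by keeping every column with a different name
cols_kept <- data[, names(data) != "location"]

print(rows_kept)
print(cols_kept)
```

Neither statement changes data itself; each returns a new data frame, so the result has to be assigned if it is to be kept.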

Combining Data Frames in R


There are two ways to combine data frames in R: vertically or horizontally.
Let’s look at both cases with an example:
Combine R Data Frame Vertically
To combine two data frames vertically, use the rbind() function. It accepts two or more data
frames, which must have the same column names.
R
# Creating two sample dataframes
df1 <- data.frame(
Name = c("Alice", "Bob"),
Age = c(25, 30),
Score = c(80, 75)
)

df2 <- data.frame(


Name = c("Charlie", "David"),
Age = c(28, 35),
Score = c(90, 85)
)

# Print the existing dataframes


cat("Dataframe 1:\n")
print(df1)

cat("\nDataframe 2:\n")
print(df2)

# Combining the dataframes using rbind()


combined_df <- rbind(df1, df2)

# Print the combined dataframe


cat("\nCombined Dataframe:\n")
print(combined_df)
Output:
Combine R Data Frame Horizontally
To combine two data frames horizontally, use the cbind() function. It accepts two or more data
frames, which should have the same number of rows.
R
# Creating two sample dataframes
df1 <- data.frame(
Name = c("Alice", "Bob"),
Age = c(25, 30),
Score = c(80, 75)
)

df2 <- data.frame(


Height = c(160, 175),
Weight = c(55, 70)
)

# Print the existing dataframes


cat("Dataframe 1:\n")
print(df1)

cat("\nDataframe 2:\n")
print(df2)

# Combining the dataframes using cbind()


combined_df <- cbind(df1, df2)

# Print the combined dataframe


cat("\nCombined Dataframe:\n")
print(combined_df)
Output:
Data Handling in R Programming

R Programming Language is widely used for statistics and data analytics, and these
applications routinely involve importing and exporting data.
R can read many types of files, including comma-separated values (CSV) files, text
files, Excel workbooks, SPSS files, and SAS files.
R also provides predefined functions for working with the system's directories. They
either take the path of a directory as an argument or return the path of the
directory the user is currently working in. Some of these directory functions are:
• getwd(): returns the current working directory used by R.
• setwd(): changes the current working directory; the path of the new directory
is passed as the argument.
Example:
setwd("C:/RExamples/")
OR
setwd("C:\\RExamples\\")
• list.files(): lists all files and folders in the current working directory.
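The three directory functions can be tried together in a short sketch. It assumes nothing about the user's folders: it switches to R's per-session temporary directory, and the file name example_notes.txt is a hypothetical name used only here.

```r
# Remember the current working directory so it can be restored
old_dir <- getwd()

# Switch to R's per-session temporary directory, which always exists
setwd(tempdir())

# Create an empty file (hypothetical name) and list the directory contents
invisible(file.create("example_notes.txt"))
files <- list.files()
print("example_notes.txt" %in% files)

# Clean up and restore the original working directory
invisible(file.remove("example_notes.txt"))
setwd(old_dir)
```

Saving the old directory first and restoring it at the end keeps the sketch from affecting later code that relies on relative paths.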

Importing Files in R
Let us take some basic files to import in R to learn the approach of importing data
files:
Importing Text Files
In R language, text files can be read using read.table() function.
Syntax:
read.table(filename, header = FALSE, sep = "")
Parameters:
header represents if the file contains header row or not
sep represents the delimiter value used in file
To know about all the arguments of read.table(), execute below command in R:
help("read.table")
Example:
Suppose a text file named TextFileExample.txt is present in the current working
directory, and its data is to be imported using R. The content of the text
file is:
100 A a
200 B b
300 C c
400 D d
500 E e
600 F f
Values are separated by white spaces.

# Check current working directory

getwd()

# Get content into a data frame

data <- read.table("TextFileExample.txt",

header = FALSE, sep = " ")

# Printing content of Text File

print(data)

# Print the class of data

print(class(data))

Output:
[1] "C:/Users/xyz/Documents"
V1 V2 V3
1 100 A a
2 200 B b
3 300 C c
4 400 D d
5 500 E e
6 600 F f
[1] "data.frame"
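For readers who do not have TextFileExample.txt at hand, the following self-contained sketch writes a few of the same rows to a temporary file and reads them back. The temporary path is an assumption of this sketch, not part of the original example.

```r
# Write a small whitespace-separated file to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("100 A a", "200 B b", "300 C c"), path)

# sep = "" (the default) treats any run of whitespace as one delimiter
data <- read.table(path, header = FALSE, sep = "")
print(data)

# Columns are named V1, V2, ... when there is no header row
print(names(data))

# Remove the temporary file
unlink(path)
```

Using sep = "" rather than sep = " " is slightly more forgiving, since it also accepts tabs or repeated spaces between values.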

Importing CSV Files


Comma-separated values (CSV) files can be imported and read in R
using the read.csv() function.

Syntax:
read.csv(filename, header = TRUE, sep = ",")

Parameters:
header represents if the file contains header row or not
sep represents the delimiter value used in file
To know about all the arguments of read.csv(), execute below command in R:
help("read.csv")

Example:
Suppose a file named CSVFileExample.csv is present in the current working directory.
The example passes sep = "\t", so it assumes the values in the file are separated by
tabs rather than commas:
# Check current working directory

getwd()

# Get content into a data frame

data <- read.csv("CSVFileExample.csv",

header = FALSE, sep = "\t")

# Printing content of the CSV file

print(data)

# Print the class of data

print(class(data))

Output:
[1] "C:/Users/xyz/Documents"
V1 V2 V3
1 100 AB ab
2 200 CD cd
3 300 EF ef
4 400 GH gh
5 500 IJ ij
[1] "data.frame"
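The effect of the header argument is easiest to see side by side. The sketch below is a minimal illustration that writes its own small comma-separated file to a temporary path; the column names id, code, and label are hypothetical.

```r
# Write a small comma-separated file with a header row to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("id,code,label", "100,AB,ab", "200,CD,cd"), path)

# With the defaults (header = TRUE, sep = ","), the first line becomes the column names
with_header <- read.csv(path)

# With header = FALSE, the first line is treated as data and columns are named V1, V2, ...
without_header <- read.csv(path, header = FALSE)

print(with_header)
print(without_header)

# Remove the temporary file
unlink(path)
```

Reading a file that has a header row with header = FALSE adds one spurious data row and usually turns numeric columns into character columns, so it is worth checking the first line of a file before importing it.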
Importing Excel File
To read and import the excel files, “xlsx” package is required to use
the read.xlsx() function. To read “.xls” excel files, “gdata” package is required
to use read.xls() function.

Syntax:
read.xlsx(filename, sheetIndex)
OR
read.xlsx(filename, sheetName)

Parameters:
sheetIndex specifies number of sheet
sheetName specifies name of sheet
To know about all the arguments of read.xlsx(), execute below command in R:
help("read.xlsx")

Example:
Suppose a xlsx file is present in the current working directory and the content of
file is as shown:

# Install xlsx package

install.packages("xlsx")

library(xlsx)
# Check current working directory

getwd()

# Get content into a data frame

data <- read.xlsx("ExcelExample.xlsx",

sheetIndex = 1,

header = FALSE)

# Printing content of the Excel file

print(data)

# Print the class of data

print(class(data))

Output:
[1] "C:/Users/xyz/Documents"
X1 X2 X3
1 1000 ABC abc
2 2000 DEF def
3 3000 GHI ghi
4 4000 JKL jkl
5 5000 MNO mno
[1] "data.frame"
Exporting files in R
Below are some methods to export the data to a file in R:
• Using the console
The cat() function in R writes objects to the console. It can also redirect
that output to a particular file.
Syntax:
cat(..., file)
Parameter:
file specifies the name of the file to which the output is redirected
To know about all the arguments of cat(), execute below command in R:
help("cat")
Example:

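# Avoid reusing the name str, which is also a base R function
word <- "World"

# Redirect output to file; cat() already separates its arguments with a space

cat("Hello,", word, file = "catExample.txt")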

Output:
Above code creates a new file and redirects the output of cat(). The contents
of the file are shown below after executing the code-
Hello, World
• Using sink() function:
The sink() function redirects all output from cat() and print() to the
given file.
Syntax:
sink(filename) # begins redirecting output to file
.
.
sink() # stops redirecting and returns output to the console
To know about all the arguments of sink(), execute below command in R:
help("sink")

Example:

# Begin redirecting output

sink("SinkExample.txt")

x <- c(1, 3, 4, 5, 10)

print(mean(x))

print(class(x))

print(median(x))

sink()

Output:
The above code creates a new file and redirects the output. The contents of the
file are shown below after executing the code-
[1] 4.6
[1] "numeric"
[1] 4
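Both export methods can be combined and checked in one short sketch. It writes to a temporary file, an assumption made so that nothing is left behind, and reads the file back to show what was written.

```r
# Use a temporary file so the sketch leaves nothing behind
path <- tempfile(fileext = ".txt")

# cat() writes directly to the file; append = TRUE adds to it instead of overwriting
cat("first line\n", file = path)
cat("second line\n", file = path, append = TRUE)

# sink() diverts printed output to the file until sink() is called again
sink(path, append = TRUE)
print(1 + 1)
sink()

# Read the file back to inspect what was written
lines <- readLines(path)
print(lines)

# Remove the temporary file
unlink(path)
```

Every sink(filename) call should be paired with a closing sink(); otherwise later output keeps going to the file instead of the console.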

Writing to CSV files


A matrix or data-frame object can be redirected and written to csv file
using write.csv() function.
Syntax: write.csv(x, file)
Parameter:
file specifies the file name used for writing
To know about all the arguments of write.csv(), execute below command in R:
help("write.csv")
Example:

# Create vectors

x <- c(1, 3, 4, 5, 10)

y <- c(2, 4, 6, 8, 10)

z <- c(10, 12, 14, 16, 18)

# Create matrix

data <- cbind(x, y, z)

# Writing matrix to CSV File

write.csv(data, file = "CSVWrite.csv", row.names = FALSE)

Output:
The above code creates a new file and writes the matrix to it. After the code runs, the
contents of the file are:
"x","y","z"
1,2,10
3,4,12
4,6,14
5,8,16
10,10,18
Working with CSV files in R Programming


R CSV Files
CSV files are text files in which the values in each row are separated by a
delimiter, such as a comma or a tab. This section uses a small sample CSV file,
introduced below.
Getting and Setting the Working Directory with R CSV Files
R

# Get the current working directory.

print(getwd())

# Set current working directory.

setwd("/web/com")

# Get and print current working directory.

print(getwd())

Output:
[1] "C:/Users/GFG19565/Documents"

[1] "C:/Users/GFG19565/Documents"
The getwd() function returns the current working directory, and the setwd()
function sets a new working directory.
Input as R CSV Files
id, name, department, salary, projects
1, A, IT, 60754, 4
2, B, Tech, 59640, 2
3, C, Marketing, 69040, 8
4, D, Marketing, 65043, 5
5, E, Tech, 59943, 2
6, F, IT, 65000, 5
7, G, HR, 69000, 7

Save this content in a text editor such as Notepad under the name sample.csv so
that it can be loaded into R.

Reading a R CSV Files


The contents of a CSV file can be read into a data frame with the read.csv()
function. The file must either be in the current working directory (set it with
setwd() if necessary) or be referenced by its full path. read.csv() can also read
a CSV file directly from a URL.
R

csv_data <- read.csv(file = 'C:\\Users\\GFG19565\\Downloads\\sample.csv')

print(csv_data)

# print number of columns

print (ncol(csv_data))

# print number of rows

print(nrow(csv_data))

Output:
id name department salary projects
1 1 A IT 60754 4
2 2 B Tech 59640 2
3 3 C Marketing 69040 8
4 4 D Marketing 65043 5
5 5 E Tech 59943 2
6 6 F IT 65000 5
7 7 G HR 69000 7
[1] 5
[1] 7

read.csv() is given the path to the CSV file. Its header argument defaults to
TRUE, so the first line supplies the column names and is not counted as a row;
this CSV therefore has 7 rows and 5 columns.

Querying with R CSV Files


SQL-like queries can be run on the CSV content once it is loaded as a data frame,
for example with the subset() function or with logical indexing. Several
conditions can be combined in one query by joining them with logical operators
such as & and |. The result of such a query is itself a data frame.
R

min_pro <- min(csv_data$projects)

print (min_pro)

Output:
[1] 2

Aggregate functions such as min(), max(), and mean() can be applied to the CSV data.
Here min() is applied to the projects column, selected with the $ operator, and
returns 2, the minimum number of projects.
R
# Selecting 'name' and 'salary' columns for employees with salary greater than 60000

result <- csv_data[csv_data$salary > 60000, c("name", "salary")]

# Print the result

print(result)

Output:
name salary
1 A 60754
3 C 69040
4 D 65043
6 F 65000
7 G 69000
The rows that satisfy the condition are returned as a new data frame. Here it
holds the name and salary columns for employees whose salary is greater than
60000.
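The same kind of query can be written with the subset() function. The sketch below is a minimal illustration that recreates the sample data inline so that it runs on its own; the second condition, projects >= 5, is an extra filter added only for this example.

```r
# Recreate the sample data inline; text = lets read.csv() parse a string
csv_data <- read.csv(text = "id,name,department,salary,projects
1,A,IT,60754,4
2,B,Tech,59640,2
3,C,Marketing,69040,8
4,D,Marketing,65043,5
5,E,Tech,59943,2
6,F,IT,65000,5
7,G,HR,69000,7")

# Two conditions joined with & (logical AND); select keeps only the named columns
result <- subset(csv_data,
                 salary > 60000 & projects >= 5,
                 select = c(name, salary))
print(result)
```

Inside subset(), column names can be written without the csv_data$ prefix, which keeps longer conditions readable.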

Writing into a R CSV Files


The contents of a data frame can be written to a CSV file with
write.csv(data_frame, output_file_name). Unless a full path is given, the file is
saved in the current working directory. The example below first prepares a
result worth saving by aggregating the data.
R
# Calculating the average salary for each department

result <- tapply(csv_data$salary, csv_data$department, mean)

# Print the result

print(result)

Output:
HR IT Marketing Tech
69000.0 62877.0 67041.5 59791.5

This example calculates the average salary for each department.
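To actually save such a result, it can be converted to a data frame and passed to write.csv(). The sketch below is one way to do this under stated assumptions: the data is recreated inline, the output goes to a temporary file, and the names avg_df, avg_salary, and out_path are hypothetical.

```r
# Recreate the sample data inline so the sketch runs on its own
csv_data <- read.csv(text = "id,name,department,salary,projects
1,A,IT,60754,4
2,B,Tech,59640,2
3,C,Marketing,69040,8
4,D,Marketing,65043,5
5,E,Tech,59943,2
6,F,IT,65000,5
7,G,HR,69000,7")

# Average salary per department, converted to a two-column data frame
avg <- tapply(csv_data$salary, csv_data$department, mean)
avg_df <- data.frame(department = names(avg), avg_salary = as.vector(avg))

# Write to a temporary file; row.names = FALSE omits the row-number column
out_path <- tempfile(fileext = ".csv")
write.csv(avg_df, file = out_path, row.names = FALSE)

# Read the file back to confirm the contents, then clean up
check <- read.csv(out_path)
print(check)
unlink(out_path)
```

Replacing out_path with a plain file name such as a name of your choice would save the file in the current working directory instead.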


Working with Excel Files in R Programming

Excel workbooks use the extensions .xls and .xlsx; spreadsheet data is also often
exchanged as .csv (comma-separated values) files. To work with Excel files in the R
programming language, first import them into RStudio or any other IDE (integrated
development environment) that supports R.
Reading Excel Files in R Programming Language
First, install the readxl package to load Excel files in R. The examples below use
two sample workbooks, Sample_data1.xlsx and Sample_data2.xlsx.
Reading Files:
The two Excel files Sample_data1.xlsx and Sample_data2.xlsx are read from
the working directory.

R

# Working with Excel Files

# Installing required package

install.packages("readxl")

# Loading the package

library(readxl)

# Importing excel file

Data1 <- read_excel("Sample_data1.xlsx")

Data2 <- read_excel("Sample_data2.xlsx")

# Printing the data

head(Data1)

head(Data2)
The Excel files are loaded into the variables Data1 and Data2 as data frames, and
head() then prints the first rows of each dataset.

Modifying Files
The data loaded from Sample_data1.xlsx and Sample_data2.xlsx is modified in memory.

R

# Modifying the files

Data1$Pclass <- 0

Data2$Embarked <- "S"

# Printing the data

head(Data1)
head(Data2)

Every value of the Pclass column of Data1 is set to 0, and every value of the
Embarked column of Data2 is set to "S".

Deleting Content from files


One column is deleted from each of the Data1 and Data2 datasets, which hold the
contents of Sample_data1.xlsx and Sample_data2.xlsx.

R

# Deleting from files

Data1 <- Data1[-2]

Data2 <- Data2[-3]


# Printing the data

Data1

Data2

A negative index drops the column at that position. Column 2 is deleted from the
Data1 dataset and column 3 is deleted from the Data2 dataset.

Merging Files
The two Excel datasets Data1 and Data2 are merged using the merge() function,
which is part of the base package and comes preinstalled with R.
R

# Merging Files

Data3 <- merge(Data1, Data2, all.x = TRUE, all.y = TRUE)

# Displaying the data


head(Data3)

Data1 and Data2 are merged with each other and the resultant file is stored in
the Data3 variable.
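Since the sample workbooks are not reproduced here, the behaviour of merge() with all.x = TRUE and all.y = TRUE can be sketched with two small stand-in data frames. The data frames left and right and their columns are hypothetical.

```r
# Two small hypothetical data frames that share an 'id' column
left <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
right <- data.frame(id = c(2, 3, 4), score = c(90, 85, 70))

# all.x = TRUE and all.y = TRUE keep unmatched rows from both sides (a full outer join)
merged <- merge(left, right, by = "id", all.x = TRUE, all.y = TRUE)
print(merged)
```

Rows that exist on only one side are kept, with NA filling the columns that come from the other side. When by is omitted, merge() joins on all column names the two data frames have in common.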
Creating new columns
New columns, or features, can easily be created in the Data1 and Data2 datasets.

R

# Creating feature in Data1 dataset

Data1$Num <- 0

# Creating feature in Data2 dataset

Data2$Code <- "Mission"

# Printing the data

head(Data1)
head(Data2)

Num is a new feature created in the Data1 dataset with a default value of 0. Code
is a new feature created in the Data2 dataset with "Mission" as its default string.

Writing Files
After all operations are complete, Data1 and Data2 are written to new files
using the write_xlsx() function from the writexl package.
R

# Installing the package

install.packages("writexl")

# Loading package

library(writexl)
# Writing Data1

write_xlsx(Data1, "New_Data1.xlsx")

# Writing Data2

write_xlsx(Data2, "New_Data2.xlsx")
The Data1 dataset is written to New_Data1.xlsx and the Data2 dataset is written
to New_Data2.xlsx. Both files are saved in the current working directory.
