Data Analysis Using R - 4
4.1 Functions in R
In R, functions are blocks of code that perform a specific task or set of tasks. They are essential for
organizing and reusing code, making our R programs more modular and easier to maintain.
R provides a large number of built-in functions, and it also lets users create their own
(user-defined) functions.
Types of Function in R Language:
Built-in Function: Built-in functions in R are pre-defined functions that are available in the R
programming language to perform common tasks or operations.
User-defined Function: The R language allows us to write our own functions.
Creating a Function: An R function is created by using the keyword function. The basic syntax
of an R function definition is as follows.
function_name <- function(arg_1, arg_2, ...) {
  # function body
}
Example:
my_function <- function() { # create a function with the name my_function
print("Hello World!")
}
Call a Function: To call a function, use the function name followed by parentheses, like
my_function()
Example:
my_function() # call the function named my_function
Function Arguments: Functions can have arguments (parameters) that pass values into the
function for processing. Arguments are specified after the function name, inside the parentheses.
We can add as many arguments as we want; just separate them with commas.
my_function <- function(fname)
{
paste(fname, "Griffin")
}
my_function("Peter")
my_function("Lois")
my_function("Stewie")
#Output:
[1] "Peter Griffin"
[1] "Lois Griffin"
[1] "Stewie Griffin"
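Since any number of comma-separated arguments can be listed, here is a minimal sketch with two arguments. The function name full_name and its arguments are hypothetical; the second call also shows that R can match arguments by name, in which case their order does not matter.

```r
# Hypothetical function with two comma-separated arguments
full_name <- function(fname, lname) {
  paste(fname, lname)
}

full_name("Peter", "Griffin")                  # positional matching: [1] "Peter Griffin"
full_name(lname = "Griffin", fname = "Stewie") # matching by name:    [1] "Stewie Griffin"
```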
In R, function arguments are evaluated lazily: an argument is evaluated only when the function
body actually uses it. So if some arguments are missing, the function is still executed as long as
the execution does not involve those arguments.
This can help improve performance and avoid unnecessary computation, because an argument
that is never used is never computed.
Example:
calculate <- function(a, b) {
  square <- a^2
  return(square)
}
# This will execute because b is not used in the
# calculations inside the function.
print(calculate(5))
#Output:[1] 25
calculate <- function(a, b) {
  add <- a + b
  return(add)
}
# This will throw an error because b is needed but was not supplied
print(calculate(5))
#Output: Error in calculate(5) : argument "b" is missing, with no default
Calls: print -> calculate
Execution halted
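The sketch below makes the lazy behaviour visible: an argument that would raise an error is passed in, but because the body never touches it, nothing fails. It also shows two common ways to avoid the "argument is missing" error: a default value and base R's missing(). The function names and the default of 0 are illustrative assumptions, not part of the original example.

```r
# b is never used in the body, so its value is never evaluated
double_a <- function(a, b) {
  a * 2
}
double_a(3, stop("b was evaluated"))  # [1] 6, the stop() call never runs

# Hypothetical default value of 0 lets the function run without b
calculate_default <- function(a, b = 0) {
  a + b
}
calculate_default(5)     # [1] 5
calculate_default(5, 2)  # [1] 7

# missing() tests whether an argument was supplied in the call
calculate_checked <- function(a, b) {
  if (missing(b)) {
    return(a)
  }
  a + b
}
calculate_checked(5)     # [1] 5
```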
4.2 Import and Export data to/from Text and CSV file
To import data from an Excel file into R, the data must first be prepared, i.e. it must be in a
proper format.
• Following are some of the formatting options:
1. The first row of an Excel spreadsheet is usually a header. Try to reserve the first row for the header.
2. Avoid blank spaces in field names/values. E.g. if a field is named enrolment number, avoid the blank
space between these two words; otherwise it may be read as two separate words (for example with
read.table()'s default whitespace separator). You may use enrolment.number instead, i.e. use a dot to
join the two words.
3. Try to avoid names containing special characters like @, &, #, %, +, /, (, ), {, }, [, <, etc.
4. Delete comments from the Excel file, if any; otherwise they will be read as a separate column.
5. Try to indicate missing values in the Excel file as NA.
6. The common extensions for Excel files are .xls and .xlsx, but you may also save your Excel file
as .txt or .csv.
7. Depending on the type/extension of the file, data fields are separated either by tabs (usually .txt)
or by commas (.csv).
8. After all the above preparations, the file is ready to import into R.
In R, there are two options to import data: through basic commands or through packages.
• Basic R import commands such as read.table() and read.csv() are stored in the utils package, a
built-in R package that stores utility functions.
• Following are commands to import data from an Excel file (saved as text or CSV) into R:
Import Data from Text File into R:
1. read.table():
• If the Excel file is saved as .txt, then the read.table() command is used to read the text file.
demo<-read.table("filename.txt",header=TRUE,sep="/",strip.white=TRUE)
• Here demo is the name of the data frame in R into which we are importing our text file.
• filename.txt is the name of the file to import. You need to specify the complete file path (unless
the file is in the current working directory).
• header=TRUE is used when the Excel file has its first row as a header.
• A text file usually uses tabs as a separator. If our file uses any symbol other than a tab as
a separator, then the sep parameter is used to indicate that separator symbol.
Here sep="/" means the input file uses / as a separator.
• strip.white=TRUE is used if we want to strip/clean leading and trailing white spaces from unquoted
character fields in the input file. It is used only when the sep parameter is specified.
• If the input file does not contain a header, use header=FALSE; R then automatically assigns the
default column names V1, V2, V3 and so on.
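A minimal sketch of read.table() on a tab-separated file. To stay self-contained it first writes a tiny file to a temporary location; the names and marks in it are hypothetical sample data. The padded " Asha " value shows what strip.white does, and the second call shows the default V1, V2 names when header=FALSE.

```r
# Create a small tab-separated file (hypothetical sample data)
path <- tempfile(fileext = ".txt")
writeLines(c("name\tmarks", " Asha \t78", "Ravi\t85"), path)

# header = TRUE: first row holds column names; strip.white trims " Asha " to "Asha"
demo <- read.table(path, header = TRUE, sep = "\t", strip.white = TRUE)
str(demo)

# header = FALSE: the header row is read as data and columns are named V1, V2
raw <- read.table(path, header = FALSE, sep = "\t", strip.white = TRUE)
names(raw)
```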
--------------------------------------------------------------------------------------------------------------------------------
2. read.csv() and read.csv2():
• If the Excel file is saved as .csv, then the read.csv() command is used to read the .csv file.
demo<-read.csv(file="filename.csv",header=TRUE,stringsAsFactors=FALSE,strip.white=TRUE)
# for a file that uses ";" as separator, call read.csv2() with the same arguments
o Here demo is the name of the data frame in R into which we are importing our csv file.
o filename.csv is the name of the file to import. You need to specify the complete file path (unless
the file is in the current working directory).
o header=TRUE is used when the Excel file has its first row as a header.
o strip.white=TRUE is used if we want to strip/clean leading and trailing white spaces from unquoted
character fields in the input file. read.csv() already passes sep="," to read.table(), so this option takes effect.
o stringsAsFactors specifies whether character columns should be converted to factors (since R 4.0.0
the default is FALSE).
o A csv file usually uses commas as a separator. If our file uses ";" as a separator (and "," as the
decimal mark, as is common in many European locales), then we should use the read.csv2() command to import that file.
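A short sketch comparing read.csv() and read.csv2(). Both input files are created on the fly in a temporary directory, and the book titles and prices are hypothetical. Note how read.csv2() also reads "450,5" as the number 450.5, because it uses "," as the decimal mark.

```r
# Comma-separated file (hypothetical sample data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("book,price", "R Basics,450", "Stats 101,380"), csv_path)
books <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE, strip.white = TRUE)

# Semicolon-separated file with "," as decimal mark, read with read.csv2()
csv2_path <- tempfile(fileext = ".csv")
writeLines(c("book;price", "R Basics;450,5", "Stats 101;380"), csv2_path)
books2 <- read.csv2(csv2_path, header = TRUE, stringsAsFactors = FALSE)

str(books)
str(books2)
```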
--------------------------------------------------------------------------------------------------------------------------------
Export data from R into Excel file:
1. writeWorksheet():
writeWorksheet(wb, TYFSdf, sheet="TYFS", startRow=1, startCol=1)
• writeWorksheet() (from the XLConnect package) writes data from an R data frame into a worksheet of
the workbook.
• In the above command:
• wb is the name of the newly created workbook.
• TYFS is the name of the worksheet created in workbook wb.
• TYFSdf is the name of the data frame to be written into the worksheet.
• startRow and startCol specify the row and column index from which data
writing will start.
saveWorkbook(wb) # save the workbook and write the file to disk in the current working directory
--------------------------------------------------------------------------------------------------------------------------------
2.write.xlsx():
# write data from demo dataframe to existing emp file by creating new worksheet named new_emp
--------------------------------------------------------------------------------------------------------------------------------
i. writexl package:
• install.packages("writexl")
• library(writexl)
3. write_xlsx():
write_xlsx(demo, "bookinfo.xlsx")
• write_xlsx() is the writexl command used to write a data frame to an Excel (.xlsx) file.
• writexl never exports row names, so write_xlsx() has no row.names parameter. The row.names
parameter (TRUE by default, set to FALSE if we don't want to export row names) belongs to
write.xlsx() from the xlsx package.
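The section title also covers exporting to text and CSV files. A minimal sketch using the base utils functions write.csv() and write.table(), which need no extra package; the demo data frame is hypothetical, and the files go to a temporary directory and are read back to check the round trip.

```r
# Hypothetical data frame to export
demo <- data.frame(book = c("R Basics", "Stats 101"), price = c(450, 380))

# CSV export; row.names = FALSE leaves out the 1, 2, ... row-name column
csv_out <- tempfile(fileext = ".csv")
write.csv(demo, csv_out, row.names = FALSE)

# Tab-separated text export without quotes around strings
txt_out <- tempfile(fileext = ".txt")
write.table(demo, txt_out, sep = "\t", row.names = FALSE, quote = FALSE)

# Read both files back to confirm the data survived
back_csv <- read.csv(csv_out)
back_txt <- read.table(txt_out, header = TRUE, sep = "\t")
```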
• Following are some of the commands to work with the MySQL environment:
1. Get connection summary:
summary(con)
• install.packages("XML")
• library("XML")
• library("methods")
Comparing 2 graphs:
plot(months, temp21, type="b", ylim=range(temp20, temp21), ylab="temperature", main="Temperature 2020 vs 2021",
col="blue", cex=1, pch=2, lwd=2) # ylim covers both years so no 2020 point is clipped
points(months, temp20, col="red", cex=2, type="b")
legend(x="topright", title="Years", legend=c("2020","2021"), fill=c("red","blue"))
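months, temp20 and temp21 are not defined in these notes, so this sketch fills them with hypothetical values. It computes ylim from both series with range(), so neither year's line is cut off, and draws on a null PDF device so it runs without a screen.

```r
months <- 1:12
# Hypothetical monthly temperatures, for illustration only
temp20 <- c(14, 16, 21, 26, 30, 31, 29, 28, 27, 25, 20, 15)
temp21 <- c(15, 17, 22, 27, 32, 30, 28, 28, 26, 24, 19, 16)

# y-axis range wide enough for both years
y_range <- range(temp20, temp21)

pdf(NULL)  # null device so the sketch runs without a screen; omit interactively
plot(months, temp21, type = "b", ylim = y_range, ylab = "temperature",
     main = "Temperature 2020 vs 2021", col = "blue", cex = 1, pch = 2, lwd = 2)
points(months, temp20, col = "red", cex = 2, type = "b")
legend(x = "topright", title = "Years", legend = c("2020", "2021"),
       fill = c("red", "blue"))
invisible(dev.off())
```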
Plot using Two different inputs (list and dataframe):
4. Save as pdf:
pdf(file="temperature.pdf") # open a PDF file as the graphics device
plot(months, temp, type="b", main="temperature-2021", col="blue", cex=2, pch=2, lwd=2, lty=1)
dev.off() # close the device to finish writing the file
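Because months and temp are not defined in the notes, this sketch uses hypothetical monthly values. It writes the PDF into R's temporary directory (an assumption, to avoid cluttering the working directory) and then checks that the file exists; the file is only complete once dev.off() has closed the device.

```r
months <- 1:12
temp <- c(15, 17, 22, 27, 32, 30, 28, 28, 26, 24, 19, 16)  # hypothetical 2021 values

out_file <- file.path(tempdir(), "temperature.pdf")
pdf(file = out_file)   # open the PDF device: following plots go into this file
plot(months, temp, type = "b", main = "temperature-2021",
     col = "blue", cex = 2, pch = 2, lwd = 2, lty = 1)
dev.off()              # close the device so the file is written completely

file.exists(out_file)
```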
• Print numbers (values) in each bar using the text function and make a box around the graph:
value <- barplot(event$scount, names.arg=event$ename, main="Event Analysis", xlab="Event names",
ylab="No.of students", col=topo.colors(4)) # barplot() returns the bar midpoints
text(value, 0, event$scount, cex=1, pos=3)
box() # creates a box around the graph
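In the command above the labels are drawn at y = 0, i.e. at the foot of each bar. A variation, sketched here, places each count on top of its bar by using the bar heights as y-coordinates. The event data frame is not shown in the notes, so its scount values are a hypothetical assumption (the boys + girls totals from Dataframe 2 below).

```r
# Hypothetical event data: scount assumed to be boys + girls from Dataframe 2
event <- data.frame(ename = c("coding", "paper", "project", "roborace"),
                    scount = c(35, 48, 54, 79))

pdf(NULL)  # null device so the sketch runs without a screen; omit interactively
value <- barplot(event$scount, names.arg = event$ename, main = "Event Analysis",
                 xlab = "Event names", ylab = "No.of students", col = topo.colors(4),
                 ylim = c(0, max(event$scount) * 1.15))  # headroom for the labels
# value holds the bar midpoints; write each count just above its bar
text(value, event$scount, labels = event$scount, cex = 1, pos = 3)
box()
invisible(dev.off())
```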
• Horizontal bar chart:
barplot(event$scount, names.arg=event$ename, main="Event Analysis", ylab="Event names",
xlab="No.of students", col=topo.colors(4), horiz=TRUE)
Dataframe 2
> event1
     ename boys girls
1   coding   15    20
2    paper   25    23
3  project   20    34
4 roborace   40    39
• For the above data frame a simple bar chart cannot be plotted, because it has two categories of
students (boys, girls). We need to create a grouped or stacked bar chart for it.
• First we need to create a matrix from those two columns.
emat <- matrix(c(event1$boys, event1$girls), nrow=2, ncol=4, byrow=TRUE)
rownames(emat) <- c("boys","girls")
colnames(emat) <- c("coding", "paper", "project", "roborace")
emat
Output:
      coding paper project roborace
boys      15    25      20       40
girls     20    23      34       39
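An equivalent way to build the same matrix, sketched below, is rbind(), which takes the vectors straight from the data frame and names the rows at the same time. event1 is recreated from the printout above so the block runs on its own.

```r
# event1 as printed above
event1 <- data.frame(ename = c("coding", "paper", "project", "roborace"),
                     boys  = c(15, 25, 20, 40),
                     girls = c(20, 23, 34, 39))

# rbind() stacks the two columns as rows; the argument names become row names
emat <- rbind(boys = event1$boys, girls = event1$girls)
colnames(emat) <- event1$ename
emat

# Same values as the matrix(..., byrow = TRUE) call
same <- matrix(c(event1$boys, event1$girls), nrow = 2, ncol = 4, byrow = TRUE)
all(emat == same)
```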
Grouped barchart :
barplot(emat,xlab="events",ylab="no.of.students",main="Event Analysis" ,names.arg=c("coding",
"paper","project","roborace"), col=c(7,2),beside=TRUE)
• using beside=TRUE, a grouped bar chart is created.
Print numbers in each bar using the text function:
• First store the x-coordinates (bar midpoints) returned by barplot() in a variable:
img <- barplot(emat, xlab="events", ylab="no.of.students", main="Event Analysis", names.arg=c(
"coding","paper","project","roborace"), col=c(7,2), beside=TRUE)
text(img, 0, emat, cex=1, pos=3)
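Passing the matrix itself as the y-coordinates (instead of 0) puts every count on top of its own bar, because with beside=TRUE the midpoints returned by barplot() form a matrix of the same shape as emat. A sketch, with emat rebuilt from the event1 values so it runs standalone; the extra ylim headroom and the legend are additions, not part of the original command.

```r
emat <- matrix(c(15, 25, 20, 40, 20, 23, 34, 39), nrow = 2, byrow = TRUE,
               dimnames = list(c("boys", "girls"),
                               c("coding", "paper", "project", "roborace")))

pdf(NULL)  # null device so the sketch runs without a screen; omit interactively
img <- barplot(emat, xlab = "events", ylab = "no.of.students", main = "Event Analysis",
               col = c(7, 2), beside = TRUE, ylim = c(0, max(emat) * 1.15),
               legend.text = TRUE)  # legend built from the row names of emat
# img is a 2 x 4 matrix of bar midpoints, so it pairs up with emat cell by cell
text(img, emat, labels = emat, cex = 1, pos = 3)
invisible(dev.off())
```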
4.6.4 Histogram:
A histogram represents the frequencies of values of a variable bucketed into ranges.
A histogram is similar to a bar chart, but the difference is that it groups the values into continuous ranges.
The height of each bar in a histogram represents the number of values present in that range.
R creates histogram using hist() function. This function takes a vector as an input and uses some
more parameters to plot histograms.
Syntax:
hist(v,main,xlab,xlim,ylim,breaks,col,border)
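Following is the description of the parameters used:
- v: This parameter is a vector containing the numeric values used in the histogram.
- main: This parameter is the title of the chart.
- xlab: This parameter is the label of the horizontal axis.
- xlim: This parameter sets the range of values on the x-axis.
- ylim: This parameter sets the range of values on the y-axis.
- breaks: This parameter gives the breakpoints between the ranges (or the number of ranges).
- col: This parameter sets the colour of the bars.
- border: This parameter sets the border colour of each bar.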
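A small sketch of hist() with these parameters. The vector v is hypothetical sample data. Calling hist() with plot = FALSE returns the bin counts without drawing, which makes it easy to check how many values fell into each range.

```r
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)  # hypothetical sample values

# Counts per 10-unit range, computed without drawing anything
h <- hist(v, breaks = c(0, 10, 20, 30, 40, 50), plot = FALSE)
h$counts  # [1] 2 3 2 3 1

pdf(NULL)  # null device so the sketch runs without a screen; omit interactively
hist(v, main = "Histogram of v", xlab = "Value", xlim = c(0, 50), ylim = c(0, 5),
     breaks = c(0, 10, 20, 30, 40, 50), col = "yellow", border = "blue")
invisible(dev.off())
```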
4.6.5 Boxplot:
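A boxplot is a chart that displays the distribution of data by drawing a box for each group (or
variable).
The distribution is summarized by five numbers: minimum, first quartile, median, third quartile and
maximum. By default the whiskers reach at most 1.5 times the interquartile range beyond the box, and
points further out are drawn individually as outliers.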
Boxplots are created in R by using the boxplot() function.
Syntax: boxplot(x, data, notch, varwidth, names, main)
- x: This parameter is a vector or a formula.
- data: This parameter is the data frame.
- notch: This parameter is a logical value. Set it to TRUE to draw a notch on each side of the box;
if the notches of two boxes do not overlap, their medians are likely to differ.
- varwidth: This parameter is a logical value. Set it to TRUE to draw the width of each box proportional to the
square root of the sample size.
- main: This parameter is the title of the chart.
- names: This parameter gives the group labels that will be shown under each boxplot.
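A sketch using R's built-in mtcars data set (mileage grouped by number of cylinders; the choice of data set is an assumption for illustration). With plot = FALSE, boxplot() returns the five numbers per group in $stats, one column per group, and the group sizes in $n.

```r
# Five-number summaries without drawing
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
b$stats  # rows: lower whisker, lower hinge (about Q1), median, upper hinge, upper whisker
b$n      # number of cars in each cylinder group

pdf(NULL)  # null device so the sketch runs without a screen; omit interactively
# notch = TRUE may warn that notches go outside the hinges for small groups
boxplot(mpg ~ cyl, data = mtcars, notch = TRUE, varwidth = TRUE,
        names = c("4 cyl", "6 cyl", "8 cyl"),
        main = "Mileage by number of cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")
invisible(dev.off())
```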
When we put notch=TRUE then output will be shown as below: