Data Analysis Using R - 4

Uploaded by

harshvasudevkoli

DAR

4. Working with Data

4.1 Functions in R

 In R, functions are blocks of code that perform a specific task or set of tasks. They are essential for
organizing and reusing code, making R programs more modular and easier to maintain.
 R provides a large number of built-in functions, and users can also create
their own functions.
 Types of functions in R:
 Built-in function: Built-in functions are predefined functions that ship with R
and perform common tasks or operations, for example sum(), mean() and paste().
 User-defined function: R allows us to write our own functions.

4.1.1 User-defined Function:

 Creating a function: An R function is created by using the keyword function. The basic syntax
of an R function definition is as follows.
function_name <- function(arg_1, arg_2, ...) {
  # function body
}
Example:
my_function <- function() { # create a function with the name my_function
print("Hello World!")
}
 Calling a function: To call a function, write the function name followed by parentheses, like
my_function()
Example:
my_function() # call the function named my_function
 Function arguments: Information is passed into a function as arguments (parameters),
which the function then processes.
Arguments are specified after the function name, inside the parentheses. You can add as many
arguments as you want; just separate them with commas.
my_function <- function(fname)
{
paste(fname, "Griffin")
}
my_function("Peter")
my_function("Lois")
my_function("Stewie")
#Output:
[1] "Peter Griffin"
[1] "Lois Griffin"
[1] "Stewie Griffin"
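As a small illustration of passing more than one argument, the sketch below extends the example above with a second argument that has a default value. The function name full_name and the default "Griffin" are chosen here for illustration only.

```r
# Hypothetical helper: joins a first name and a last name.
# lname has a default value, so it may be omitted in the call.
full_name <- function(fname, lname = "Griffin") {
  paste(fname, lname)
}

full_name("Peter")                         # uses the default last name
full_name("Peter", "Parker")               # arguments matched by position
full_name(lname = "Smith", fname = "Ann")  # arguments matched by name
```

Matching by name lets the caller supply arguments in any order.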

4.1.2 Lazy Evaluation of Function:

 In R, function arguments are evaluated lazily: an argument is evaluated only when the function
body actually uses it. So if some arguments are missing, the function still executes as long as
the execution does not involve those arguments.
 This can help improve performance and reduce unnecessary computation.
 In R, lazy evaluation is primarily associated with function arguments.
Example:
 calculate <- function(a, b) {
  square <- a^2
  return(square)
}
# This'll execute because b is not used in the
# calculations inside the function.
print(calculate(5))
#Output: [1] 25

 calculate <- function(a, b) {
  add <- a + b
  return(add)
}
# This'll throw an error because b is used but was not supplied.
print(calculate(5))
#Output: Error in calculate(5) : argument "b" is missing, with no default
#        Calls: print -> calculate
#        Execution halted
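The behaviour described above can be sketched in two more ways. The names lazy_demo and ignore_second are hypothetical: the first uses base R's missing() to handle an absent argument, and the second shows that an argument expression is never evaluated if the body does not use it.

```r
# missing() reports whether the caller supplied an argument.
lazy_demo <- function(a, b) {
  if (missing(b)) {
    return(a^2)   # b is never touched on this path
  }
  a + b
}

# The second argument is never used, so its expression is never evaluated.
ignore_second <- function(a, b) {
  a * 2
}

lazy_demo(5)                               # b omitted: returns a squared
lazy_demo(5, 3)                            # b supplied: returns the sum
ignore_second(4, stop("never evaluated"))  # stop() is not run
```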

4.2 Import and Export data to/from Text and CSV file

 Before importing data from an Excel file into R, the data must be prepared, i.e. put into a proper
format.
• Following are some formatting guidelines:
1. The first row of an Excel spreadsheet is usually a header. Try to reserve the first row for the header.
2. Avoid field names/values with blank spaces. E.g. if a field is enrolment number, avoid the blank
space between these two words, otherwise it will be treated as 2 separate words. You may
use enrolment.number as the field name, joining the two words with a dot.
3. Avoid names containing special characters like @, &, #, %, +, /, (, ), {, }, [, <, etc.
4. Delete any comments from the Excel file. Otherwise they will be treated as a separate column.
5. Indicate missing values in the Excel file as NA.
6. The common extensions for Excel files are .xls and .xlsx, but you may also save your Excel file
as .txt or .csv.
7. Depending on the type/extension of the file, data fields are separated either by tabs or
by commas.
8. After all the above preparations, the file is ready to import into R.
In R, there are two options to import data: through basic commands or through packages.
• Basic R commands are stored in the utils package, a built-in R package that stores utility
functions.
• Following are the commands to import such a file into R:
 Import Data from Text File into R:
1. read.table():
• If the Excel file is saved as .txt, the read.table command is used to read the text file.
 demo<-read.table("filename.txt",header=TRUE,sep="/",strip.white=TRUE)
• Here demo is the name of the R object (a data frame) into which the text file is imported.
• filename.txt is the name of the file to import. Specify the complete file path unless the file is in the current working directory.
• header=TRUE is used when the file has a header as its first row.
• By default read.table() treats any white space (spaces or tabs) as the separator. If the file uses
any other symbol as a separator, the sep parameter is used to indicate that symbol.
Here sep="/" means the input file uses / as a separator.
• strip.white=TRUE strips leading and trailing white space from unquoted
character fields in the input file. It takes effect only when sep is specified.
• If the input file does not contain a header, R automatically assigns default column names (V1, V2, ...).
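A minimal sketch of the read.table() call described above. The file contents and column names (rno, name, marks) are made up for illustration, and the file is written to a temporary location so the example is self-contained.

```r
# Create a small "/"-separated text file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("rno/name/marks",
             "1/ Asha /78",
             "2/ Ravi /NA"), path)

# header = TRUE   : the first row holds the column names
# sep = "/"       : fields are separated by "/"
# strip.white     : removes the blanks around " Asha " and " Ravi "
demo <- read.table(path, header = TRUE, sep = "/", strip.white = TRUE)

str(demo)            # structure of the imported data frame
is.na(demo$marks)    # the NA in the file is read as a missing value
```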
--------------------------------------------------------------------------------------------------------------------------------

Export data from R to text file

2. write.table():

It is used to export data from R to external file.

 write.table(student, "studinfo.txt", row.names=FALSE)
• student is the name of R object which is to be exported.
• studinfo.txt is the name of file in which data is to be exported. By default it will create target
file in current working directory. It is required to specify complete file path if we want to
change target file location.
• row.names parameter is set to FALSE if we don’t want to export row names. By default it
is set to TRUE.
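The export step can be checked with a short round trip: write a data frame with write.table() and read it back with read.table(). The student data below is invented for illustration, and the file goes to a temporary directory rather than the working directory.

```r
# Hypothetical data frame standing in for the student object.
student <- data.frame(rno   = 1:3,
                      name  = c("Asha", "Ravi", "Meena"),
                      marks = c(78, 65, 91))

out <- file.path(tempdir(), "studinfo.txt")

# row.names = FALSE keeps the row numbers 1, 2, 3 out of the file.
write.table(student, out, row.names = FALSE)

# Read the file back to confirm what was written.
back <- read.table(out, header = TRUE)
back
```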
 Import data from csv file into R

1. read.csv()/read.csv2(): comma-separated file.

• If the Excel file is saved as .csv, the read.csv command is used to read the .csv file.
 demo<-read.csv(file="filename.csv",header=TRUE,stringsAsFactors=FALSE,strip.white=TRUE)
 demo<-read.csv2(file="filename.csv",header=TRUE,stringsAsFactors=FALSE,strip.white=TRUE)
o Here demo is the name of the R object (a data frame) into which the csv file is imported.
o filename.csv is the name of the file to import. Specify the complete file path unless the file is in the current working directory.
o header=TRUE is used when the file has a header as its first row.
o strip.white=TRUE strips leading and trailing white space from unquoted character fields in the
input file. It applies only when a separator is set, which read.csv() and read.csv2() do by default.
o stringsAsFactors specifies whether strings should be converted to factors.
o Usually a csv file uses commas as the separator. If the file uses ";" as the separator (and "," as the
decimal mark), use the read.csv2() command to import it.
--------------------------------------------------------------------------------------------------------------------------------
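The difference between read.csv() and read.csv2() can be seen on two small files holding the same data. The contents are made up for illustration; the second file uses ";" as the separator and "," as the decimal mark.

```r
# Comma-separated file with "." as the decimal mark.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("rno,name,marks", "1,Asha,78.5", "2,Ravi,65"), csv_path)
demo <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Semicolon-separated file with "," as the decimal mark.
csv2_path <- tempfile(fileext = ".csv")
writeLines(c("rno;name;marks", "1;Asha;78,5", "2;Ravi;65"), csv2_path)
demo2 <- read.csv2(csv2_path, header = TRUE, stringsAsFactors = FALSE)

# Both imports give the same numeric marks column.
demo$marks
demo2$marks
```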

Export data from R to csv file

2. write.csv ():
• It is used to export data from R to external file.
 write.csv(student, "studinfo.csv", row.names=FALSE)
• student is the name of R object which is to be exported.
• studinfo.csv is the name of file in which data is to be exported. By default it will create target
file in current working directory. It is required to specify complete file path if we want to
change target file location.
• row.names parameter is set to FALSE if we don’t want to export row names. By default it is
set to TRUE.
Deleting data from file : (Deleting the Pages Column)
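One plausible reading of the heading above is removing a column from a data frame before exporting it. The sketch below assumes a hypothetical books data frame with a Pages column; assigning NULL to a column deletes it.

```r
# Hypothetical data frame with a Pages column.
books <- data.frame(Title = c("R Basics", "Data Mining"),
                    Pages = c(250, 410),
                    Price = c(399, 550))

# Delete the Pages column.
books$Pages <- NULL

# Export the remaining columns and read them back to confirm.
out <- file.path(tempdir(), "bookinfo.csv")
write.csv(books, out, row.names = FALSE)
back <- read.csv(out)
names(back)
```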

4.3 Import and Export data to/from Excel file :

 Import data from Excel file into R
 Many packages are available in R to import Excel files. Each package must be installed and then loaded
with library() before its functions can be used.
i. XLConnect package:
• install.packages("XLConnect") #install package
• library(XLConnect) #load package
1. readWorksheetFromFile():
 demo<-readWorksheetFromFile("filename.extension",sheet=1,startRow=2,endRow=10,endCol=3)
• readWorksheetFromFile command is used to read a specified sheet from an Excel file.
• sheet parameter is used to specify the sheet number/index to be read.
• startRow or startCol indicates from which row or column data should be imported.
• endRow or endCol indicates up to which row or column data should be imported.
• If no row or column index is specified, the whole data region of the sheet is read.
• Alternatively, the entire workbook can be loaded and a sheet then selected by using the
following commands:
 wb<-loadWorkbook("studinfo.xlsx") #Load complete workbook into R
 getSheets(wb) # get the list of sheets in workbook
 demo<-readWorksheet(wb,sheet=2) # then read required sheet
 wb<-loadWorkbook("studinfo.xlsx", create=TRUE) #create workbook if it does not exist
 createSheet(wb, name="TYFS") # create new sheet named "TYFS"
ii. xlsx package:
• install.packages("xlsx")
• library(xlsx)
2. read.xlsx():
 demo<-read.xlsx("filename.extension",sheetIndex=1, rowIndex=5,colIndex=3)
• sheetIndex specifies the index of the sheet to be read.
• rowIndex and colIndex are numeric vectors giving the rows and columns to be read. Here only row 5
and column 3 are read; use ranges such as rowIndex=1:5 to read several rows.
Export data from R to Excel file.
----------------------------------------------------------------------------------------------------------------------------------
iii. readxl package:
• install.packages("readxl")
• library(readxl)
3. read_excel():
 demo<-read_excel("filename.extension",sheet=4, skip=2)
• read_excel command is used to read the specified file with the given sheet index.
• The above command reads sheet no. 4, skipping the first 2 rows.

 Skip first 2 columns from excel file.

 Changing column names .
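The two headings above name operations that can also be done after import with base R. The sketch below uses a made-up data frame in place of an imported sheet: negative indexing drops the first 2 columns, and colnames() assigns new column names.

```r
# Hypothetical data frame standing in for an imported Excel sheet.
demo <- data.frame(id    = 1:3,
                   code  = c("a", "b", "c"),
                   name  = c("Asha", "Ravi", "Meena"),
                   marks = c(78, 65, 91))

# Skip the first 2 columns.
trimmed <- demo[, -(1:2)]

# Change the column names of what remains.
colnames(trimmed) <- c("student", "score")
trimmed
```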

--------------------------------------------------------------------------------------------------------------------------------
 Export data from R into Excel file .
1.writeWorksheet():
 writeWorksheet(wb, TYFSdf, sheet="TYFS", startRow=1, startCol=1)
• writeWorksheet() writes data from R dataframe into new worksheet of newly created workbook.
• In above command:
• wb is the name of newly created workbook
• TYFS is the name of worksheet created in workbook wb.
• TYFSdf is the name of dataframe to be written into worksheet.
• startRow, startCol used to mention row and column index from which data
writing will start.
 saveWorkbook(wb) #save workbook and write file to disk in current working dir.

--------------------------------------------------------------------------------------------------------------------------------

2.write.xlsx():

 write.xlsx(demo,"emp.xlsx") # write data from demo dataframe to emp file

 write.xlsx(demo,"emp.xlsx",sheetName="new_emp", append=TRUE)

# write data from demo dataframe to existing emp file by creating new worksheet named new_emp

append parameter is used to append dataframe to existing file.

--------------------------------------------------------------------------------------------------------------------------------

i. writexl package:
• install.packages("writexl")
• library(writexl)
3. write_xlsx():
 write_xlsx(demo,"bookinfo.xlsx")
• write_xlsx command is used to write a data frame to an Excel file.
• write_xlsx has no row.names parameter; row names are not exported. Set col_names=FALSE
to leave out the header row.
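A hedged round-trip sketch using the writexl and readxl packages, whose writer and reader functions are write_xlsx() and read_excel(). The data frame is invented for illustration, and the code runs only if both packages are installed.

```r
# NA means "not checked" (packages unavailable).
roundtrip_ok <- NA

if (requireNamespace("writexl", quietly = TRUE) &&
    requireNamespace("readxl", quietly = TRUE)) {
  # Hypothetical data frame to export.
  demo <- data.frame(bname  = c("C", "C++", "dbms"),
                     copies = c(25, 12, 15),
                     stringsAsFactors = FALSE)

  path <- file.path(tempdir(), "bookinfo.xlsx")
  writexl::write_xlsx(demo, path)             # export to Excel
  back <- readxl::read_excel(path, sheet = 1) # import the first sheet

  roundtrip_ok <- identical(back$bname, demo$bname) &&
    isTRUE(all.equal(back$copies, demo$copies))
}
roundtrip_ok
```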

4.4 Database connectivity via ODBC :

 Import data, perform different operation on it :

1. RMySQL package: RMySQL is a database interface and MySQL driver for R.

• install.packages("RMySQL")
• library(RMySQL)
• Make a connection object:
 con<-dbConnect(MySQL(), user="root", password="Pass@123", host="localhost",
dbname="employee")
• MySQL() function creates a driver object for MySQL.
• user, password and host are the values that were set during installation of MySQL.
• dbname is the name of the database to connect to.

• Following are some of the commands to work with the MySQL environment:
1. Get connection summary:
 summary(con)

2. Get Database information:

 dbGetInfo(con)

3. Show tables in connected database:

 dbListTables(con)

4. Show fields in any table:

 dbListFields(con,"marketing") #display fields from marketing table.

5. Remove any table from database:

 dbRemoveTable(con,"marketing") #remove table marketing from connected database.

6. Read entire table from database:

 dbReadTable(con,"testing") #read table "testing"

7. Extract rows from table:

dbSendQuery() submits and executes an SQL query on the database engine.
o It does not extract any records. dbFetch() or fetch() is used to fetch records.
o dbGetQuery() sends the query and fetches the result in one step, which is convenient in an interactive session.

 market<-dbSendQuery(con, "select * from marketing;")

 market<-dbSendQuery(con, "select * from marketing where sal>5000 ;")
 market_data<-dbFetch(market) #fetch all rows of the result; fetch(market) is equivalent.
 dbGetRowCount(market) #get number of rows fetched so far.
 market_data<-dbFetch(market, n=10) #fetch the next 10 rows of the result.
 market_data<-dbFetch(market, n=-1) #fetch all remaining rows of the result.

8. Get count of number of rows affected by query:

 dbGetRowsAffected(market)

 Export data to database

1. Execute various queries on database:

 dbSendQuery(con, "insert into testing values(15,'jack',6000);")
 dbSendQuery(con, "update testing set salary=7000 where empno=10;")
 dbSendQuery(con, "drop table if exists marketing;")

2. Clear data/free resources

 dbClearResult(market)

3. Overwrite table in the database:

 dbWriteTable(con,"testing",new_test,overwrite=TRUE) #overwrite table testing with data frame new_test.

4. Append data to the table in the database:

 dbWriteTable(con,"testing",new_test,append=TRUE) #append data frame new_test to the testing table.

5. Disconnect from database:

 dbDisconnect(con)
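The workflow above needs a running MySQL server. As a stand-in, the sketch below runs the same DBI calls against an in-memory SQLite database through the RSQLite package; this is a swapped-in backend, not RMySQL, and the table contents are invented. It runs only if DBI and RSQLite are installed.

```r
# NA means "not checked" (packages unavailable).
db_ok <- NA

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Hypothetical table contents.
  testing <- data.frame(empno  = c(10, 11, 12),
                        ename  = c("amy", "raj", "lee"),
                        salary = c(4000, 5500, 6200))
  DBI::dbWriteTable(con, "testing", testing)

  # Send a query, then fetch the result in two steps.
  res     <- DBI::dbSendQuery(con, "select * from testing where salary > 5000")
  first   <- DBI::dbFetch(res, n = 1)    # first matching row
  rest    <- DBI::dbFetch(res, n = -1)   # all remaining rows
  fetched <- DBI::dbGetRowCount(res)     # rows fetched so far
  DBI::dbClearResult(res)                # free the result set

  # dbExecute() runs a statement and returns the number of rows affected.
  changed <- DBI::dbExecute(con, "update testing set salary = 7000 where empno = 10")
  DBI::dbDisconnect(con)

  db_ok <- nrow(first) == 1 && nrow(rest) == 1 && fetched == 2 && changed == 1
}
db_ok
```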

4.5 Import XML file:

• install.packages("XML")
• library("XML")
• library("methods")

• Import XML file:

 emp<-xmlParse(file="employee.xml")
 print(emp) #prints the parsed XML document

• Extract root node of fetched file:

 root<-xmlRoot(emp)

• Find number of nodes in xml file:

 filesize<-xmlSize(root)

• Print specific node from file:

 print(root[1]) #display data from 1st node.
 print(root[[1]][[3]]) #display 3rd component/element of 1st node.

 Convert XML file to Dataframe

 empdf<-xmlToDataFrame("employee.xml")

 Export dataframe to xml file:

• install.packages("kulife") #not mandatory to install
 write.xml(new_emp,"newemp.xml") #export dataframe new_emp to newemp.xml file.
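The XML steps above can be tried without an employee.xml file by parsing a small XML string. The document content below is made up for illustration, and the code runs only if the XML package is installed.

```r
# NA means "not checked" (package unavailable).
xml_ok <- NA

if (requireNamespace("XML", quietly = TRUE)) {
  # Hypothetical XML content with two employee nodes.
  txt <- paste0(
    "<employees>",
    "<employee><id>1</id><name>Asha</name><salary>5000</salary></employee>",
    "<employee><id>2</id><name>Ravi</name><salary>6200</salary></employee>",
    "</employees>")

  emp   <- XML::xmlParse(txt, asText = TRUE)  # parse from a string
  root  <- XML::xmlRoot(emp)                  # root node <employees>
  n     <- XML::xmlSize(root)                 # number of child nodes
  name2 <- XML::xmlValue(root[[2]][[2]])      # 2nd element of 2nd node
  empdf <- XML::xmlToDataFrame(emp)           # one row per employee

  xml_ok <- n == 2 && name2 == "Ravi" && nrow(empdf) == 2
}
xml_ok
```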
4.6 Graphical data analysis :

4.6.1 Simple Graph: plot( )

• plot() in R is used to plot points in a graph.
• It is a generic function with many methods, which are called based on the type of input object
passed to it.
• plot() is basically used to create a scatter plot or line graph of 2 vectors, i.e. it plots 2
vectors against each other.
• It produces a 2-dimensional graph.
• syntax: plot(x, y, type, main, xlab, ylab, col, cex, pch, lwd, lty)
• x and y are two input vectors corresponding to the X and Y axes resp.
• type is a code used to specify the type of plot
- "p" to plot points only
- "l" to plot lines only
- "b" to plot both points and lines
- "c" to plot only the lines part of "b", leaving gaps where the points would be
- "o" to plot both points and lines, over-plotted
- "h" to plot histogram-like vertical lines
- "s" to plot stair steps
• main parameter is used to give a title to the plot
• xlab and ylab are used to specify labels of the X and Y axes resp.
• col is used to specify the colour of points and lines.
• cex specifies the size of points. 1 is the default size.
• pch is used to specify the shape of points. Value of pch ranges from 0 to 25.
• lwd specifies line width. Default width is 1.
• lty specifies line style. Values range from 0 to 6.
- 0 removes line
- 1 displays solid line
- 2 displays dashed line
- 3 displays dotted line
- 4 displays "dot dashed" line
- 5 displays "long dashed" line
- 6 displays "two dashed" (long and short dashes) line
• Example:
 months<-1:12
 temp21<-c(19.5, 22.3, 24.4, 27.2, 31.9, 31.0, 30.5, 28.0, 27.4, 25.2, 23.1, 20.0)
 plot(months, temp21, type="b", main="temperature-2021", col="blue", cex=2, pch=2, lwd=2,
lty=1)
 To change the labels of the x and y axes, use xlab and ylab:
 plot(months, temp21, type="b",ylab = "temperature" ,main="temperature-2021", col="blue",
cex=1, pch=2, lwd=2)

 Comparison of 2 plots using points( ) and lines( ) function:

 temp20<-c(18,20.3,22.4,24.2,27.9,29,30,27,25.4,23.2,21.1,20)
 plot(months, temp20, type="b", main="temp 2020 vs 2021", col="blue",cex=2, pch=2, lwd=2, lty=1)
 points(months,temp21,col="red", cex=2,type="b")
OR
 lines(months,temp21,col="red", cex=2,type="b")
 legend(x="topright", title="years", legend=c("2020", "2021"),fill=c("blue","red"))

Comparing 2 graphs:
>plot(months, temp21, type="b",ylab = "temperature",main="Temperature 2020 vs 2021",col
="blue", cex=1, pch=2, lwd=2)
> points(months,temp20,col="red", cex=2,type="b")
> legend(x="topright",title = "Years",legend = c("2020","2021"),fill=c("red","blue"))
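One detail worth noting when overlaying a second series: plot() sets the y-axis limits from the first series only, so values of the second series outside that range are clipped. The sketch below, using the same temperature vectors, passes ylim computed from both series; y_range is a name introduced here for illustration.

```r
months <- 1:12
temp20 <- c(18, 20.3, 22.4, 24.2, 27.9, 29, 30, 27, 25.4, 23.2, 21.1, 20)
temp21 <- c(19.5, 22.3, 24.4, 27.2, 31.9, 31.0, 30.5, 28.0, 27.4, 25.2, 23.1, 20.0)

# Axis range that covers both years.
y_range <- range(temp20, temp21)

plot(months, temp20, type = "b", col = "blue", ylim = y_range,
     xlab = "months", ylab = "temperature", main = "temp 2020 vs 2021")
lines(months, temp21, type = "b", col = "red")   # add the second series
legend("topright", title = "Years", legend = c("2020", "2021"),
       fill = c("blue", "red"))
```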
Plot using two different inputs (list and dataframe):

> plot(classA$rno, classA$marks, type="b",xlab= "Roll No.", ylab = "Percentage",
main="Class A vs Class B", col="orange", cex=1, pch=2, lwd=2)
> points(classB$rno,classB$marks,type = "b",col="green")
> legend(x="topright",title = "Class",legend = c("Class A","Class B"),fill = c("orange",
"green"))

 Save file using commands:

1. Save as jpeg image:
> jpeg(file="temperature.jpeg")
> plot(months, temp21, type="b", main="temperature-2021", col="blue", cex=2, pch=2, lwd=2, lty=1)
> dev.off( )
Note: dev.off( ) is used to shut down the current graphics device. Here it closes the
image file so that the plot is written to disk.

2. Save as png image:

> png(file="temperature.png")
> plot(months, temp21, type="b", main="temperature-2021", col="blue",
cex=2, pch=2, lwd=2, lty=1)
> dev.off( )
OR
> png(file="temperature.png",width=600,height=350)

3. Save as bmp image:

> bmp(file="temperature.bmp")
> plot(months, temp21, type="b", main="temperature-2021", col="blue", cex=2, pch=2, lwd=2, lty=1)
> dev.off( )
OR
> bmp(file="temperature.bmp", width=6,height=4.5, units="in", res=100)

4. Save as pdf:
> pdf(file="temperature.pdf")
> plot(months, temp21, type="b", main="temperature-2021", col="blue", cex=2, pch=2, lwd=2, lty=1)
> dev.off( )
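The open-device, plot, close-device pattern above can be wrapped in a small function. The helper name save_plot is hypothetical; on.exit(dev.off()) is used so the device is closed even if plotting fails, and the file goes to a temporary directory.

```r
temp21 <- c(19.5, 22.3, 24.4, 27.2, 31.9, 31.0, 30.5, 28.0, 27.4, 25.2, 23.1, 20.0)

# Hypothetical helper: draws the temperature plot into a PDF file.
save_plot <- function(path) {
  pdf(file = path)       # open the PDF device
  on.exit(dev.off())     # always close it when the function exits
  plot(1:12, temp21, type = "b", main = "temperature-2021",
       col = "blue", cex = 2, pch = 2, lwd = 2, lty = 1)
  invisible(path)
}

out <- save_plot(file.path(tempdir(), "temperature.pdf"))
file.exists(out)   # confirms the file was written
```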

4.6.2 Pie Chart

• Pie chart is a circular graph that indicates numerical proportions in slices.

• It is used to show contributions of slices into the entire graph.
• syntax: pie(x, labels, radius, main, col, clockwise)
• x is an input numeric vector.
• labels are used to specify descriptions of slices.
• radius is the radius of the circle.
• col is used to give colors to the slices of the chart.
• clockwise indicates whether slices are drawn clockwise or anticlockwise.
Example:
 books<-c("Biography", "comic", "poetry", "story", "fashion magazines", "Cookbook", "Fiction")
 readers<-c(20,30,50,25,32,40,35)
 pie(readers,labels=books, main="readers survey", col=rainbow(length(books)))

 How to calculate percentage of book reader for above pie chart:

 per<-round(readers/sum(readers)*100)
 lblread<-paste(books,"-",per,"%")
 pie(readers, labels=lblread, radius=0.5, main="readers survey",col=rainbow(length(books)))

 How to draw legend for pie chart:

 per2<-paste(round(readers/sum(readers)*100),"%")
 pie(readers, labels=per2, radius=0.7, main="readers survey",col=rainbow(length(books)))
 legend(x="topright",cex=0.7,title="book type", legend=books, fill=rainbow(length(books)))
 3D Pie Chart in R
• 3D Pie chart is created by using pie3D( ).
• plotrix package is required.
• install.packages("plotrix")
• library("plotrix")
 pie3D(readers, labels=books, explode=0.05, main="readers survey", col=rainbow(length(books)))

4.6.3 Bar Chart

• Bar chart is a graph with rectangular bars.
• It represents categorical data.
• The height of each bar is proportional to the value it represents.
• syntax: barplot(x, xlab, ylab, main, names.arg, col)
• x is an input numeric vector or matrix to represent bars.
• xlab and ylab are the labels for X and Y axes respectively.
• main is the title of the bar chart.
• names.arg is a vector of names appearing under each bar.
• col is used to give colors to the bars.
Dataframe 1
 event #dataframe
     ename scount
1   coding     15
2    paper     25
3  project     20
4 roborace     40
• Simple barchart:
 barplot (event$scount,names.arg=event$ename)
•barchart with some more parameters:
 barplot (event$scount,names.arg=event$ename,main = "Event Analysis",xlab = "Event names",
ylab ="No.of students",col = rainbow(4))

•change the width of each bar in barchart :

 barplot (event$scount,names.arg=event$ename,main = "Event Analysis",xlab = "Event names",
ylab ="No.of students",col = rainbow(4),width=c(0.2,0.5,0.4,1))

•barchart with different color palettes:

 barplot (event$scount,names.arg=event$ename,main = "Event Analysis",xlab = "Event names",
ylab ="No.of students",col = rainbow(4)) #as above graph
 barplot (event$scount,names.arg=event$ename,main = "Event Analysis",xlab = "Event names",
ylab ="No.of students",col=heat.colors(4))
 barplot (event$scount,names.arg=event$ename,main = "Event Analysis",xlab = "Event names",
ylab ="No.of students",col=terrain.colors(4))

 barplot (event$scount,names.arg=event$ename,main = "Event Analysis",xlab = "Event names",
ylab ="No.of students",col=topo.colors(4))

•Print numbers(values) in each bar using text function and make box around the graph :
 value <- barplot (event$scount,names.arg=event$ename,main = "Event Analysis",
xlab = "Event names",ylab ="No.of students",col=topo.colors(4))
 text(value , 0 , event$scount , cex=1 , pos=3)
 box() #creates a box around the graph
•Horizontal barchart :
 barplot (event$scount,names.arg=event$ename,main = "Event Analysis",ylab = "Event names",
xlab ="No.of students",col=topo.colors(4), horiz = TRUE)
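The event data frame is shown above but not constructed. A minimal sketch that builds it and places each count just above its bar follows; the ylim headroom factor of 1.15 is an arbitrary choice made here so the top label fits inside the plot.

```r
# Build the event data frame shown in Dataframe 1.
event <- data.frame(ename  = c("coding", "paper", "project", "roborace"),
                    scount = c(15, 25, 20, 40))

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(event$scount, names.arg = event$ename,
                main = "Event Analysis", xlab = "Event names",
                ylab = "No. of students", col = topo.colors(4),
                ylim = c(0, max(event$scount) * 1.15))

# pos = 3 places each label above the point (midpoint, bar height).
text(mids, event$scount, labels = event$scount, pos = 3)
box()   # draw a box around the graph
```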

Dataframe 2
> event1
     ename boys girls
1   coding   15    20
2    paper   25    23
3  project   20    34
4 roborace   40    39
• For the above dataframe a simple bar chart cannot be plotted. It has two categories of students (boys, girls).
We need to create a grouped or stacked bar chart for it.
• First we need to create a matrix from those two columns.
 emat<-matrix(c(event1$boys, event1$girls), nrow=2,ncol=4,byrow=TRUE)
 rownames (emat)=c("boys","girls")
 colnames (emat)=c("coding", "paper", "project", "roborace")
 emat
Output:
      coding paper project roborace
boys      15    25      20       40
girls     20    23      34       39

 Grouped barchart :
 barplot(emat,xlab="events",ylab="no.of.students",main="Event Analysis" ,names.arg=c("coding",
"paper","project","roborace"), col=c(7,2),beside=TRUE)
• using beside=TRUE, a grouped bar chart is created.
 Print numbers in each bar using text function:
• First store the X coordinates (bar midpoints) returned by barplot() in a variable.
 img<-barplot(emat,xlab="events",ylab="no.of.students",main="Event Analysis" ,names.arg=c(
"coding","paper","project", "roborace"), col=c(7,2),beside=TRUE)
 text(img,0,emat,cex=1,pos=3)

 Stacked bar chart :

 img<-barplot(emat,xlab="events",ylab="no.of.students",main="Event Analysis" ,names.arg=c(
"coding","paper","project", "roborace"), col=c(7,2))
 legend(x="top",cex=1,legend=c("girls","boys"),fill=c(2,7))
• By omitting the beside parameter, a stacked bar chart is created.
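A self-contained sketch of the grouped chart, building the matrix directly from the event1 data frame with t(as.matrix(...)) instead of typing the values again. The legend position and the ylim headroom factor are choices made here for illustration.

```r
# Build the event1 data frame shown in Dataframe 2.
event1 <- data.frame(ename = c("coding", "paper", "project", "roborace"),
                     boys  = c(15, 25, 20, 40),
                     girls = c(20, 23, 34, 39))

# Rows become boys/girls, columns become the events.
emat <- t(as.matrix(event1[, c("boys", "girls")]))
colnames(emat) <- event1$ename

# With beside = TRUE, barplot() returns a matrix of bar midpoints
# with the same shape as emat.
img <- barplot(emat, beside = TRUE, col = c(7, 2),
               xlab = "events", ylab = "no. of students",
               main = "Event Analysis",
               ylim = c(0, max(emat) * 1.15))
text(img, emat, labels = emat, pos = 3)   # value above each bar
legend("topleft", legend = rownames(emat), fill = c(7, 2))
```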
 Another Example:
Creating a dataframe bookdata :
> bookdata
   bid   bname copies month
1    1       C     25   jan
2    2     C++     12   jan
3    3    dbms     15   jan
4    4    java     20   jan
5    5 mongoDB      8   jan
6    1       C     11   feb
7    2     C++      3   feb
8    3    dbms     23   feb
9    4    java      8   feb
10   5 mongoDB     10   feb

Converting the dataframe into matrix:

> bmat<-matrix(bookdata$copies,nrow = 5,ncol = 2)
> colnames(bmat)<-c("jan","feb")
> rownames(bmat)<-c("C","C++","dbms","java","mongoDB")
> bmat
        jan feb
C        25  11
C++      12   3
dbms     15  23
java     20   8
mongoDB   8  10

Plotting the data on the graph:

 Stacked bar chart:
 img<-barplot(bmat,xlab="months",ylab="no.of.copies",main="monthwise book sale",
names.arg=c("jan","feb"), col=rainbow(5))
 legend(x="topright",cex=1, legend=c("C","C++","DBMS","JAVA","MongoDB"), fill=rainbow(5))

4.6.4 Histogram:
 A histogram represents the frequencies of values of a variable bucketed into ranges.
 A histogram is similar to a bar chart, but the difference is that it groups the values into continuous ranges.
 The height of each bar in a histogram represents the number of values present in that range.
 R creates histograms using the hist() function. This function takes a vector as input and uses some
more parameters to plot the histogram.
 Syntax:
hist(v,main,xlab,xlim,ylim,breaks,col,border)
Following is the description of the parameters used :

 v is a vector containing numeric values used in histogram.

 main indicates title of the chart.
 col is used to set color of the bars.
 border is used to set border color of each bar.
 xlab is used to give description of x-axis.
 xlim is used to specify the range of values on the x-axis.
 ylim is used to specify the range of values on the y-axis.
 breaks is used to set the bins: either a vector of breakpoints between bars or the suggested number of bars.
 v<-c(1,5,7,3,7,9,2,4,7,2,4,7,6,8,1)
 hist(v,xlab = "No.of books",main = "Histogram",xlim = c(0,10),ylim = c(0,10),col = "yellow",
border = "black")
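To see the effect of breaks, the sketch below passes explicit breakpoints for the same vector v and inspects the counts that hist() returns. The bin width of 2 is an arbitrary choice made for illustration.

```r
v <- c(1, 5, 7, 3, 7, 9, 2, 4, 7, 2, 4, 7, 6, 8, 1)

# Breakpoints at 0, 2, 4, 6, 8, 10 give five bins of width 2.
h <- hist(v, breaks = seq(0, 10, by = 2),
          xlab = "No. of books", main = "Histogram",
          col = "yellow", border = "black")

h$breaks   # the bin edges that were used
h$counts   # number of values per bin: 4 3 2 5 1
```

Each bin includes its right edge, so the value 2 is counted in the first bin and 4 in the second.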

4.6.5. Boxplot:
 A boxplot is a chart that displays the distribution of a dataset, drawing one box
for each group of values.
 The distribution is summarised by five numbers (minimum, first quartile, median, third quartile, and
maximum).
 Boxplots are created in R by using the boxplot() function.
 Syntax: boxplot(x, data, notch, varwidth, names, main)
- x: This parameter is a vector or a formula.
- data: This parameter sets the data frame.
- notch: This parameter is a logical value. Set as TRUE to draw a notch on each side of the box around the median.
- varwidth: This parameter is a logical value. Set as TRUE to draw the width of each box proportionate to the
sample size.
- main: This parameter is the title of the chart.
- names: This parameter gives the group labels that will be shown under each boxplot.
 When notch=TRUE is used, each box is drawn with a notch around its median.
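A minimal boxplot sketch with made-up marks for two classes. It uses the formula form score ~ class with varwidth, and compares the box statistics with fivenum(), which gives the five-number summary. notch is left at its default here because notches can extend past the box for samples this small.

```r
# Hypothetical marks: 7 students in class A, 5 in class B.
marks <- data.frame(
  class = rep(c("A", "B"), times = c(7, 5)),
  score = c(55, 62, 68, 71, 75, 80, 91,
            48, 60, 66, 72, 85))

# One box per class; box width reflects the group size.
bp <- boxplot(score ~ class, data = marks, varwidth = TRUE,
              names = c("Class A", "Class B"),
              main = "Marks by class")

# Five-number summary for class A: 55.0 65.0 71.0 77.5 91.0
fivenum(marks$score[marks$class == "A"])
bp$stats[, 1]   # the same five values as drawn for class A
```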
