Data Interfaces in R

Tushar B. Kute,
http://tusharkute.com
Data Interfaces

• In R, we can read data from files stored outside the R environment.
• We can also write data to files that are stored and accessed by the operating system.
• R can read and write various file formats such as CSV, Excel, and XML.
Getting and setting working directory

• You can check which directory the R workspace is pointing to using the getwd() function.
• You can set a new working directory using the setwd() function.
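
As a small illustration of these two functions, the sketch below saves the current working directory, switches to a scratch directory, and then switches back. Using tempdir() as the target is only an assumption for the example; any existing directory path would do.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()
print(old_dir)

# Switch to a scratch directory (tempdir() is used purely for illustration).
setwd(tempdir())
print(getwd())

# Restore the original working directory.
setwd(old_dir)
```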
Input as CSV file

• A CSV file is a text file in which the values in each row are separated by commas. Consider data stored in a file named input.csv.
• You can create this file in Windows Notepad or Ubuntu gedit by copying and pasting the data. Save it as input.csv, choosing the "All files (*.*)" type in the Save As dialog.
Reading a CSV file
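
A minimal sketch of reading a CSV file with read.csv() follows. The sample rows are hypothetical (the slide's own table is not reproduced here), and the file is written to a temporary directory rather than the working directory so the example does not overwrite anything.

```r
# Hypothetical sample data; the column names and values are assumptions.
csv_lines <- c(
  "id,name,salary,start_date,dept",
  "1,Param,623.3,2012-01-01,IT",
  "2,Asha,515.2,2013-09-23,Operations",
  "3,Ravi,611.0,2014-11-15,IT"
)
csv_path <- file.path(tempdir(), "input.csv")
writeLines(csv_lines, csv_path)

# read.csv() returns the file contents as a data frame.
data <- read.csv(csv_path)
print(data)

# Basic checks on what was read.
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
```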
Analysing a CSV file
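
Once a CSV file has been read into a data frame, ordinary R functions apply to it. The sketch below assumes a small hypothetical data frame (standing in for the result of read.csv()) and shows a maximum, a row lookup, and two subsets.

```r
# Hypothetical data frame standing in for the result of read.csv("input.csv").
data <- data.frame(
  id = 1:3,
  name = c("Param", "Asha", "Ravi"),
  salary = c(623.3, 515.2, 611.0),
  dept = c("IT", "Operations", "IT"),
  stringsAsFactors = FALSE
)

# Highest salary in the data.
max_salary <- max(data$salary)
print(max_salary)

# Details of the person with the highest salary.
top_earner <- subset(data, salary == max(salary))
print(top_earner)

# Everyone working in the IT department.
it_staff <- subset(data, dept == "IT")
print(it_staff)

# IT staff whose salary is greater than 620.
high_paid_it <- subset(data, salary > 620 & dept == "IT")
print(high_paid_it)
```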
Writing into a CSV file
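
Writing works the other way round: write.csv() saves a data frame as a CSV file. The sketch below uses a hypothetical data frame and an output file in a temporary directory; row.names = FALSE is passed so that R's row numbers are not written as an extra column.

```r
# Hypothetical data frame to be written out.
data <- data.frame(
  id = 1:3,
  name = c("Param", "Asha", "Ravi"),
  salary = c(623.3, 515.2, 611.0),
  dept = c("IT", "Operations", "IT"),
  stringsAsFactors = FALSE
)

# Keep only the IT department rows.
it_staff <- subset(data, dept == "IT")

# Write the filtered data to a new CSV file without the row-name column.
out_path <- file.path(tempdir(), "output.csv")
write.csv(it_staff, out_path, row.names = FALSE)

# Read the file back to confirm what was written.
written <- read.csv(out_path)
print(written)
```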
XLSX file

• Microsoft Excel is the most widely used spreadsheet program; it stores data in the .xls or .xlsx format.
• R can read directly from these files using Excel-specific packages.
• A few such packages are XLConnect, xlsx, and gdata.
• We will use the xlsx package, which can also write Excel files from R.
XLSX package installation

• Use the following command in the R console to install the "xlsx" package.
• It may ask to install additional packages on which this package depends.
• Use the same command with the required package name to install each additional package.

install.packages("xlsx")
Verify installation
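
One possible way to verify the installation is to look the package up in the list of installed packages. This is a sketch using base R only; it reports TRUE or FALSE and does not load the package.

```r
# Check whether "xlsx" appears among the installed packages.
xlsx_available <- "xlsx" %in% rownames(installed.packages())
print(xlsx_available)
```

If the result is FALSE, run install.packages("xlsx") again and check the console for errors.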
Create an xlsx file
Reading xlsx file

• The file input.xlsx is read using the read.xlsx() function as shown below. The result is stored as a data frame in the R environment.

# Load the xlsx package.
library("xlsx")

# Read the first worksheet in the file input.xlsx.
data <- read.xlsx("input.xlsx", sheetIndex = 1)
print(data)
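
Since the xlsx package can also write Excel files, a round trip can be sketched as follows. The data frame and the file name employees.xlsx are hypothetical, and the example assumes the xlsx package (and the Java runtime it depends on) is installed.

```r
library("xlsx")

# Hypothetical sample data; names and values are assumptions.
employees <- data.frame(
  id = 1:3,
  name = c("Param", "Asha", "Ravi"),
  salary = c(623.3, 515.2, 611.0),
  stringsAsFactors = FALSE
)

# Write the data frame to a worksheet named "Sheet1" in a temporary file.
xlsx_path <- file.path(tempdir(), "employees.xlsx")
write.xlsx(employees, xlsx_path, sheetName = "Sheet1", row.names = FALSE)

# Read the first worksheet back into a data frame.
round_trip <- read.xlsx(xlsx_path, sheetIndex = 1)
print(round_trip)
```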
The XML file

• XML is a text-based file format used to share both data and its structure on the World Wide Web, intranets, and elsewhere.
• XML stands for Extensible Markup Language. Like HTML, it contains markup tags, but whereas HTML tags describe the structure of the page, XML tags describe the meaning of the data contained in the file.
• You can read an XML file in R using the "XML" package, which can be installed with the following command.

install.packages("XML")
Input data

• Create an XML file by copying the data below into a text editor. Save the file with a .xml extension, choosing "All files (*.*)" as the file type.

<RECORDS>
<EMPLOYEE>
<ID>1</ID>
<NAME>Param</NAME>
<SALARY>623.3</SALARY>
<STARTDATE>1/1/2012</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE> …….
Reading XML file

• The XML file is read in R using the function xmlParse(). The result is a parsed XML document object, which can be converted to an R list with xmlToList().

# Load the package required to read XML files.
library("XML")

# Also load the other required package.
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Print the result.
print(result)
Get number of nodes
Get details of first node
Get different elements of the node
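
The three steps named above (counting nodes, inspecting the first node, and picking out its elements) might look like the sketch below. It assumes the XML package is installed. To stay self-contained it parses a short XML string rather than input.xml; the first record matches the sample shown earlier, and the second record is hypothetical.

```r
library("XML")

# Two EMPLOYEE records; the second one is made up for illustration.
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Param</NAME><SALARY>623.3</SALARY>",
  "<STARTDATE>1/1/2012</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Asha</NAME><SALARY>515.2</SALARY>",
  "<STARTDATE>9/23/2013</STARTDATE><DEPT>Operations</DEPT></EMPLOYEE>",
  "</RECORDS>"
)

# Parse the string and get the root node (<RECORDS>).
result <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(result)

# Number of child nodes under the root, i.e. the number of records.
node_count <- xmlSize(rootnode)
print(node_count)

# Details of the first node.
first_node <- rootnode[[1]]
print(first_node)

# Individual elements of the first node, selected by position.
first_name <- xmlValue(first_node[[2]])                # second child: NAME
first_salary <- as.numeric(xmlValue(first_node[[3]]))  # third child: SALARY
print(first_name)
print(first_salary)
```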
XML to data frame
# Load the packages required to read XML files.
library("XML")
library("methods")

# Convert the input xml file to a data frame.
xmldataframe <- xmlToDataFrame("input.xml")
print(xmldataframe)
Web data

• Many websites provide data for consumption by their users.
• For example, the World Health Organization (WHO) provides reports on health and medical information in the form of CSV, txt, and XML files.
• Using R programs, we can programmatically extract specific data from such websites.
• Some R packages used to scrape data from the web are "RCurl", "XML", and "stringr".
• They are used to connect to the URLs, identify the required links for the files, and download them to the local environment.
Install and input

• Install R Packages
– The following packages are required for processing the URLs and links to the files. If they are not available in your R environment, you can install them using the following commands.

install.packages("RCurl")
install.packages("XML")
install.packages("stringr")
install.packages("plyr")

• Input Data
– We will visit the weather data URL http://www.geos.ed.ac.uk/~weather/jcmb_ws/ and download the CSV files for the year 2015 using R.
Example:

• We will use the function getHTMLLinks() to gather the URLs of the files.
• Then we will use the function download.file() to save the files to the local system.
• As we will be applying the same code to multiple files, we will create a function that can be called once per file.
• The file names are passed to this function in the form of an R list object.
Example:
# Load the packages used below.
library("XML")
library("stringr")
library("plyr")

# The URL of the page that lists the data files.
url <- "http://www.geos.ed.ac.uk/~weather/jcmb_ws/"

# Gather the HTML links present in the webpage.
links <- getHTMLLinks(url)

# Keep only the links which point to the JCMB 2015 files.
filenames <- links[str_detect(links, "JCMB_2015")]

# Store the file names as a list.
filenames_list <- as.list(filenames)

# Create a function to download a file, given the main URL and the file name.
downloadcsv <- function(mainurl, filename) {
   filedetails <- str_c(mainurl, filename)
   download.file(filedetails, filename)
}

# Apply the function to each file name with l_ply() and save the files into the
# current R working directory.
l_ply(filenames_list, downloadcsv, mainurl = url)
Web data download
Verify file download

• After running the above code, you can locate the following files in the current R working directory.
– "JCMB_2015.csv"
– "JCMB_2015_Apr.csv"
– "JCMB_2015_Feb.csv"
– "JCMB_2015_Jan.csv"
– "JCMB_2015_Mar.csv"
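
One way to check for the downloaded files from within R, rather than through the file manager, is to list the matching files in the working directory. This is a sketch; the pattern simply assumes the file names begin with JCMB_2015 and end in .csv, as in the list above.

```r
# List files in the current working directory whose names match the pattern.
downloaded <- list.files(pattern = "^JCMB_2015.*\\.csv$")
print(downloaded)
```

An empty result means no matching files were found in the current working directory.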
Connecting to MySQL

• Data in relational database systems is stored in a normalized format.
• So, to carry out statistical computing directly in the database, we would need very advanced and complex SQL queries.
• But R can connect easily to many relational databases such as MySQL, Oracle, and SQL Server, and fetch records from them as a data frame.
• Once the data is available in the R environment, it becomes a normal R data set and can be manipulated or analyzed using all the powerful packages and functions.
The RMySQL package

• The add-on package "RMySQL" provides native connectivity between R and a MySQL database. It is not part of the base R installation.
• You can install this package in the R environment using the following command.

install.packages("RMySQL")
Connecting R to MySQL

• Once the package is installed, we create a connection object in R to connect to the database.
• It takes the username, password, database name, and host name as input.

# Load the RMySQL package.
library("RMySQL")

# Create a connection object to the MySQL database.
# This example connects to a database named "testdb" on the local server.
mysqlconnection <- dbConnect(MySQL(), user = 'root', password = 'epsilon',
   dbname = 'testdb', host = 'localhost')

# List the tables available in this database.
dbListTables(mysqlconnection)
Querying the tables

# Query the "COLLEGE" table to get all the rows.
result <- dbSendQuery(mysqlconnection, "select * from COLLEGE")

# Store the result in an R data frame object. n = 3 fetches the first 3 rows.
college_data <- fetch(result, n = 3)
print(college_data)

# Release the result set before sending another query.
dbClearResult(result)
Querying with filters

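# Query the "COLLEGE" table, keeping only rows with INCOME above 20000.
result <- dbSendQuery(mysqlconnection, "select * from COLLEGE where INCOME > 20000")

# Fetch all the records (with n = -1) and store them as a data frame.
filtered_data <- fetch(result, n = -1)
print(filtered_data)

# Release the result set once the rows have been fetched.
dbClearResult(result)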
Update
Insert
Useful resources
Thank you
This presentation is created using LibreOffice Impress 4.2.8.2, can be used freely as per GNU General Public License

Web Resources
http://mitu.co.in
http://tusharkute.com

Blogs
http://digitallocha.blogspot.in
http://kyamputar.blogspot.in
