Reading Data in R

This document discusses reading different types of data files as the first step in data preparation. It covers reading flat files using read.table(), reading Excel files using the xlsx package, reading XML files using the XML package, and reading JSON files using the jsonlite package. The goal of data preparation is to collect and clean raw data from various sources into a clean format for analysis.

Reading data

Foundations of Data Analytics


Data Preparation

Vellore Institute of Technology, Chennai

July 29, 2020

Data Preparation
Reading data

Outline

Motivation
Introduction
Reading data

Motivation

Types of Data
Structured data - Excel file
Semi-structured data - JSON, XML file
-https://json.org/example.html
Unstructured data - text file
-https://rdp.cme.msu.edu/tutorials/init_process/RDPtutorial_INITIAL-PROCESS.html
Data storage
In databases, including NoSQL stores such as MongoDB
In websites

Introduction

First step in Data Analytics


Collecting data from different sources, which may be in various
formats such as flat files (.csv, .txt), Excel files, JSON, XML, etc.

Data Collection
Data Cleaning
Data Understanding

Raw data -> Clean data -> Data Analysis

Data Preparation
Reading flat files
Reading Excel files
Reading data
Reading XML file
Reading JSON file

Reading flat files

Text or CSV files - Use read.table()


The read.table function is one of the most commonly used functions for reading data. It has a few important arguments:
file, the name of a file, or a connection
header, a logical indicating whether the file has a header line
sep, a string indicating how the columns are separated
colClasses, a character vector indicating the class of each column in the dataset
nrows, the number of rows in the dataset (the maximum number of rows to read)
comment.char, a character string indicating the comment character
skip, the number of lines to skip from the beginning
stringsAsFactors, a logical indicating whether character variables should be coded as factors
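
As a minimal sketch of how these arguments combine, the example below writes a small made-up file to a temporary path and reads it back. The file contents, the semicolon separator, and the column names (id, amount, grade) are assumptions for illustration only and are not taken from the loans data.

```r
# Write a small illustrative file; the contents are hypothetical.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "exported by a hypothetical tool",            # junk first line, dropped via skip
  "id;amount;grade",                            # header line
  "# a comment line, ignored via comment.char",
  "1;2500.50;A",
  "2;1200.00;B",
  "3;980.75;A"
), tmp)

loans_small <- read.table(
  tmp,
  header = TRUE,                # first line after skipping holds column names
  sep = ";",                    # columns are separated by semicolons
  skip = 1,                     # skip the junk line at the top
  comment.char = "#",           # drop lines starting with '#'
  colClasses = c("integer", "numeric", "character"),
  nrows = 2,                    # read at most two data rows
  stringsAsFactors = FALSE      # keep character columns as character
)
str(loans_small)
```

Supplying colClasses spares read.table from guessing each column's type, which can help with larger files.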

Reading flat files (contd.)

#Using read.table()
loan <- read.table("loans data.csv", header = TRUE, sep = ",")
str(loan)

#Using read.csv()
loan <- read.csv("loans data.csv")
str(loan)
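
read.csv() is a wrapper around read.table() that defaults to header = TRUE and sep = ",", so for a plain comma-separated file the two calls should give the same data frame. The sketch below checks this on a temporary file with made-up contents (not the loans data). The two functions also differ in other defaults, such as quote, fill and comment.char, so results can diverge on messier files.

```r
# Hypothetical two-column CSV written to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,amount", "1,2500.5", "2,1200"), tmp)

# read.table needs the header and separator spelled out ...
via_table <- read.table(tmp, header = TRUE, sep = ",")

# ... while read.csv already assumes them.
via_csv <- read.csv(tmp)

# Compare the two results.
same <- isTRUE(all.equal(via_table, via_csv))
same
```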

Reading Excel files

#Reading Excel file
#You need to install the xlsx package
install.packages("xlsx")

#Load the package
library(xlsx)

#Read the data
loan <- read.xlsx("loan.xlsx", sheetIndex = 1, header = TRUE)

Reading XML file


#Reading XML file
#You need to install the XML package and load it
install.packages("XML")
library(XML)

#Load the httr package to work with URLs and HTTP
library(httr)

fileurl <- "https://www.w3schools.com/xml/simple.xml"

#Download the file, then parse the response body as text
xmldata <- GET(fileurl)
doc <- xmlTreeParse(content(xmldata, as = "text"), asText = TRUE, useInternalNodes = TRUE)
root <- xmlRoot(doc)
xmlName(root)
names(root)

Reading XML file (contd.)

#Accessing parts of the XML file in the same way as a list
root[[1]]      #accessing 1st food
root[[1]][[1]] #accessing name of the 1st food

#Extracting parts of XML file
xmlSApply(root, xmlValue)

#Extracting individual nodes of XML file
xpathSApply(root, "//name", xmlValue)
xpathSApply(root, "//price", xmlValue)
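
The extracted node values can be combined into a data frame for analysis. The following is a minimal offline sketch, assuming a small inline XML string shaped like the food menu above; the menu items and prices are made up for illustration, and the price cleanup assumes a leading "$" on every price.

```r
library(XML)

# Hypothetical menu, shaped like the simple.xml example.
xml_text <- '<breakfast_menu>
  <food><name>Belgian Waffles</name><price>$5.95</price></food>
  <food><name>French Toast</name><price>$4.50</price></food>
</breakfast_menu>'

# Parse from a string rather than a file or URL.
doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

# Pull out one character vector per node type.
food_names  <- xpathSApply(root, "//name", xmlValue)
food_prices <- xpathSApply(root, "//price", xmlValue)

# Strip the currency sign and assemble a data frame.
menu <- data.frame(
  name  = food_names,
  price = as.numeric(sub("$", "", food_prices, fixed = TRUE)),
  stringsAsFactors = FALSE
)
menu
```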

Reading JSON file

#Loading jsonlite package
library(jsonlite)
jdata <- fromJSON("https://api.github.com/users/jtleek/repos")
names(jdata)

#Extracting nested objects
names(jdata$owner)
jdata$owner$login

#Converting a data frame to JSON (toJSON returns JSON text; cat prints it)
jfile <- toJSON(iris, pretty = TRUE)
cat(jfile)
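
The same steps can be tried without network access. The sketch below assumes a small inline JSON string loosely shaped like a repository listing; the record names and logins are hypothetical. It also shows the flatten argument of fromJSON, which turns nested fields into ordinary columns.

```r
library(jsonlite)

# Hypothetical records with a nested "owner" object.
json_text <- '[
  {"name": "repo-a", "owner": {"login": "alice", "id": 1}},
  {"name": "repo-b", "owner": {"login": "bob",   "id": 2}}
]'

# An array of objects becomes a data frame; "owner" is a nested data frame.
repos <- fromJSON(json_text)
names(repos)
repos$owner$login

# flatten = TRUE produces plain columns such as owner.login.
flat <- fromJSON(json_text, flatten = TRUE)
names(flat)

# Round trip: data frame -> JSON text -> data frame.
back <- fromJSON(toJSON(head(iris, 3)))
str(back)
```

After the round trip the Species column comes back as character rather than factor, since JSON has no factor type.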

References

Getting and Cleaning data - Coursera
XML package http://www.omegahat.net/RSXML/Tour.pdf
jsonlite package https://www.r-bloggers.com/new-package-jsonlite-a-smarter-json-encoderdecoder/
