
Course 2: Data Input/Output, Lists

This document provides an introduction to reading and writing data in R. It discusses using functions like read.table(), read.csv(), scan(), and readline() to input data from files and the keyboard. It also covers using write.table(), cat(), and other functions to output data to files or the screen. The document demonstrates how to import data from common formats like Excel, SPSS, Stata, and JSON files and export data to Excel, SPSS, Stata, and other formats.


Máster Universitario Oficial en Ciencia de Datos e Ingeniería de Computadores

INTRODUCTION TO R
Introduction to Data Science

Coral del Val Muñoz

Dept. of Computer Science and Artificial Intelligence,
Universidad de Granada
Dept. Molecular Biophysics, German Cancer Research Center Heidelberg, Germany

Index
• Introduction to R
• RStudio
• Getting Started - R Console
• Help
• R-workspace
• Packages
• Data types and Structures
• Vectors
• Missing and special values
• Matrices and Arrays
• Factors
• Lists
• Data frames
• Indexing
• Conditional indexing
• Data input and Output
• Examining Datasets
• Selecting subsets
• Merging datasets
• Numerical Summaries
• Useful functions

Lists
• Lists can be used to combine objects (possibly of different kinds and sizes) into a larger composite object.
• The components of the list are named according to the argument names used in list().
• Components can be extracted with the double bracket operator [[ ]].
• Alternatively, named components can be accessed with the "$" operator.

> A<-c(31,32,40)
> S<-as.factor(c("F","M","M","F"))
> People<-list(age=A,sex=S)
> People
$age
[1] 31 32 40

$sex
[1] F M M F
Levels: F M

Indexing Lists

> People[[1]]
[1] 31 32 40
> People$age
[1] 31 32 40
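The difference between [[ ]] and [ ] on a list is a frequent source of confusion, so here is a small sketch using the same People list. The extra height component and its values are made up for illustration only.

```r
# Build the example list: components of different kinds and lengths
A <- c(31, 32, 40)
S <- as.factor(c("F", "M", "M", "F"))
People <- list(age = A, sex = S)

# [[position]], [["name"]] and $name all return the component itself
by_pos  <- People[[1]]
by_name <- People[["age"]]
by_dol  <- People$age

# Single brackets [ ] return a sub-list that still wraps the component
sub_list <- People[1]

# A new component can be added by assigning to a new name (hypothetical values)
People$height <- c(1.70, 1.82, 1.65)
names(People)   # "age" "sex" "height"
```

Use [[ ]] or $ when you want the data inside a component, and [ ] when you want a smaller list.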

Names
Names of an R object can be accessed and/or modified with
the names() function.
> z <- list(a = 1, b = "c", c = 1:3)
> z
$a
[1] 1

$b
[1] "c"

$c
[1] 1 2 3

# change just the name of the third element.
> names(z)[3] <- "c2"
> z
$a
[1] 1

$b
[1] "c"

$c2
[1] 1 2 3
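As a further sketch of names(), the snippet below reads the names, renames one element, renames all of them at once, and shows that the same function works on atomic vectors. The vector v and its names are invented for illustration.

```r
z <- list(a = 1, b = "c", c = 1:3)

names(z)                         # read the names: "a" "b" "c"
names(z)[3] <- "c2"              # rename only the third element
names(z) <- toupper(names(z))    # replace all names at once: "A" "B" "C2"

# names() works the same way on atomic vectors (hypothetical example)
v <- c(10, 20, 30)
names(v) <- c("x", "y", "w")
v["y"]                           # the element named "y" (value 20)
```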

Input/Output: Keyboard and Monitor


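• Suppose we have a file (file.txt) with this content:
12
2 5
641
scan("file.txt")
Read 4 items
[1] 12 2 5 641
scan("file.txt",what=character())
Read 4 items
[1] "12" "2" "5" "641"
scan("file.txt",what=character(),sep="\n")
Read 3 items
[1] "12" "2 5" "641"
• With sep="\n" each line becomes one item, so the items must be read as strings: "2 5" is not a valid number.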
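To reproduce the slide without an existing file, the sketch below writes the same three lines to a temporary file (the path comes from tempfile(), an assumption for self-containment) and reads it in the three ways shown. quiet = TRUE only suppresses the "Read n items" message.

```r
# Recreate file.txt in a temporary location so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("12", "2 5", "641"), path)

nums  <- scan(path, quiet = TRUE)                        # 12 2 5 641 (numeric)
chars <- scan(path, what = character(), quiet = TRUE)    # "12" "2" "5" "641"
lines <- scan(path, what = character(), sep = "\n",
              quiet = TRUE)                              # "12" "2 5" "641"
```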

Input/Output: Keyboard and Monitor


• Use scan() to read from the keyboard by specifying
an empty string for the filename:
scan("")
1: 23 4
3: 2
4:
Read 3 items
[1] 23 4 2

• Note that we are prompted with the index of the next item to be input, and we signal the end of input with an empty line.
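Interactive input is hard to reproduce in a script. One possible workaround, sketched below, is scan()'s text argument: it parses a string as if it had been typed at the prompt, which makes the keyboard example testable.

```r
# The string stands in for the keyboard input "23 4", then "2"
vals <- scan(text = "23 4\n2", quiet = TRUE)
vals                                   # 23 4 2

# sep changes the field separator, e.g. for comma-separated input
csv_vals <- scan(text = "1,2,3", sep = ",", quiet = TRUE)
```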

Input/Output: Keyboard and Monitor


• To read in a single line from the keyboard use
readline():
readline("Input data: ")
Input data: 23 4 2
[1] "23 4 2"

• Unlike scan(), readline() shows the prompt passed as its argument, reads a single line, and returns it as one character string (here "23 4 2"), not as numbers. In non-interactive sessions it returns "" immediately.
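Because readline() returns text, numeric input usually needs converting. The sketch below uses a fixed string in place of the user's answer (an assumption, since readline() cannot wait for input in a script) and shows two ways to turn it into numbers.

```r
# In an interactive session this would be: s <- readline("Input data: ")
# Here a fixed string stands in for what the user typed
s <- "23 4 2"

# Option 1: split on whitespace, then convert the pieces
nums1 <- as.numeric(strsplit(s, "[[:space:]]+")[[1]])

# Option 2: let scan() parse the string
nums2 <- scan(text = s, quiet = TRUE)
```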

Input/Output: Print to the screen


• print() is a generic function, so the method actually called depends on the class of the object being printed.
• If, for example, the argument is of class table, then the print.table() method will be called.
x <- 1:3
print(x^2)
[1] 1 4 9
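To see the dispatch mechanism at work, the sketch below defines a toy S3 class, "temperature" (invented for this illustration), together with a print method for it. print() finds print.temperature() purely from the class attribute, in the same way it finds print.table() for tables.

```r
# A toy S3 class, invented for this illustration
temp <- structure(list(celsius = 21.5), class = "temperature")

# Defining print.<class> is enough for print() to use it
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "C\n")
  invisible(x)   # print methods conventionally return their argument invisibly
}

print(temp)                      # Temperature: 21.5 C
tab <- table(c("F", "M", "M", "F"))
class(tab)                       # "table", so print(tab) dispatches to print.table()
```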

Input/Output: Print to the screen


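• cat() is often more convenient than print(): print() outputs a single object per call and prefixes its output with index markers such as [1], whereas cat() concatenates several objects and prints them without decoration: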

x <- 1:3
print(x^2)
[1] 1 4 9
cat(x^2)
1 4 9
cat(x^2, x, "hola")
1 4 9 1 2 3 hola
cat(x^2, x, "hola", sep="_")
1_4_9_1_2_3_hola
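Two further differences are worth seeing side by side: print() quotes strings while cat() does not, and cat() never adds a newline on its own. A minimal sketch:

```r
x <- 1:3
print("hola")                 # [1] "hola"   (index marker, quoted string)
cat("hola\n")                 # hola         (no quotes; newline must be explicit)
cat("Squares:", x^2, "\n")    # Squares: 1 4 9
cat(x^2, sep = "\n")          # one value per line
```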

Input/Output: Reading and Writing files


• We will use the function read.table() to read a file into a data frame.
• Suppose we have a file matrix.txt with the
following content:

nombre edad
John 25
Mary 28
Jim 19

Input/Output: Reading and Writing files


• The first line contains an optional header, specifying column
names. We could read the file this way:
read.table("matrix.txt",header=TRUE)
nombre edad
1 John 25
2 Mary 28
3 Jim 19

• Note that scan() with its default settings would not work here, because our file has a mixture of numeric and character data (and a header).
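The sketch below recreates matrix.txt in a temporary file (an assumption for self-containment) and checks which column types read.table() infers. stringsAsFactors = FALSE is passed explicitly so that the result does not depend on the R version.

```r
# Recreate matrix.txt in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("nombre edad", "John 25", "Mary 28", "Jim 19"), path)

# stringsAsFactors = FALSE keeps names as character (the default since R 4.0.0)
d <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
str(d)   # columns: nombre (character), edad (integer)
```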

Input/Output: Reading and Writing files


• To write a file, use write.table() instead of read.table():

write.table(matrix(1:6, nrow=2), "output.txt",
            row.names=FALSE, col.names=FALSE)

output.txt:
1 3 5
2 4 6
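A round trip is a simple way to check what write.table() produced. This sketch writes the same matrix to a temporary file instead of output.txt, inspects the raw lines, and reads them back; without a header, read.table() names the columns V1, V2, V3.

```r
path <- tempfile(fileext = ".txt")
m <- matrix(1:6, nrow = 2)
write.table(m, path, row.names = FALSE, col.names = FALSE)

raw_lines <- readLines(path)                 # "1 3 5" "2 4 6"
back <- read.table(path, header = FALSE)     # data frame with columns V1, V2, V3
```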

Input/Output: Reading and Writing files


• The function cat() can also be used to write
to a file, one part at a time:

cat("abc\n",file="u.txt")
cat("de\n",file="u.txt",append=TRUE)

u.txt:
abc
de
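The sketch below repeats the u.txt example in the session's temporary directory (an assumption, to avoid writing into the working directory) and also shows what happens when append is left at its default, FALSE.

```r
path <- file.path(tempdir(), "u.txt")
cat("abc\n", file = path)                 # creates (or overwrites) the file
cat("de\n", file = path, append = TRUE)   # adds a second line
after_append <- readLines(path)           # "abc" "de"

cat("new\n", file = path)                 # append = FALSE (default): overwrites
after_overwrite <- readLines(path)        # "new"
```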

Exporting Data: cat()


R objects can be exported to a text file using the cat() function:

cat(..., file = "", sep = " ", fill = FALSE, labels = NULL, append = FALSE)

...: the R objects to output.
file: character string naming the file to print to. If "" (the default), cat prints to the console unless redirected by sink.
sep: a character vector of strings to append after each element.
fill: controls how the output is broken into successive lines.
labels: character vector of labels for the lines printed; ignored if fill is FALSE.
append: logical. If TRUE, output will be appended to file; otherwise, it will overwrite the contents of file.
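A small sketch of the sep, fill and labels arguments, printing to the console (file = ""). The label text "row:" is arbitrary.

```r
# sep goes between elements; cat() adds no trailing newline by itself
cat("a", "b", "c", sep = "-")      # a-b-c
cat("\n")

# fill = TRUE wraps output at the console width and ends it with a newline
cat("a", "b", "c", fill = TRUE)    # a b c

# labels are printed at the start of each output line, but only when fill is used
cat(1:3, fill = TRUE, labels = "row:")
```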

Read data from Excel: read.csv()

• Save the spreadsheet in .csv format and read it with R:

# first row contains variable names, comma is separator
# assign the variable id to row names
# note the "/" instead of "\" on MS Windows systems

mydata <- read.csv("c:/mydata.csv", header=TRUE,
                   sep=",", row.names="id")
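A self-contained version of the call above is sketched below: it writes a small CSV with hypothetical columns (id, nombre, edad) to a temporary file. Note that row.names = "id" uses that column as row names and removes it from the data columns.

```r
# Hypothetical CSV content; the "id" column becomes the row names
path <- tempfile(fileext = ".csv")
writeLines(c("id,nombre,edad",
             "a1,John,25",
             "a2,Mary,28"), path)

mydata <- read.csv(path, header = TRUE, sep = ",", row.names = "id")
rownames(mydata)   # "a1" "a2"
names(mydata)      # "nombre" "edad"  (id is no longer a column)
```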

read.delim()
• read.delim() is intended for reading TAB-separated files:

read.delim(file, header = TRUE, sep = "\t", dec = ".",
           fill = TRUE, ...)

• sep: the field separator character. "\t" (the default for read.delim) stands for the TAB separator.
• fill: if TRUE, then when rows have unequal length, blank fields are implicitly added.
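The effect of fill = TRUE can be checked with a tiny hypothetical TAB-separated file whose last row is one field short. The missing field becomes NA in the numeric column.

```r
path <- tempfile(fileext = ".tsv")
# The last row has only two of the three fields
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5"), path)

d <- read.delim(path)   # header = TRUE, sep = "\t", fill = TRUE by default
d$c                     # 3 NA: the missing field was filled in
```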

Read data from Excel: read.csv2()

• Use read.csv2() to read .csv files from countries that use a comma (",") as decimal point and a semicolon (";") as field separator.
# first row contains variable names, semicolon is separator, comma is decimal mark
# assign the variable id to row names
# note the "/" instead of "\" on MS Windows systems

mydata <- read.csv2("c:/mydata.csv", header=TRUE,
                    sep=";", dec=",", row.names="id")
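The sketch below checks that read.csv2() turns comma decimals into numbers. The file, written to a temporary path, and its columns (id, nombre, altura) are hypothetical.

```r
# Hypothetical European-style file: ";" between fields, "," as decimal mark
path <- tempfile(fileext = ".csv")
writeLines(c("id;nombre;altura",
             "a1;John;1,75",
             "a2;Mary;1,62"), path)

mydata <- read.csv2(path, header = TRUE, sep = ";", dec = ",", row.names = "id")
mydata$altura   # 1.75 1.62 (parsed as numbers, not strings)
```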

Read data from SPSS: read.spss()

# Import international.sav as a data frame: demo
library(foreign)
demo <- read.spss("international.sav", to.data.frame = TRUE)

Read data from Stata: read.dta()

# input Stata file
library(foreign)
mydata <- read.dta("c:/mydata.dta")
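Since no .sav or .dta file is available here, the sketch below illustrates the Stata case with a round trip: it writes a hypothetical data frame with foreign's write.dta() to a temporary file and reads it back with read.dta(). It assumes the foreign package is installed, as it is in standard R installations.

```r
library(foreign)

# Hypothetical example data, written to a temporary Stata file
df <- data.frame(id = 1:3, score = c(7.5, 8.0, 6.25))
path <- tempfile(fileext = ".dta")
write.dta(df, path)

mydata <- read.dta(path)
str(mydata)
```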

Read data from JSON Files: fromJSON()


# Activate `rjson`
library(rjson)

# Import data from a JSON file
JsonData <- fromJSON(file = "<file.json>")

# Import data from a JSON file through a URL
JsonData <- fromJSON(file = "<URL of the JSON file>")

Exporting Data
There are numerous methods for exporting R objects into other formats. For SPSS, SAS and Stata you will need to load the foreign package. For Excel, you will need the xlsReadWrite package.

• To an Excel spreadsheet
library(xlsReadWrite)
write.xls(mydata, "c:/mydata.xls")

Writing data frames


• write() writes out a matrix or vector in a specified number of columns.

• write.table() writes out a data frame (or an object that can be coerced to a data frame) with row and column labels:

write.table(mydata, "c:/mydata.txt", sep="\t")

write.table(x, file = "", append = FALSE, sep = " ", na = "NA", dec = ".",
            row.names = TRUE, col.names = TRUE)
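One detail of write() that surprises people: it outputs a matrix in the order its values are stored, column by column, so the file does not look like the matrix unless you transpose it first. A minimal sketch using a temporary file:

```r
m <- matrix(1:6, nrow = 2)          # 2 x 3 matrix: rows 1 3 5 and 2 4 6
path <- tempfile(fileext = ".txt")

# Values are written in storage order (column by column)
write(m, path, ncolumns = 2)
by_column <- readLines(path)        # "1 2" "3 4" "5 6"

# Transposing first reproduces the row layout of m
write(t(m), path, ncolumns = ncol(m))
by_row <- readLines(path)           # "1 3 5" "2 4 6"
```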

Source Codes: Input


The input can come from a script file (a file containing R commands).

The source() function runs a script in the current session. If the filename does not include a path, the file is taken from the current working directory.
# input a script
source("myfile")
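The sketch below writes a two-line hypothetical script to a temporary file and runs it with source(). Passing an environment as local keeps the script's variables out of the global workspace. That is a design choice for the example; plain source(script) would create them in the workspace instead.

```r
# Write a small hypothetical script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("a <- 2",
             "b <- a * 21"), script)

# Run it inside a fresh environment instead of the global workspace
env <- new.env()
source(script, local = env)
env$b   # 42
```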

Source Codes: Output


The sink() function defines the direction of the output.

# output directed to myfile.txt
# append=TRUE adds to the existing file instead of overwriting it;
# split=TRUE also sends the output to the terminal
sink("myfile.txt", append=TRUE, split=TRUE)

# return output to the terminal
sink()
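A minimal sketch of capturing printed output in a file with sink(), using a temporary file and the default settings (overwrite, no copy to the terminal):

```r
path <- tempfile(fileext = ".txt")

sink(path)                 # divert output to the file (overwrites it)
print((1:3)^2)
cat("done\n")
sink()                     # restore output to the console

captured <- readLines(path)   # "[1] 1 4 9" "done"
```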

Appendix

Useful Packages for Data Analysis


Pre-modeling stage
• Data visualization: ggplot2, googleVis
• Data transformation: plyr, dplyr, data.table
• Missing value imputation: missForest, missMDA
• Outlier detection: outliers, evir
• Feature selection: features, RRF, Boruta
• Dimension reduction: FactoMineR, CCP

Modeling stage
• Continuous regression: car, randomForest
• Ordinal regression: rminer, CORElearn
• Classification: caret, bigrf
• Clustering: CBA, RankCluster
• Time series: forecast, ltsa
• Survival: survival, BaSTA

Post-modeling stage
• General model validation: lsmeans, Comparison
• Regression validation: RegTest, ACD
• Classification validation: ClustEval, sigclust
• ROC analysis: pROC, timeROC

General Subsetting Rules

Subsetting syntax:

# Subsetting of one-dimensional objects (e.g. vectors, factors)
my_object[row]

# Subsetting of two-dimensional objects (e.g. matrices, data frames)
my_object[row, col]

# Subsetting of three-dimensional objects, like arrays
my_object[row, col, dim]
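The three syntaxes can be tried on small objects. In this sketch v, m and a are made-up examples; it also shows that an empty index selects a whole row, and that drop = FALSE keeps a single row as a matrix.

```r
v <- c(5, 10, 15)                        # one dimension
m <- matrix(1:6, nrow = 2)               # two dimensions (2 x 3)
a <- array(1:24, dim = c(2, 3, 4))       # three dimensions (2 x 3 x 4)

v[2]           # 10
m[2, 3]        # 6
m[1, ]         # whole first row: 1 3 5 (an empty index selects everything)
a[1, 2, 3]     # 15

# Selecting a single row drops the matrix to a plain vector;
# drop = FALSE keeps the result as a 1 x 3 matrix
dim(m[1, , drop = FALSE])   # 1 3
```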

General Subsetting Rules


There are three possibilities to subset data objects:

(1) Subsetting by positive or negative index/position numbers

# Creates a sample vector with named elements.
my_object <- 1:26
names(my_object) <- LETTERS

# Returns the elements 1-4.
my_object[1:4]

# Excludes elements 1-4.
my_object[-c(1:4)]
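A few edge cases of positional indexing, sketched on the same named vector: positive and negative indices cannot be mixed in one call, and indexing past the end returns NA instead of raising an error.

```r
my_object <- 1:26
names(my_object) <- LETTERS

first4 <- my_object[1:4]                 # A B C D -> 1 2 3 4
rest   <- my_object[-(1:4)]              # everything except the first four (22 elements)
evens  <- my_object[seq(2, 26, by = 2)]  # every second element

# Mixing positive and negative indices is an error
mixed_ok <- tryCatch({ my_object[c(-1, 2)]; TRUE }, error = function(e) FALSE)

# Indexing past the end returns NA rather than an error
my_object[30]
```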

General Subsetting Rules


There are three possibilities to subset data objects:

(2) Subsetting by same length logical vectors

# Generates a logical vector as example.
my_logical <- my_object > 10

# Returns the elements where my_logical contains TRUE values.
my_object[my_logical]

(3) Subsetting by field names

# Returns the elements with element titles: B, K, M
my_object[c("B", "K", "M")]
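The sketch below checks the logical and name-based subsetting from this slide, and adds two common companions: which() to turn a logical vector into positions, and combining conditions with &.

```r
my_object <- 1:26
names(my_object) <- LETTERS

my_logical <- my_object > 10
big <- my_object[my_logical]            # K ... Z (16 elements)

# which() turns the logical vector into positions
pos <- which(my_logical)

# Conditions can be combined with & (and) and | (or)
mid <- my_object[my_object > 10 & my_object <= 13]   # K L M

picked <- my_object[c("B", "K", "M")]   # 2 11 13
```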

Character Functions

• substr(x, start=n1, stop=n2): extract or replace substrings in a character vector.
  x <- "abcdef"
  substr(x, 2, 4) is "bcd"
  substr(x, 2, 4) <- "22222" makes x "a222ef"

• grep(pattern, x, ignore.case=FALSE, fixed=FALSE): search for pattern in x. If fixed=FALSE, pattern is a regular expression; if fixed=TRUE, pattern is a literal text string. Returns the matching indices.
  grep("A", c("b","A","c"), fixed=TRUE) # returns 2

• sub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE): find pattern in x and replace it with the replacement text (only the first match in each element; gsub() replaces all matches). If fixed=FALSE, pattern is a regular expression; if fixed=TRUE, pattern is a literal text string.
  sub("\\s", ".", "Hello There") returns "Hello.There"

• strsplit(x, split): split the elements of character vector x at split.
  strsplit("abc", "") returns a list whose single element is the 3-element vector "a" "b" "c"

• paste(..., sep=" "): concatenate strings, using the sep string to separate them.
  paste("x", 1:3, sep="") returns c("x1", "x2", "x3")
  paste("x", 1:3, sep="M") returns c("xM1", "xM2", "xM3")

• toupper(x): uppercase
• tolower(x): lowercase
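The examples from this table can be checked in one go. The sketch also includes gsub(), to contrast with sub(), and paste()'s collapse argument, which joins the results into a single string.

```r
x <- "abcdef"
s1 <- substr(x, 2, 4)                              # "bcd"
substr(x, 2, 4) <- "22222"                         # only 3 characters fit: x is now "a222ef"

g  <- grep("A", c("b", "A", "c"), fixed = TRUE)    # 2
s2 <- sub("\\s", ".", "Hello There")               # "Hello.There"
s3 <- gsub("\\s", ".", "a b c")                    # "a.b.c" (all matches replaced)
sp <- strsplit("abc", "")                          # a list holding c("a", "b", "c")
p1 <- paste("x", 1:3, sep = "")                    # "x1" "x2" "x3"
p2 <- paste("x", 1:3, sep = "M")                   # "xM1" "xM2" "xM3"
p3 <- paste("x", 1:3, sep = "", collapse = "+")    # "x1+x2+x3"
u  <- toupper("abc")                               # "ABC"
l  <- tolower("ABC")                               # "abc"
```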

Thank you…
