CH 3
Loading and Handling Data in R
Copyright © 2018
Assignment Operator
Operator   Example
=          x = 5
<-         x <- 5
->         5 -> x
<<-        x <<- 2
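The four operators can be tried side by side. This is a minimal sketch; the names y, z, counter and increment are assumptions made for illustration and do not come from the slides.

```r
x = 5          # '=' assigns 5 to x
y <- 6         # '<-' assigns the right-hand value to the left-hand name
7 -> z         # '->' assigns the left-hand value to the right-hand name

counter <- 0
increment <- function() {
  # '<<-' assigns to 'counter' in the enclosing environment,
  # not to a new local variable inside the function
  counter <<- counter + 1
}
increment()

print(c(x, y, z, counter))
```

Inside a function, a plain `<-` would only create a local copy, which is why `<<-` is used here.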
Expression, Variables and Functions
Expressions
Operation        Operator   Description
Addition         x + y      y added to x
Subtraction      x - y      y subtracted from x
Multiplication   x * y      x multiplied by y
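A short sketch of the three arithmetic expressions, assuming the illustrative values x = 7 and y = 3 (not taken from the slides):

```r
x <- 7
y <- 3

x + y   # 10: y added to x
x - y   # 4:  y subtracted from x
x * y   # 21: x multiplied by y
```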
Dates
The default date format is “YYYY-MM-DD”.
The system date can be printed with Sys.Date().
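A minimal sketch of working with dates. The date 2018-03-15 and the variable names are assumptions chosen for illustration.

```r
# Print the system's current date (shown in the default YYYY-MM-DD format)
today <- Sys.Date()
print(today)

# A string in the default format converts without a format argument
d <- as.Date("2018-03-15")

format(d, "%d/%m/%Y")   # "15/03/2018"
d + 30                  # "2018-04-14": adding a number adds days
```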
Variables
(i) Assign the value 50 to a variable named “Var”, using either <- or =.
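Both forms of the assignment can be sketched as follows; the second line simply repeats the assignment with the other operator.

```r
Var <- 50   # assignment with the arrow operator
Var = 50    # the same assignment with the equals operator
print(Var)
```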
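Data Type
• Vectors
• a = c(1, 2, 4, 5, 6)
• sort(a) #sort the values
• a[1] <- 5 #replace the first element by index
• a[1:3] #first three positions will be printed
• b = c("abc", "ddd", "ccc") #character vector
• Lists
• a = list("aa", 55, 33, "bb")
• b = list("da", 55, 1, "bb")
• c(a, b) #merge the two lists into one flat list
• C = list(a, b) #nest the two lists inside a new list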
• Arrays – store data in more than 2 dimensions
• arr1 = c(11, 13, 15, 16, 14)
• arr2 = c(55, 54, 2, 15, 12)
• arr3 = array(c(arr1, arr2), dim = c(3, 3, 5)) #the 10 values are recycled to fill the 3 x 3 x 5 array
• Matrices
• Mtr = matrix(c(arr1, arr2), 5, 2) #10 values fill 5 rows and 2 columns
• Factors
• Fact1 = factor(arr1)
• Data Frames
• data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 3:7) #all columns must have the same length
• data1 <- c("aa", "bb", "cc", "dd")
• Merge
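Merging two data frames can be sketched with the base function merge(). The data frames left and right and their columns are hypothetical; only merge() itself is standard R.

```r
# Two small data frames that share a key column, 'id'
left  <- data.frame(id = 1:3, x1 = c("aa", "bb", "cc"))
right <- data.frame(id = 2:4, x2 = c(10, 20, 30))

# Keep only the rows whose 'id' appears in both data frames
merged <- merge(left, right, by = "id")
print(merged)

# all = TRUE keeps every row from both sides and fills the gaps with NA
merged_all <- merge(left, right, by = "id", all = TRUE)
print(merged_all)
```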
Vectors
• Vectors are stored like arrays in C
• Vector indices begin at 1
• All vector elements must have the same mode, such as integer,
numeric (floating-point number), character (string), logical (Boolean),
complex, etc.
The c function (c is short for combine) creates a new vector; for example,
c(4, 7, 8) creates a vector consisting of the three values 4, 7 and 8.
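A minimal sketch of creating that vector; the name v is an assumption.

```r
v <- c(4, 7, 8)   # combine three values into one vector
print(v)
length(v)         # 3
mode(v)           # "numeric"
```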
A vector cannot hold values of different data types. If integer, string and
Boolean values are placed together in a vector, all the values are
converted to the same data type, here “character”.
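The conversion can be observed directly. The values in this sketch are illustrative, not the ones from the original slide.

```r
# An integer, a string and a Boolean combined into one vector
mixed <- c(1L, "two", TRUE)

print(mixed)    # "1"   "two"  "TRUE"
class(mixed)    # "character": every value was converted to a string
```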
Accessing the value(s) in the vector
Create a variable named “VariableSeq” and assign to it a vector of string values.
• To access values in a vector, specify the indices at which the values are present
in the vector. Indices start at 1.
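A sketch of index-based access, assuming four placeholder strings as the contents of VariableSeq:

```r
VariableSeq <- c("aa", "bb", "cc", "dd")   # placeholder values

VariableSeq[1]          # "aa": the first element, since indices start at 1
VariableSeq[c(1, 3)]    # "aa" "cc": elements at positions 1 and 3
VariableSeq[2:4]        # "bb" "cc" "dd": a range of positions
VariableSeq[-1]         # "bb" "cc" "dd": everything except the first element
```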
Vector math
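Arithmetic on vectors is applied element by element. The following sketch uses two illustrative vectors, p and q, which are assumptions rather than the slide's own example.

```r
p <- c(1, 2, 3)
q <- c(4, 5, 6)

p + q     # 5 7 9:   element-wise addition
p * q     # 4 10 18: element-wise multiplication
p * 2     # 2 4 6:   the single value 2 is applied to every element
sum(p)    # 6
mean(q)   # 5
```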
Matrices
Create a matrix, “mat”, with 3 rows and 4 columns from a vector.
Access the element in the 2nd row and 3rd column of the matrix “mat”.
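A minimal sketch, assuming the matrix is filled with the values 1 to 12 (the slide's own values are not available):

```r
# matrix() fills column by column unless byrow = TRUE is given
mat <- matrix(1:12, nrow = 3, ncol = 4)
print(mat)

dim(mat)      # 3 4
mat[2, 3]     # 8: row 2, column 3
```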
To access the 2nd column of the matrix, simply provide the column number and
omit the row number.
To access the 2nd and 3rd columns of the matrix, simply provide the column
numbers and omit the row number.
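Column access can be sketched on the same kind of matrix, again assuming it holds the values 1 to 12:

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)

# Leaving the row position empty selects every row
second_col <- mat[, 2]       # 4 5 6, returned as a plain vector
two_cols   <- mat[, 2:3]     # a 3 x 2 matrix holding columns 2 and 3

print(second_col)
print(two_cols)
```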
List
Create a list, “emp”, with three elements: “EmpName”, “EmpUnit” and “EmpSal”.
To see the elements of the list “emp”, print the list.
Delete the element named “EmpUnit”, which has the value “IT”, from the list “emp”.
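These steps can be sketched as follows. Only the element names and the value “IT” come from the text; the name and salary values are placeholders.

```r
# Create the list with three named elements
emp <- list(EmpName = "Alex", EmpUnit = "IT", EmpSal = 55000)

# Print every element together with its name
print(emp)

# Assigning NULL to an element removes it from the list
emp$EmpUnit <- NULL
names(emp)    # "EmpName" "EmpSal"
```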
Recursive list
A recursive list means a list within a list.
Let us begin with two lists, “emp” and “emp1”.
The elements in both the lists are as shown below:
Combine both the lists into a single list by the name “EmpList”.
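One way to build such a recursive list is to pass both lists to list(). This is a sketch under assumptions: the element values are placeholders, and the slide may have combined the lists differently.

```r
emp  <- list(EmpName = "Alex",  EmpUnit = "IT", EmpSal = 55000)
emp1 <- list(EmpName = "Maria", EmpUnit = "HR", EmpSal = 60000)

# Each original list becomes one element of the new list
EmpList <- list(emp = emp, emp1 = emp1)

length(EmpList)             # 2
EmpList$emp1$EmpName        # "Maria": reach into the inner list by name
EmpList[[1]][["EmpSal"]]    # 55000:   the same idea with positions
str(EmpList)                # shows the nested structure
```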
Manipulating Text in Data
Function: substr(a, start, stop)
∙ Arguments: a is a character vector; the start and stop arguments contain numeric values.
∙ Description: returns the part of the string that starts at the start argument and ends at the stop argument.
Function: strsplit(a, split, …)
∙ Arguments: a is a character vector; split is also a character vector, containing a regular expression for splitting.
∙ Description: splits the given text string into substrings.
Function: paste(…, sep = " ", …)
∙ Arguments: the dots “…” stand for R objects; sep is a character string for separating the objects.
∙ Description: concatenates string vectors after converting the objects into strings.
∙ Example: paste('a', 1:5, sep = ' ')
Function: grep(pattern, a)
∙ Arguments: pattern contains the pattern to match; a is a character vector.
∙ Description: searches for the text pattern in the given character vector and returns the indices of the matching elements; grepl() returns TRUE or FALSE for each element instead.
∙ Example: x <- c("d", "a", "c", "abba")
∙ grep("a", x)
∙ grepl("a", x)
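The four text functions can be tried together. The input strings for substr() and strsplit() are assumptions; the paste() and grep() calls are the ones listed above.

```r
# substr(): characters 1 to 5 of the string
part <- substr("multiplex", 1, 5)            # "multi"

# strsplit(): returns a list, so [[1]] takes the pieces for the first string
pieces <- strsplit("2018-03-15", "-")[[1]]   # "2018" "03" "15"

# paste(): 'a' is repeated for each of 1..5 and joined with a space
labels <- paste('a', 1:5, sep = ' ')         # "a 1" "a 2" "a 3" "a 4" "a 5"

x <- c("d", "a", "c", "abba")
idx  <- grep("a", x)     # 2 4: positions of the elements that contain "a"
hits <- grepl("a", x)    # FALSE TRUE FALSE TRUE: one logical per element
```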
Function: class(dataset)
∙ Arguments: dataset contains the name of a dataset.
∙ Description: displays the class of the dataset.
Function: dim(dataset)
∙ Arguments: dataset contains the name of a dataset.
∙ Description: returns the dimensions of the dataset, that is, the total number of rows and columns.
Function: table(dataset$variablename)
∙ Arguments: dataset contains the name of the dataset; variablename contains the name of the variable.
∙ Description: counts how often each category value of the variable occurs and returns the counts.
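A sketch of the three functions on a small data frame. The data frame tickets and its columns are hypothetical and stand in for whatever dataset is being inspected.

```r
# Hypothetical dataset: four tickets sold at two multiplexes
tickets <- data.frame(
  Multiplex = c("M1", "M1", "M2", "M2"),
  ClassType = c("Gold", "Silver", "Gold", "Silver"),
  Price     = c(300, 100, 200, 160)
)

class(tickets)    # "data.frame"
dim(tickets)      # 4 3: four rows, three columns

# Count how many rows fall into each class type
counts <- table(tickets$ClassType)
print(counts)
```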
Aggregating and Group Processing of a Variable
The syntax of the aggregate() and tapply() functions is as follows:
aggregate(x, …) or
tapply(x, …)
Find out the mean ticket price for each class type and for each multiplex.
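The exercise can be sketched with both functions. The data frame tickets, its column names and its prices are assumptions made for illustration; the original dataset is not shown in the slides.

```r
# Hypothetical ticket data
tickets <- data.frame(
  Multiplex = c("M1", "M1", "M2", "M2"),
  ClassType = c("Gold", "Silver", "Gold", "Silver"),
  Price     = c(300, 100, 200, 160)
)

# aggregate(): mean price per class type, returned as a data frame
by_class <- aggregate(Price ~ ClassType, data = tickets, FUN = mean)
print(by_class)

# tapply(): mean price per multiplex, returned as a named array
by_multiplex <- tapply(tickets$Price, tickets$Multiplex, mean)
print(by_multiplex)
```

aggregate() is convenient when the result should stay a data frame, while tapply() gives a compact named result for a single grouping variable.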
Selecting variables
• Install dplyr and tidyr packages
• library(dplyr)
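Selecting variables with dplyr can be sketched as below. The data frame tickets is hypothetical, and the base R fallback is an addition so that the example also runs where dplyr has not been installed yet.

```r
# Hypothetical data frame with three variables
tickets <- data.frame(
  Multiplex = c("M1", "M1", "M2", "M2"),
  ClassType = c("Gold", "Silver", "Gold", "Silver"),
  Price     = c(300, 100, 200, 160)
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # dplyr::select() keeps only the named columns
  selected <- dplyr::select(tickets, Multiplex, Price)
} else {
  # Base R equivalent, used only when dplyr is not installed
  selected <- tickets[, c("Multiplex", "Price")]
}

print(selected)
```

The packages themselves can be installed once with install.packages(c("dplyr", "tidyr")).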
Reading Spreadsheets
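read.xlsx("filename", …)
where the filename argument defines the path of the file to be read and the
dots “…” define the other optional arguments. read.xlsx() is not part of base R;
it is provided by add-on packages such as xlsx and openxlsx.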