Session - 3 - 4 - Data Types and Data Structures

This document discusses different data types and structures used in R, including vectors, matrices, arrays, data frames, and lists. Vectors are one-dimensional arrays that can hold numeric, character, or logical data. Matrices are two-dimensional arrays where each element has the same mode. Arrays can have more than two dimensions. Data frames allow different columns to contain different data types. Lists are ordered collections that can contain different object types.

Uploaded by

Rajkumar Patel

Data Types & Structures

Understanding datasets

Database terminology: records and fields
Statisticians: observations and variables
Machine learning: observations and attributes

Age: continuous variable
Diabetes: categorical variable
Status: categorical variable

Source: R in Action, Manning Publications


R Datatypes
Numeric: PatientID, Age, AdmDate
Character: Diabetes, Status [Categorical Variables = Factor]
Logical: TRUE / FALSE

Source: R in Action, Manning Publications
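
As a quick illustration of these types, the sketch below uses class() to report how R stores a few values. The sample values and variable names are made up for illustration and are not taken from the source table.

```r
# Hypothetical sample values, one per data type named above
age      <- 52        # numeric
diabetes <- "Type1"   # character
improved <- TRUE      # logical

class(age)       # "numeric"
class(diabetes)  # "character"
class(improved)  # "logical"

# A categorical variable is usually stored as a factor
diabetes_f <- factor(c("Type1", "Type2", "Type1"))
class(diabetes_f)   # "factor"
levels(diabetes_f)  # "Type1" "Type2"
```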


R Data Structures

Source: R in Action, Manning Publications


Vectors
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. The combine
function c() is used to form the vector.
a <- c(1, 2, 5, 3, 6, -2, 4)                  # numeric vector
b <- c("one", "two", "three")                 # character vector
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)  # logical vector

NOTE: Scalars are one-element vectors. Examples include f <- 3, g <- "US", and h <- TRUE. They're used to hold constants.
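
A minimal sketch of how vector elements can be selected with square brackets, and of what happens when modes are mixed in a single call to c(). The name mixed is hypothetical and used only for this illustration.

```r
a <- c(1, 2, 5, 3, 6, -2, 4)

a[3]            # third element: 5
a[c(1, 3, 5)]   # first, third and fifth elements: 1 5 6
a[2:6]          # elements two through six: 2 5 3 6 -2

# A vector holds a single mode, so mixed input is coerced to a common one
mixed <- c(1, "two", TRUE)
class(mixed)    # "character"
```

Because every element must share one mode, mixing numbers, strings, and logicals in one vector silently converts everything to character.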
Matrices
A matrix is a two-dimensional array in which each element has the same mode (numeric, character, or
logical). Matrices are created with the matrix() function. The general format is

mymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns, byrow=logical_value, dimnames=list(char_vector_rownames, char_vector_colnames))

Where
vector contains the elements for the matrix
nrow and ncol specify the row and column dimensions
dimnames contains optional row and column labels
byrow indicates whether the matrix should be filled in by row (byrow=TRUE) or by column
(byrow=FALSE). The default is by column.
Creating a Matrix
y <- matrix(1:20, nrow=5, ncol=4)
y <- matrix(1:20, nrow=5, ncol=4, byrow = TRUE)
y <- matrix(1:20, nrow=5, ncol=4, byrow = FALSE)

cells <- c(1,26,24,68)      # value vector
rnames <- c("R1", "R2")     # row name vector
cnames <- c("C1", "C2")     # column name vector

mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
                   dimnames=list(rnames, cnames))

mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
                   dimnames=list(rnames, cnames))
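
To make the effect of byrow visible, the following sketch builds both variants from the same value vector and shows the result as comments. The names by_row and by_col are hypothetical; they are used here only so that both matrices can be kept side by side.

```r
cells  <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")

# Fill row by row: the first two values form row R1
by_row <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
                 dimnames=list(rnames, cnames))
by_row
#    C1 C2
# R1  1 26
# R2 24 68

# Fill column by column: the first two values form column C1
by_col <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
                 dimnames=list(rnames, cnames))
by_col
#    C1 C2
# R1  1 24
# R2 26 68
```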

X[i,] refers to the ith row of matrix X, X[,j] refers to the jth column, and
X[i, j] refers to the element in row i and column j.
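
The subscript forms above can be tried on a small matrix. This is a sketch using a 2 x 5 matrix filled by column (the default); the matrix x is made up for the illustration.

```r
x <- matrix(1:10, nrow=2)   # columns are (1,2), (3,4), (5,6), (7,8), (9,10)

x[2, ]          # second row: 2 4 6 8 10
x[, 2]          # second column: 3 4
x[1, 4]         # row 1, column 4: 7
x[1, c(4, 5)]   # row 1, columns 4 and 5: 7 9
```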
Creating a Matrix - Exercise
Create the following Matrix with name GroundsandWins

            Delhi  Chennai  Calcutta  Colombo  Dhaka  Karachi  Lahore
India          23       17        18        8      7        3       3
Pakistan        2        2         1        4      6       14      17
Bangladesh      1        1         1        2     30        1       1
Srilanka        4        2         3       40      4        2       2
Extract the number of Wins by India in Chennai, Calcutta and Dhaka
Extract the number of Wins at Karachi by Pakistan and Bangladesh
Creating a Matrix - Solution
cells <- c(23,17,18,8,7,3,3,2,2,1,4,6,14,17,1,1,1,2,30,1,1,4,2,3,40,4,2,2)
rnames <- c("India", "Pakistan", "Bangladesh", "Srilanka")
cnames <- c("Delhi", "Chennai", "Calcutta", "Colombo", "Dhaka", "Karachi", "Lahore")
# cells lists the table one row at a time, so the matrix must be filled by row
GroundsandWins <- matrix(cells, nrow=4, ncol=7, byrow=TRUE, dimnames=list(rnames, cnames))

GroundsandWins["India",c("Chennai", "Calcutta", "Dhaka")]


GroundsandWins[c("Pakistan","Bangladesh"),"Karachi"]
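
The same cells can also be reached by position instead of by name. The sketch below rebuilds the matrix so that it is self-contained, assuming the values are filled row by row to match the table, and shows the results one would expect from the table.

```r
cells <- c(23,17,18,8,7,3,3,
           2,2,1,4,6,14,17,
           1,1,1,2,30,1,1,
           4,2,3,40,4,2,2)
rnames <- c("India", "Pakistan", "Bangladesh", "Srilanka")
cnames <- c("Delhi", "Chennai", "Calcutta", "Colombo", "Dhaka", "Karachi", "Lahore")
GroundsandWins <- matrix(cells, nrow=4, ncol=7, byrow=TRUE,
                         dimnames=list(rnames, cnames))

# Row 1 is India; columns 2, 3 and 5 are Chennai, Calcutta and Dhaka
GroundsandWins[1, c(2, 3, 5)]    # 17 18 7

# Rows 2 and 3 are Pakistan and Bangladesh; column 6 is Karachi
GroundsandWins[c(2, 3), 6]       # 14 1
```

Indexing by name is usually easier to read and keeps working if rows or columns are reordered; positional indexing is shorter but depends on the layout.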
Arrays
Arrays are similar to matrices but can have more than two dimensions. They’re created
with an array function of the following form:

myarray <- array(vector, dimensions, dimnames)

Where
vector contains the data for the array
dimensions is a numeric vector giving the maximal index for each dimension
dimnames is an optional list of dimension labels
Arrays
The following code creates a three-dimensional (2 x 3 x 4) array holding the numbers 1 to 24:

dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3))
z

If the array must store 36 elements (1 to 36), what should change in the above code?
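
One possible answer, offered as a sketch rather than the only solution: the product of the dimensions must equal 36, so a 3 x 3 x 4 array works if the first dimension gets a third label. The label "A3" is an assumption added for this illustration; other shapes such as 2 x 3 x 6 would work equally well with matching labels.

```r
dim1 <- c("A1", "A2", "A3")          # one extra label (hypothetical)
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")

# 3 * 3 * 4 = 36, so all values from 1 to 36 fit exactly once
z <- array(1:36, c(3, 3, 4), dimnames=list(dim1, dim2, dim3))

dim(z)                # 3 3 4
z[3, 3, 4]            # last element: 36
z["A1", "B2", "C3"]   # 22
```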
Data Frames
A data frame is more general than a matrix in that different columns can contain different
modes of data (numeric, character, and so on).

patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status)
patientdata
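
A short sketch of common ways to pull columns out of the data frame built above. The stringsAsFactors=FALSE argument is written out explicitly here as an assumption, so that the character columns stay character regardless of the R version in use.

```r
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status,
                          stringsAsFactors=FALSE)

patientdata[1:2]                       # first two columns, still a data frame
patientdata[c("diabetes", "status")]   # columns selected by name
patientdata$age                        # one column as a vector: 25 34 28 52

sapply(patientdata, class)             # each column keeps its own mode
```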
Data Frames – With Factors
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
diabetes <- factor(diabetes)
status <- factor(status, ordered=TRUE, levels=c("Poor", "Improved",
                                                "Excellent"))
patientdata <- data.frame(patientID, age, diabetes, status)
str(patientdata)
summary(patientdata)
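
The sketch below looks at what the two factors store internally. It reuses the same values as above and shows that an ordered factor supports comparisons, while an unordered one only records categories.

```r
diabetes <- factor(c("Type1", "Type2", "Type1", "Type1"))
status <- factor(c("Poor", "Improved", "Excellent", "Poor"),
                 ordered=TRUE,
                 levels=c("Poor", "Improved", "Excellent"))

levels(diabetes)       # "Type1" "Type2"
levels(status)         # "Poor" "Improved" "Excellent"

# Factors are stored as integer codes that index into the levels
as.integer(status)     # 1 2 3 1

# Ordered factors can be compared against a level
status < "Excellent"   # TRUE TRUE FALSE TRUE
```

Without the levels argument, R would order the levels alphabetically (Excellent, Improved, Poor), which would not match the intended ranking.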
Data Frames – Exercise
Create a Data Frame:
First Column = country (India, Canada, Sudan, Nepal)
Second Column = continent (Asia, North America, Africa, Asia)
Third Column = economy (developing, developed, under developed, developing)
Treat the third column as an ordinal factor: under developed < developing < developed
Data Frames – Solution
Create a Data Frame:
country <- c("India", "Canada", "Sudan", "Nepal")
continent <- c("Asia", "North America", "Africa", "Asia")
economy <- c("Developing","Developed","Under Developed","Developing")
economy <- factor(economy, ordered = TRUE, levels = c("Under Developed", "Developing", "Developed"))
country_frame <- data.frame(country, continent, economy)
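
One use of the ordinal factor is sorting. As a sketch, the rows can be arranged from least to most developed with order(); the name sorted_frame is hypothetical.

```r
country <- c("India", "Canada", "Sudan", "Nepal")
continent <- c("Asia", "North America", "Africa", "Asia")
economy <- c("Developing", "Developed", "Under Developed", "Developing")
economy <- factor(economy, ordered = TRUE,
                  levels = c("Under Developed", "Developing", "Developed"))
country_frame <- data.frame(country, continent, economy)

# order() sorts by the level ranking, not alphabetically
sorted_frame <- country_frame[order(country_frame$economy), ]
as.character(sorted_frame$country)   # "Sudan" "India" "Nepal" "Canada"
```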
Lists
A list is an ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated)
objects under one name.

g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow=5)
k <- c("one", "two", "three")
mylist <- list(title=g, ages=h, j, k)

Try to extract the third element of the list
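
A possible way to approach the exercise, shown as a sketch. Double brackets return the component itself, single brackets return a one-element list, and named components can also be reached with $ or by name.

```r
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow=5)
k <- c("one", "two", "three")
mylist <- list(title=g, ages=h, j, k)

mylist[[3]]         # the third component: the 5 x 2 matrix j
mylist[3]           # a list of length one that contains the matrix
mylist[["ages"]]    # named component by name: 25 26 18 39
mylist$title        # named component with $: "My First List"
```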
