indexing to manipulate data: guided tutorial

Charles DiMaggio
applications of epidemiologic methods II
Spring 2014

February 26, 2014

Indexing is the key to working with and manipulating R data. There are three ways to index
data in R:
• position
• logical vector
• name
Run the following to see an example of each type of indexing.
> x <- c(chol = 234, sbp = 148, dbp = 78, age = 54)
> x
> x[1] # by position
> x[x>150] # by logical
> x["chol"] # by name
You can use indexing to replace or change a data entry.
> x[1] <- 250 # by position
> x[x<100] <- NA # by logical
> x["sbp"] <- 150 # by name
> x
Let’s look at the three approaches to indexing in a bit more detail.

1 indexing vectors

1.1 by position

> x<-1:40
> x[11] #only the 11th element
> x[-11] #exclude the 11th element
> x[11:20] #members 11 to 20
> x[-(11:20)] # all but members 11 to 20
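As a small sketch of positional indexing beyond single positions and simple ranges, the lines below index with a vector of positions and show what happens when a negative index runs past the end of the vector. Nothing here goes beyond base R; the particular positions are arbitrary.

```r
x <- 1:40
x[c(1, 5, 10)]        # several positions at once
x[-(11:20)]           # drop positions 11 to 20; 30 elements remain
length(x[-(11:20)])
x[-(11:100)]          # negative positions beyond length(x) are ignored,
                      # so this drops 11 to 40 and leaves only 1 to 10
```

Because out-of-range negative positions are silently ignored, a mistyped range in a negative index does not raise an error; checking length() of the result is a cheap safeguard.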

1.2 by logical

R uses the following symbols to establish logical relationships between variables.

==        IS equivalent to
!         is NOT
&         AND (element-wise)
|         OR (element-wise; TRUE if either or both comparison elements are TRUE)
xor()     EITHER (element-wise exclusive or; TRUE if either, but not both,
          comparison elements are TRUE)
&& ||     AND and OR for single logical values, used for control flow in "if"
          statements. Older versions of R used only the first element of a
          longer logical vector; R 4.3.0 and later signal an error instead.
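A minimal sketch of how these operators behave, using two made-up logical vectors (u and v) and a single made-up value (sbp); the cut-offs 140 and 160 are arbitrary and only there to give the if statement something to test.

```r
u <- c(TRUE, FALSE, TRUE)
v <- c(TRUE, TRUE, FALSE)
u & v        # TRUE FALSE FALSE
u | v        # TRUE TRUE TRUE
xor(u, v)    # FALSE TRUE TRUE
!u           # FALSE TRUE FALSE

sbp <- 148   # a single value, so && is the appropriate operator
status <- if (sbp >= 140 && sbp < 160) "in range" else "out of range"
status
```

The element-wise forms (& and |) are the ones to use when building a logical vector for indexing; && and || belong in if conditions.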
In addition, the which() function returns an integer vector of positions from a Boolean
operation, for example
> age <- c(8, NA, 7, 4)
> which(age<5 | age>=8)
Here, positions 1 and 4 in the vector "age" meet the Boolean condition.
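The missing value in "age" is worth a closer look. The sketch below (the name keep is introduced only for illustration) contrasts indexing with the logical vector directly against indexing with the positions returned by which().

```r
age <- c(8, NA, 7, 4)
keep <- age < 5 | age >= 8   # TRUE NA FALSE TRUE
keep
age[keep]                    # the NA in the index produces an NA in the result
age[which(keep)]             # which() drops the NA, leaving 8 and 4
```

Wrapping a condition in which() is therefore a simple way to keep missing values from leaking into a subset.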
To use a logical expression to index R data:
1. create a logical vector
2. use the logical vector to index data
Let’s take a look at an example. First create three vectors of related data.
> names <- c("dopey", "grumpy", "doc", "happy", "bashful",
+ "sneezy", "sleepy")
> ages <- c(142, 240, 232, 333, 132, 134, 127)
> sex <- c("m", "m", "f", "f", "f", "m", "m")
Now, do some indexing.
> young <- ages < 150 #create logical vector
> names[young] #index name vector using logical vector
> names[!young] # old dwarves
> male<- sex == "m" #logical vector male dwarves
> names[male] #index names using logical vector males
> names[young & male] # young male dwarves
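Logical vectors can also be combined with the other operators and counted directly. The sketch below repeats the dwarf data under the name dwarf (an assumption made here so the base function names() is not shadowed) and shows OR, exclusive OR, and simple counts.

```r
dwarf <- c("dopey", "grumpy", "doc", "happy", "bashful", "sneezy", "sleepy")
ages  <- c(142, 240, 232, 333, 132, 134, 127)
sex   <- c("m", "m", "f", "f", "f", "m", "m")
young <- ages < 150
male  <- sex == "m"

dwarf[young | male]       # young or male (or both)
dwarf[xor(young, male)]   # exactly one of the two: "grumpy" "bashful"
sum(young & male)         # TRUE counts as 1, so this is the number of young males: 3
mean(young)               # proportion of dwarves who are young
```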
One important use of logical indexing is to categorize a continuous variable.
> # simulate vector with 1000 age values
> age <- sample(0:100, 1000, replace = TRUE)
> mean(age) ; sd(age)
> agecat <- age # make copy
> # replace elements of agecat with strings for each age category
> agecat[age<15] <- "<15" # creating character vector
> agecat[age>=15 & age<25] <- "15-24"
> agecat[age>=25 & age<45] <- "25-44"
> agecat[age>=45 & age<65] <- "45-64"
> agecat[age>=65] <- "65+"
> table(agecat) # get freqs
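For comparison, base R's cut() can produce the same categories in one call. The sketch below is an alternative to the indexing approach, not a replacement for it; the seed value and the name agecat2 are assumptions made here so the result is reproducible and can be checked against the manual version.

```r
set.seed(1)   # arbitrary seed so the simulated ages are reproducible
age <- sample(0:100, 1000, replace = TRUE)

# manual approach: one logical index per category
agecat <- age
agecat[age < 15] <- "<15"
agecat[age >= 15 & age < 25] <- "15-24"
agecat[age >= 25 & age < 45] <- "25-44"
agecat[age >= 45 & age < 65] <- "45-64"
agecat[age >= 65] <- "65+"

# cut() with left-closed intervals [0,15), [15,25), ..., [65,Inf)
agecat2 <- cut(age, breaks = c(0, 15, 25, 45, 65, Inf), right = FALSE,
               labels = c("<15", "15-24", "25-44", "45-64", "65+"))
table(agecat2)
all(as.character(agecat2) == agecat)   # the two approaches agree
```

cut() returns a factor whose levels follow the order of the breaks, so table() lists the categories in age order rather than alphabetically.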

2 indexing matrices and arrays

A vector has only one dimension, so it is indexed by a single number in a bracket. To index
matrices and arrays, you have to account for their additional dimensions.

2.1 indexing matrices

Create the following matrix.

> m<-matrix(round(rnorm(4,50,5)),2,2) # 4 values fill a 2 x 2 matrix
> dimnames(m)<-list(behavior=c("type A", "type B"),
+ MI=c("yes", "no"))
> m
Now do some indexing.
1. by position
> m[1, ] #first row
> m[1, , drop = FALSE]
> m[1,2] # row 1, column 2 (type A, MI "no")
2. by name
> m["type A",]
> m[, "no"]
3. by logical
> m[, 2] < 45 # logical vector, one value per row
> m[m[, 2] < 45, ] # rows where column 2 is < 45
You can achieve increasing levels of precision and complexity with indexing. In the following
statement (don’t submit it; it’s just for illustration), the comma after the 3 leaves the column
index empty, which tells R to return every column of all the rows in x for which the 1st column is <3.
x[x[,1]<3,]
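The same pattern can be tried on a small matrix with fixed values, so the result does not depend on random draws. The cell values below are made up for illustration; the dimnames follow the behavior-by-MI layout used above.

```r
m <- matrix(c(52, 41, 48, 44), 2, 2,
            dimnames = list(behavior = c("type A", "type B"),
                            MI = c("yes", "no")))
m[, "no"] < 45                      # FALSE for type A, TRUE for type B
m[m[, "no"] < 45, ]                 # the matching row, simplified to a named vector
m[m[, "no"] < 45, , drop = FALSE]   # the same row, kept as a 1 x 2 matrix
m[m < 45]                           # a logical matrix as index returns the matching cells
```

drop = FALSE matters whenever later code expects a matrix, since a condition that matches a single row would otherwise return a plain vector.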

The functions lower.tri() and upper.tri() return a logical matrix marking the positions below
or above the diagonal of a matrix; that logical matrix can then be used as an index.
> m2<-matrix(round(rnorm(9,50,5)),3,3) # 9 values fill a 3 x 3 matrix
> m2
> lower.tri(m2)
> upper.tri(m2)
> m2[lower.tri(m2)]
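With a matrix whose cells are simply 1 to 9, it is easier to see which positions the two functions pick out. This is a sketch with made-up values; the diag argument is part of base R's lower.tri() and upper.tri().

```r
m2 <- matrix(1:9, 3, 3)          # filled column by column
lower.tri(m2)                    # TRUE strictly below the diagonal
m2[lower.tri(m2)]                # 2 3 6
m2[upper.tri(m2)]                # 4 7 8
m2[lower.tri(m2, diag = TRUE)]   # 1 2 3 5 6 9, diagonal included
m2[upper.tri(m2)] <- NA          # indexing on the left side blanks the upper triangle
m2
```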

2.2 indexing arrays

Create the following array.

> a<-array(sample(10:70, 8, replace=TRUE), c(2,2,2))
> dimnames(a)<-list(exposure=c("e", "E"), disease=c("d", "D"),
+ confounder=c("c", "C"))
> a
Now, index to return the cell count for unexposed, diseased, confounder negative individuals...
1. by position
> a[1,2,1]
2. by name
> a["e","D","c"]
3. by logical
> a[a==33] # cells equal to 33 (empty if no cell holds that value)
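Leaving an index empty works for arrays just as it does for matrices, which makes it possible to pull out whole strata. The sketch below uses fixed, made-up cell counts so the results can be checked by hand; apply() is base R and is used here only to collapse over the third dimension.

```r
a <- array(c(20, 30, 40, 50, 11, 12, 13, 14), c(2, 2, 2),
           dimnames = list(exposure = c("e", "E"), disease = c("d", "D"),
                           confounder = c("c", "C")))
a[1, 2, 1]               # unexposed, diseased, confounder negative: 40
a["e", "D", "c"]         # the same cell by name
a[, , "c"]               # the 2 x 2 table for the confounder-negative stratum
a["E", "D", ]            # exposed and diseased, in both strata: 50 14
apply(a, c(1, 2), sum)   # exposure by disease table, summed over the confounder
```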

3 indexing lists
Indexing lists can sometimes be challenging. Recall the bracket notation for lists, where
double brackets refer to the "bin" of like objects, and a following single bracket refers to the
contents of that bin.
> l<- list(1:5, matrix(1:4,2,2),
+ c("John Snow", "William Farr"))
1. by position
> l[[1]]
> l[[2]][2,1]
> l[[3]][2]
2. by logical
> char <- sapply(l, is.character)
> char
> epi.folk<-l[char]
> epi.folk
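The difference between single and double brackets, and indexing a list by name, can be sketched as follows. The element names ids, tab and epi are hypothetical, added here only to show name-based access.

```r
l <- list(1:5, matrix(1:4, 2, 2), c("John Snow", "William Farr"))
class(l[3])        # "list": single brackets return a sub-list
class(l[[3]])      # "character": double brackets return the element itself
l[[3]][2]          # "William Farr"
l[[2]][2, 1]       # 2

# naming the elements allows indexing by name as well
names(l) <- c("ids", "tab", "epi")
l$epi[1]           # "John Snow"
l[["tab"]][1, 2]   # 3
```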

3.1 indexing the results of modeling

Indexing lists comes in handy when working with the results of statistical models, which
frequently return results in the form of lists. Fortunately, most package authors return the
results as named lists.
Work through the following conditional logistic regression of abortion and infertility to see
an example of extracting list elements from the results of a model.
> data(infert)
> library(survival) # package with clogit()
> mod1 <- clogit(case ~ spontaneous + induced +
+ strata(stratum), data = infert)
> mod1 # default results (7x risk with spontaneous abortion, 4x with induced)
> str(mod1)
> names(mod1) #structure, names
> mod1$coefficients # name to index result (list element)
> summod1<-summary(mod1) #more detailed results
> names(summod1) #detailed list components

4 indexing dataframes
Data frames can (generally) be indexed like matrices, with the added advantage of being
able to use column (variable) names.
Run through this code to get a sense of how dataframes can be indexed.
> data(infert)
1. position
> infert[1:4, 1:2]
> infert[1:4, 2] <- c(NA, 45, NA, 23)
> infert[1:4, 1:2]
2. name
> names(infert)
> infert[1:4, c("education", "age")]
> infert[1:4, c("age")] <- c(NA, 45, NA, 23)
> infert[1:4, c("education", "age")]
3. logical
> table(infert$parity)
> # change values of 5 or 6 to missing
> infert$parity[infert$parity==5 | infert$parity==6] <- NA
> table(infert$parity)
> table(infert$parity, exclude=NULL) # include the count of missing values
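Once a column contains missing values, logical row indexing needs a little care. The sketch below works on a fresh copy of three infert columns (the name dat and the cut-offs are assumptions made for illustration) and shows how NA in a condition affects the rows returned.

```r
data(infert)                            # fresh copy of the data set
dat <- infert[, c("education", "age", "parity")]
dat$parity[dat$parity >= 5] <- NA       # recode parity 5 and 6 as missing

older <- dat[dat$age > 40, ]            # rows meeting a condition, all columns
nrow(older)
high <- dat[dat$parity >= 4, ]          # NA in the condition adds all-NA rows
sum(is.na(high$age))
high2 <- dat[which(dat$parity >= 4), ]  # which() keeps only the TRUE rows
sum(is.na(high2$age))                   # 0
dat[is.na(dat$parity), "age"]           # ages of the rows with missing parity
```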
In the following, perhaps more realistic, example you will read in a set of anonymized hospital
discharge data and then index it in various ways.
> sparcs<-read.csv(file=paste0("http://www.columbia.edu/~cjd11/charles_dimaggio/",
+ "DIRE/resources/R/sparcsShort.csv"), stringsAsFactors=FALSE)
• index rows
> brooklyn<-sparcs[sparcs$county=="59",]
> nyc<- sparcs$county=="58"| sparcs$county=="59"|
+ sparcs$county=="60"| sparcs$county=="61"| sparcs$county=="62"
> nyc.sparcs<-sparcs[nyc,]
• index columns
> dxs<-sparcs[,"pdx"]
> vars<-c("date", "pdx", "disp")
> my.vars<-sparcs[,vars]
• index rows and columns
> sparcs2<-sparcs[nyc,vars]
• variables to include
> brooklyn.sparcs<-subset(sparcs, county=="59",
+ select=c(date, pdx,disp))

• range of values
> nyc.sparcs<-subset(sparcs, county %in% 59:62,
+ select=c(county, pdx,disp))
• excluding columns
> nyc.sparcs<-subset(sparcs, county %in% 59:62,
+ select=-c(county, pdx,disp))
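When selecting a range of county codes, the choice between == and %in% matters. The sketch below uses a small made-up data frame (toy, with invented values) in place of the sparcs file, so it runs without a download.

```r
# hypothetical stand-in for the sparcs data; all values are made up
toy <- data.frame(county = c(58, 59, 60, 61, 62, 59, 33, 62),
                  pdx    = c("A", "B", "C", "D", "E", "F", "G", "H"),
                  disp   = c(1, 1, 2, 1, 3, 2, 1, 1))

sum(toy$county == 59:62)     # 1: 59:62 is recycled and compared position by position
sum(toy$county %in% 59:62)   # 6: every row whose county is 59, 60, 61 or 62

in.range <- subset(toy, county %in% 59:62, select = c(county, pdx))
no.disp  <- subset(toy, county %in% 59:62, select = -disp)   # drop one column
names(no.disp)
```

With ==, the right-hand side is recycled along the column, so a row matches only if its county happens to line up with the recycled value at that position; %in% tests membership for each row independently.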
