Sources of Data Sets

1/22/13 7:41 PM


Jeffrey Leek, Assistant Professor of Biostatistics, Johns Hopkins Bloomberg School of Public Health

file:///Users/jtleek/Dropbox/Public/008sourcesOfDataSets/index.html#1

Data are defined by how they are collected


Main types
- Census (descriptive)
- Observational study (inferential)
- Convenience sample (all types - may be biased)
- Randomized trial (causal)

Other types
- Prediction study (prediction)
- Studies over time
  - Cross-sectional (inferential)
  - Longitudinal (inferential, predictive)
  - Retrospective (inferential)

A population

Pick a person and measure

Census
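
A census measures every member of the population, so its summary describes the population exactly instead of estimating it. A minimal sketch, assuming an eight-person population like the one sampled on the other slides; the `height` values are made up for illustration and do not come from the slides:

```r
# Hypothetical population of 8 people; the heights are made-up values
population <- data.frame(
  id = 1:8,
  height = c(160, 172, 168, 181, 175, 158, 190, 165)
)

# A census selects every unit: no sampling, so no sampling error
census <- population[population$id %in% 1:8, ]

# The census mean is the population mean by construction
census_mean <- mean(census$height)
population_mean <- mean(population$height)
print(c(census = census_mean, population = population_mean))
```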

Observational study
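# Draw a simple random sample of 4 of the 8 people, without replacement
# Printed values depend on the R version: R 3.6.0 changed the default algorithm used by sample()
set.seed(5)
sample(1:8,size=4,replace=FALSE)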

[1] 2 5 6 8

Convenience sample
# Unequal selection weights: people 1-4 are five times as likely to be picked as people 5-8
probs = c(5,5,5,5,1,1,1,1)/24
sample(1:8,size=4,replace=FALSE,prob=probs)

[1] 4 1 2 5
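
To see why a convenience sample may be biased, one can repeat the weighted draw many times and count how often each person is included. This is only a sketch: the number of repetitions (`n_rep`) and the seed are arbitrary choices, not values from the slides.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

# Same selection weights as the slide: people 1-4 carry weight 5,
# people 5-8 carry weight 1
probs <- c(5, 5, 5, 5, 1, 1, 1, 1) / 24

n_rep <- 10000  # hypothetical number of repeated samples

# Each column of `draws` is one convenience sample of 4 people
draws <- replicate(n_rep, sample(1:8, size = 4, replace = FALSE, prob = probs))

# Fraction of the repeated samples that include each person
inclusion <- tabulate(draws, nbins = 8) / n_rep
names(inclusion) <- 1:8
print(round(inclusion, 2))
```

Under simple random sampling of 4 out of 8, every person would appear in about half of the samples. With these weights, people 1-4 appear in more than half of them and people 5-8 in fewer, so any summary computed from such a sample leans toward the first group.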

Randomized trial
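# Pick 2 people for treatment 1
treat1 = sample(1:8,size=2,replace=FALSE)
# Pick 2 of the remaining people for treatment 2, so nobody receives both treatments
treat2 = sample(setdiff(1:8,treat1),size=2,replace=FALSE)
c(treat1,treat2)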

[1] 8 1 3 4
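
Another common way to randomize is to shuffle the enrolled people once and cut the shuffled list into arms, which guarantees that no one lands in both treatments. A minimal sketch, assuming four enrolled people with hypothetical ids; the slides do not prescribe this particular scheme.

```r
set.seed(2)  # arbitrary seed

enrolled <- c(2, 5, 6, 8)  # hypothetical ids of the people in the trial

# Shuffle once, then cut the shuffled order into two equal arms
shuffled <- sample(enrolled)
assignment <- data.frame(
  id = shuffled,
  arm = rep(c("treat1", "treat2"), each = length(enrolled) / 2)
)
print(assignment)
```

Because the treatment label is decided by the shuffle alone, it cannot depend on any characteristic of the people, which is what supports a causal reading of the comparison between arms.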

Prediction study: train


# Training set: a random sample of 4 of the 8 people
set.seed(5)
sample(1:8,size=4,replace=FALSE)

[1] 2 5 6 8

Prediction study: test


# Test set: 2 of the 4 people (1, 3, 4, 7) not selected for training
sample(c(1,3,4,7),size=2,replace=FALSE)

[1] 1 4
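
The vector c(1,3,4,7) is typed in by hand as the set of people left out of the training sample. A sketch that derives the held-out pool with `setdiff` instead, so the training and test sets cannot overlap whatever the random draw turns out to be; the variable names are illustrative, not from the slides.

```r
set.seed(5)

population <- 1:8

# Training set: a random half of the population
train <- sample(population, size = 4, replace = FALSE)

# Test set: drawn only from the people not used for training
held_out <- setdiff(population, train)
test <- sample(held_out, size = 2, replace = FALSE)

# Sanity check: no person appears in both sets
stopifnot(length(intersect(train, test)) == 0)
```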

Study over time: cross-sectional

Study over time: longitudinal
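
The difference between a cross-sectional and a longitudinal design can be sketched with a toy table of people measured at several time points. Everything here is hypothetical (the `panel` table, the three time points, the seed): a cross-sectional study takes a single time slice, while a longitudinal study follows the same people across time.

```r
# Hypothetical long-format table: 8 people, each measurable at 3 time points
panel <- expand.grid(id = 1:8, time = 1:3)

set.seed(3)  # arbitrary seed
followed <- sample(1:8, size = 4, replace = FALSE)

# Cross-sectional: the sampled people measured at a single time point
cross_sectional <- panel[panel$time == 2 & panel$id %in% followed, ]

# Longitudinal: the same sampled people measured at every time point
longitudinal <- panel[panel$id %in% followed, ]

print(nrow(cross_sectional))  # one row per sampled person
print(nrow(longitudinal))     # one row per sampled person per time point
```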

Study over time: retrospective
