Lab4Instructions Knitr

Lab #4 focuses on data analysis using R, specifically working with canid dietary data. It covers setting a working directory, reading CSV files, creating histograms, calculating central tendency measures, and using boxplots to visualize data grouped by diet. Additionally, it demonstrates how to calculate Z-scores and add them to the dataset for further analysis.

Uploaded by

Jai Calatrava

Lab #4

2024-09-17

Begin by setting a working directory. Remember you can also set this using the menu in RStudio:
Session > Set Working Directory > Choose Directory...
The file chooser shows only folders, so your data file will not be visible. Navigate to the folder that
contains it and click ‘Open’.
RStudio will then print something similar to the line below in your console. NOTE: Do not run the following
line in your console. The directories on your computer are different, so R will return an error.

##Not run
setwd("~/Library/CloudStorage/OneDrive-DePaulUniversity/DePaul/Teaching/2025WQ/BIO206/Labs/Day_5")

Is my file in the directory I just selected?

list.files()

## [1] "CanidsData_DietPart.csv" "CanidsData_MassForcePart.csv"
## [3] "Day_5_Script.R" "FULLDataCanids.csv"
## [5] "Lab4Instructions_Knitr.pdf" "Lab4Instructions_Knitr.Rmd"
## [7] "LabWorksheet_4_Complete.docx" "LabWorksheet_4_Complete.pdf"
## [9] "LabWorksheet_4.docx" "Worksheet_Boxplot.pdf"
## [11] "Worksheet_BoxplotZscore.pdf" "Worksheet_Hist.pdf"
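
Scanning the listing by eye works, but the same question can be asked programmatically. The sketch below is only an illustration: it uses a temporary folder and a made-up file name (example.csv), both assumptions, so that it runs on any computer.

```r
# Hypothetical demo folder and file name, used only for illustration.
demo_dir <- file.path(tempdir(), "lab_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "example.csv"))

# list.files() accepts a path, so the working directory does not need to change.
found <- list.files(demo_dir)

# file.exists() answers the yes/no question directly.
has_file <- file.exists(file.path(demo_dir, "example.csv"))

print(found)
print(has_file)
```

In the lab itself, file.exists("CanidsData_DietPart.csv") should return TRUE once the working directory is set correctly.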

The data are in CSV (comma-separated values) format. A CSV file stores only plain text, so any plots made in
Excel are not kept when a workbook is saved in this format.
Read in our data and give each object a meaningful name.

CanidDiet<-read.csv("CanidsData_DietPart.csv")
CanidForce<-read.csv("CanidsData_MassForcePart.csv")

CanidData<-merge(CanidDiet, CanidForce, by = "SpeciesID")
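
To see what merge is doing, it can help to try it on a tiny example. The sketch below uses two made-up data frames (the values are invented and are not the lab data) that share a SpeciesID column, as the two CSV files do.

```r
# Made-up values for illustration only; not the lab data.
diet  <- data.frame(SpeciesID = c(1, 2, 3),
                    Diet = c("Carnivore", "Omnivore", "Carnivore"))
force <- data.frame(SpeciesID = c(3, 1, 2),
                    BiteForceN = c(50, 40, 30))

# Rows are matched on SpeciesID, regardless of their order in each table.
combined <- merge(diet, force, by = "SpeciesID")

print(combined)
str(combined)   # check the column names and types after merging
```

Each species ends up on one row with the columns from both tables. Running str() on the merged lab data is a quick way to confirm that the columns you expect are present.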

You can view your data by clicking its name in the ‘Environment’ panel, in the top-right of the RStudio window.
Now, let’s make a histogram.
As we saw in the other labs, the dollar sign ($) lets us access a variable (a column of the data frame) directly.
As you start to type the variable name, RStudio offers options that you can click as a shortcut. You can
also hit the ‘Tab’ key to autocomplete what RStudio believes should be entered.
We add two other arguments, separated by commas: xlab and main.
xlab lets us change the x-axis label.
main is the title. I set it to NULL, which removes the title.

par(mfrow = c(1,2))
hist(CanidData$Mass_KG, xlab="Mass (KG)", main = NULL)
hist(CanidData[,3]) #You can also use indexing to access a variable.

[Figure: two histograms side by side, each with Frequency on the y-axis. Left: no title, x-axis labelled "Mass (KG)". Right: default title "Histogram of CanidData[, 3]", x-axis labelled "CanidData[, 3]".]
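
The two histogram calls reach a column in two different ways, by name with $ and by position with indexing. A minimal sketch of that equivalence, using a made-up data frame (the values and the column order are assumptions, not the lab data):

```r
# Made-up values for illustration only; not the lab data.
toy <- data.frame(SpeciesID = 1:3,
                  Diet = c("Carnivore", "Omnivore", "Carnivore"),
                  Mass_KG = c(120, 95, 150))

by_dollar <- toy$Mass_KG        # access by name with $
by_name   <- toy[, "Mass_KG"]   # access by name with indexing
by_pos    <- toy[, 3]           # access by position (third column)

# All three return the same vector.
print(identical(by_dollar, by_name))
print(identical(by_dollar, by_pos))
```

Access by position depends on the column order, so it is worth checking names() on your data frame before relying on a column number.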

If you need to change your plotting window back to showing a single chart, use par and mfrow again.
par(mfrow = c(1,1))
This tells the plotting window to place only a single plot, because you ask for 1 row and 1 column. Before, we
asked for 1 row and 2 columns.
Note that the distribution above is not normal: it is right-skewed.
Now, let’s subset the data.

Can<-subset(CanidData, subset = CanidData$Diet == "Carnivore")
Omn<-subset(CanidData, subset = CanidData$Diet == "Omnivore")
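
After subsetting, it is worth confirming that every row landed in one of the groups. The sketch below uses a made-up data frame (invented values, not the lab data) and also shows logical indexing, which is an alternative to subset that gives the same rows.

```r
# Made-up values for illustration only; not the lab data.
toy <- data.frame(Diet = c("Carnivore", "Omnivore", "Carnivore", "Omnivore"),
                  Mass_KG = c(120, 95, 150, 100))

carn <- subset(toy, Diet == "Carnivore")   # subset() looks up Diet inside toy
omni <- toy[toy$Diet == "Omnivore", ]      # the same idea with logical indexing

print(table(toy$Diet))                     # how many rows each diet has
print(nrow(carn) + nrow(omni) == nrow(toy))
```

If the final check is FALSE on real data, some rows have a diet label that matches neither group, for example because of a spelling difference.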

And then produce a histogram of the newly separated data.

hist(Can$Mass_KG, xlab="Carnivore Mass (KG)", main = NULL)

[Figure: histogram of carnivore mass. x-axis: "Carnivore Mass (KG)", y-axis: Frequency.]

hist(Omn$Mass_KG, xlab="Omnivore Mass (KG)", main = NULL)

[Figure: histogram of omnivore mass. x-axis: "Omnivore Mass (KG)", y-axis: Frequency.]

We can measure central tendency and spread for the whole dataset, or separately for each level of a
categorical variable (in this case, diet).

mean(CanidData$Mass_KG) #Calculate a mean

## [1] 118.9944

median(CanidData$Mass_KG) #Calculate a median

## [1] 115.1652

#Note that the median is quite different to the mean

sd(CanidData$Mass_KG) #Calculate standard deviation

## [1] 27.84065
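
The gap between the mean and the median reported above (118.9944 versus 115.1652) is the pattern a right-skewed distribution tends to produce: a few large values pull the mean upward while the median stays near the middle. A minimal sketch with made-up masses (not the lab data):

```r
# Made-up masses for illustration only; not the lab data.
masses <- c(80, 90, 100, 110, 180)

m   <- mean(masses)     # pulled upward by the single large value
med <- median(masses)   # middle value of the sorted data

print(m)
print(med)
print(m > med)          # TRUE here, consistent with a right skew
```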

What if I wanted a mean for each diet?
I can use the aggregate function.
First, you join all the continuous variables that you’re interested in using the function cbind. Then,
after the tilde (~), you tell it which categorical variable you want to find the mean/median/sd for: Diet.
FUN in this case means ‘function’.

aggregate(x = cbind(Mass_KG,BiteForceN)~Diet, FUN="mean", data = CanidData)

##        Diet  Mass_KG BiteForceN
## 1 Carnivore 138.8240       46.0
## 2  Omnivore  99.1648       27.6

aggregate(x = cbind(Mass_KG,BiteForceN)~Diet, FUN="median", data = CanidData)

##        Diet   Mass_KG BiteForceN
## 1 Carnivore 129.91779         45
## 2  Omnivore  96.28355         30

aggregate(x = cbind(Mass_KG,BiteForceN)~Diet, FUN="sd", data = CanidData)

##        Diet  Mass_KG BiteForceN
## 1 Carnivore 25.52264   9.617692
## 2  Omnivore 10.46614  12.660964
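
One way to check a grouped summary is to compute it a second way. The sketch below compares aggregate with tapply, another base R function, on a made-up data frame (invented values, not the lab data). It is offered as a cross-check, not as the method used in this lab.

```r
# Made-up values for illustration only; not the lab data.
toy <- data.frame(Diet = c("Carnivore", "Omnivore", "Carnivore", "Omnivore"),
                  Mass_KG = c(120, 95, 150, 100))

# Formula interface: continuous variable ~ grouping variable.
by_formula <- aggregate(Mass_KG ~ Diet, data = toy, FUN = mean)

# tapply(values, groups, function) returns a named vector of group means.
by_tapply <- tapply(toy$Mass_KG, toy$Diet, mean)

print(by_formula)
print(by_tapply)
```

Both approaches should report the same mean for each diet. Swapping mean for median or sd works the same way.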

Boxplots are a great way to illustrate a continuous variable grouped by a discrete variable.
The general format is as follows:
boxplot(continuous~categorical)
boxplot(Dependent~Independent)
Because you pass the function your whole data frame (data = CanidData), you do not need to use the $ here.

par(pty='s',mfrow=c(1,2))
boxplot(Mass_KG~Diet, data = CanidData, xlab = "Diet", ylab = "Mass (KG)")
boxplot(BiteForceN~Diet, data = CanidData, xlab = "Diet", ylab = "Bite Force (N)")

[Figure: two box plots side by side, each with Diet (Carnivore, Omnivore) on the x-axis. Left y-axis: "Mass (KG)". Right y-axis: "Bite Force (N)".]
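
The numbers behind a box plot can be inspected without drawing it. The sketch below calls boxplot with plot = FALSE on a made-up data frame (invented values, not the lab data) and reads the five summary values it would draw for each group.

```r
# Made-up values for illustration only; not the lab data.
toy <- data.frame(Diet = rep(c("Carnivore", "Omnivore"), each = 5),
                  Mass_KG = c(100, 110, 120, 130, 140,
                              80, 85, 90, 95, 100))

# plot = FALSE returns the summary statistics instead of drawing the plot.
b <- boxplot(Mass_KG ~ Diet, data = toy, plot = FALSE)

print(b$names)   # group labels, in plotting order
print(b$stats)   # one column per group: lower whisker, lower hinge,
                 # median, upper hinge, upper whisker
```

The third row of b$stats holds the medians, which correspond to the thick line inside each box.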

Finally, we can use R to calculate a Z-score and then add the data back into our data frame. We create a new
column for both mass and bite force.
The general formula for a z-score is: (value - mean) / standard deviation.
Then create a pair of box plots, one for each variable. Although mass and bite force were initially on
different scales, their z-scores are now more comparable.

CanidData$Mass_KG_Z <- (CanidData$Mass_KG-mean(CanidData$Mass_KG))/
  sd(CanidData$Mass_KG)

CanidData$BiteForceN_Z <- (CanidData$BiteForceN-mean(CanidData$BiteForceN))/
  sd(CanidData$BiteForceN)
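
The manual z-score calculation can be cross-checked against base R's scale function, which by default subtracts the mean and divides by the standard deviation. A minimal sketch on made-up masses (not the lab data):

```r
# Made-up masses for illustration only; not the lab data.
mass <- c(120, 95, 150, 100)

# Manual z-score: (value - mean) / standard deviation.
z_manual <- (mass - mean(mass)) / sd(mass)

# scale() returns a one-column matrix; as.vector() drops that structure.
z_scale <- as.vector(scale(mass))

print(all.equal(z_manual, z_scale))
print(mean(z_manual))   # approximately 0, up to floating-point error
print(sd(z_manual))     # 1
```

A z-scored variable has a mean of about 0 and a standard deviation of 1, which is why variables measured in different units become comparable.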

par(pty='s',mfrow=c(1,2))
boxplot(Mass_KG_Z~Diet, data = CanidData, xlab = "Diet", ylab = "Mass Z-Score")

boxplot(BiteForceN_Z~Diet, data = CanidData, xlab = "Diet", ylab = "Bite Force Z-Score")

[Figure: two box plots side by side, each with Diet (Carnivore, Omnivore) on the x-axis. Left y-axis: "Mass Z-Score". Right y-axis: "Bite Force Z-Score".]
