Shamsundar M2 Project2
Vaishnavi Shamsundar
10/03/2020
2) Import libraries including: FSA, FSAdata, magrittr, dplyr, plotrix, ggplot2, and moments
NOTE: You must use R version 3.6.3 to gain access to the FSA data set. If you installed a
later version of R, you must uninstall RStudio and R, then reinstall R version 3.6.3, and
then reinstall RStudio.
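As a rough sketch of this step, the snippet below loads the seven packages named above and reports any that are not installed yet. The variable names pkgs and not_installed are illustrative only and are not part of the original script.

```r
# Packages listed in step 2 of the assignment
pkgs <- c("FSA", "FSAdata", "magrittr", "dplyr", "plotrix", "ggplot2", "moments")

# Find packages that are not installed yet (names here are illustrative)
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Install these first: ", paste(not_installed, collapse = ", "))
}

# Attach every package that is available
for (p in setdiff(pkgs, not_installed)) {
  library(p, character.only = TRUE)
}
```

Any package reported as missing can be added with install.packages() before rerunning the snippet.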
4) Print the first and last 3 records from the BullTroutRML2 dataset
Module 2
6) Display the first and last 5 records from the filtered BullTroutRML2 dataset
9) Create a scatterplot for “age” (y variable) and “fl” (x variable) with the following
specifications:
Limit of x axis is (0,500)
Limit of y axis is (0,15)
Title of graph is “Plot 1: Harrison Lake Trout”
Y axis label is “Age (yrs)” X axis label is “Fork Length (mm)”
Use a small filled circle for the plotted data points
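The specifications above could be translated into a base R plot() call roughly as follows. This is a minimal sketch: the toy data frame is hypothetical and only stands in for the filtered Harrison Lake records, and pch = 20 is one reasonable choice for a small filled circle.

```r
# Hypothetical toy data standing in for the filtered Harrison Lake records
toy <- data.frame(
  fl  = c(120, 210, 305, 390, 450),
  age = c(1, 3, 6, 9, 14)
)

# Scatterplot of age (y) against fork length (x) with the requested limits and labels
plot(age ~ fl, data = toy,
     xlim = c(0, 500), ylim = c(0, 15),
     main = "Plot 1: Harrison Lake Trout",
     xlab = "Fork Length (mm)", ylab = "Age (yrs)",
     pch = 20)  # 20 draws a small filled circle
```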
11) Create an overdense plot using the same specifications as the previous scatterplot. But,
Title the plot “Plot 3: Harrison Density Shaded by Era”
Y axis label is “Age (yrs)”
Y axis limits are 0 to 15
X axis label is “Fork Length (mm)”
X axis limits are 0 to 500
include two levels of shading for the “green” data points.
Plot solid circles as data points
12) Create a new object called “tmp” that includes the first 3 and last 3 records of the
BullTroutRML2 data set
13) Display the “era” column (variable) in the new “tmp” object
14) Create a pchs vector with the argument values for + and x
15) Create a cols vector with the two elements “red” and “gray60”
17) Initialize the cols vector with the tmp era values
18) Create a plot of “Age (yrs)” (y variable) versus “Fork Length (mm)” (x variable) with the
following specifications:
Title of graph is “Plot 4: Symbol & Color by Era”
Limit of x axis is (0,500)
Limit of y axis is (0,15)
X axis label is “Fork Length (mm)”
Y axis label is “Age (yrs)”
Set pch equal to pchs era values
Set col equal to cols era values
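One way to set the symbol and color by era is to index the pchs and cols vectors with the era factor's integer codes. The sketch below assumes the pch codes 3 and 4 for "+" and "x", and the toy data frame is hypothetical; only the era labels come from the report.

```r
# Hypothetical toy data; era is a factor whose levels sort as "1977-80", "1997-01"
toy <- data.frame(
  fl  = c(150, 260, 340, 430),
  age = c(2, 5, 8, 12),
  era = factor(c("1977-80", "1977-80", "1997-01", "1997-01"))
)

pchs <- c(3, 4)            # assumed codes: 3 draws "+", 4 draws "x"
cols <- c("red", "gray60")

# The factor's integer codes (1 or 2) pick the matching symbol and color per point
pt_pch <- pchs[as.integer(toy$era)]
pt_col <- cols[as.integer(toy$era)]

plot(age ~ fl, data = toy,
     xlim = c(0, 500), ylim = c(0, 15),
     main = "Plot 4: Symbol & Color by Era",
     xlab = "Fork Length (mm)", ylab = "Age (yrs)",
     pch = pt_pch, col = pt_col)
```

Indexing by the factor avoids repeating the era labels inside ifelse() calls and extends naturally if more eras are added.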
19) Plot a regression line overlay on Plot 4 and title the new graph “Plot 5: Regression
Overlay”
20) Place a legend on Plot 5 and call the new graph “Plot 6: Legend Overlay”
Outliers are points that lie outside the typical range of the given dataset. They can be
identified with a boxplot, which displays the minimum, maximum, median, and quartiles of
the data. In the boxplot of age below, no points fall beyond the minimum and maximum
whiskers, which means no points lie outside the range of the dataset. The number of
outliers for age is therefore zero.
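The boxplot rule can also be checked numerically. The sketch below uses base R's boxplot.stats(), which by default flags values lying more than 1.5 times the box length beyond the box; the vector values is a hypothetical example, not the trout data.

```r
# Hypothetical values with one extreme point
values <- c(1, 2, 3, 4, 5, 100)

# boxplot.stats() returns the five-number summary and any flagged outliers
bs <- boxplot.stats(values)
bs$stats                    # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out                      # values flagged as outliers
outlier_count <- length(bs$out)
outlier_count
```

An empty bs$out (length zero) corresponds to the "no outliers" case described for age.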
The outliers in fork length (fl) are computed below. The screenshot shows that fork length
does have outliers, meaning some points fall outside the typical range of the data. They
are read against the ranges shown in the figure below, which are 500 for fork length and
15 for age.
B) The data are visualized with a scatter plot, a histogram, a regression line, a
regression line with a legend, and a boxplot. The scatter plot slopes upward, which shows a
positive relationship: as fork length increases, age also increases, and vice versa. The
highest point is at (450, 14). The histogram of age peaks at (7, 4), and two bins at 11
share the same peak. The next plot, in red and gray, shows the same result as the scatter
plot: a direct relationship between age and fork length. In regression, the slope of the
line is the key quantity because it tells how strongly one variable depends on the other.
The regression line passes through the highest point, at (450, 14). The legend identifies
the symbols and colors so that the graph is easier to read. The boxplot of fork length
shows the outliers, which are the points that lie outside the range of fork length.
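To make the role of the slope concrete, the sketch below fits a line with lm() and extracts its coefficients. The toy data frame is hypothetical and deliberately linear, so the numbers are illustrative and are not results from the trout data.

```r
# Hypothetical, exactly linear toy data: age = -1 + 0.03 * fl
toy <- data.frame(
  fl  = c(100, 200, 300, 400),
  age = c(2, 5, 8, 11)
)

# Fit age as a linear function of fork length
fit <- lm(age ~ fl, data = toy)
slope     <- coef(fit)[["fl"]]           # change in age per 1 mm of fork length
intercept <- coef(fit)[["(Intercept)"]]

# Draw the points and overlay the fitted line
plot(age ~ fl, data = toy,
     xlim = c(0, 500), ylim = c(0, 15),
     xlab = "Fork Length (mm)", ylab = "Age (yrs)")
abline(fit, col = "blue")
```

A positive slope matches the upward trend described above, and its size gives the expected change in age for each additional millimeter of fork length.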
C) From the visualizations we can conclude that the minimum and maximum values and the
highest point occur at the same place and with the same magnitude in all the graphs and
plots. In the boxplot of age there are no outliers, since all the points lie within the
range of age, unlike fork length, which has two outliers.
Bibliography:
3. Holtz, Y. (n.d.). Data visualization with R and ggplot2: The R Graph Gallery. Retrieved
October 02, 2020, from https://www.r-graph-gallery.com/ggplot2-package.html
Appendix:
#Print your name at the top of the script. Include the prefix: “Plotting Basics:” such that it
#appears “Plotting Basics: Lastname”
print("Plotting Basics: Vaishnavi Shamsundar")
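#Summary statistics; x is the data frame being analyzed (its definition is not shown here)
summary(x)
#Standard deviation and variance of age and fork length
sd(x$age)
sd(x$fl)
var(x$age)
var(x$fl)
#Skewness and kurtosis (functions from the moments package)
skewness(x$age)
skewness(x$fl)
kurtosis(x$age)
kurtosis(x$fl)
#Draw each boxplot and return the values flagged as outliers
boxplot(x$age)$out
boxplot(x$fl)$out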
setwd("C:/Users/Vaishu/Desktop")
#Print the first and last 3 records from the BullTroutRML2 dataset
head(BullTroutRML2,3)
tail(BullTroutRML2,3)
#Remove all records except those from Harrison Lake (hint: use the filterD() function)
k <- BullTroutRML2 %>%
  filterD(lake == "Harrison")
k
#Display the first and last 5 records from the filtered BullTroutRML2 dataset
head(k,5)
tail(k,5)
#Create an overdense plot using the same specifications as the previous scatterplot
smoothScatter(k$fl,k$age,
xlim=c(0,500),ylim=c(0,15),
xlab="Fork Length (mm)", ylab="Age (yrs)",
main="Plot 3: Harrison Density Shaded by Era",
pch= 19) #19 draws a solid circle, as the specification asks
#Create a new object called “tmp” that includes the first 3 and last 3 records of the
# data set.
a<-head(k,3)
b<-tail(k,3)
tmp<-rbind(a,b)
tmp
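#Create a pchs vector with the argument values for + and x
#(assumed to be pch codes 3 and 4; the original definition is missing from this listing)
pchs<- c(3, 4)
#Create a cols vector with the two elements “red” and “gray60”
cols<- c("red", "gray60")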
#Create a plot of “Age (yrs)” (y variable) versus “Fork Length (mm)” (x variable)
#Axis labels match the plotted variables: fl on x, age on y
plot(x= k$fl, y = k$age, xlim = c(0,500), ylim = c(0,15), pch= ifelse(
k$era == "1977-80", pchs[1], pchs[2]),
col= ifelse(k$era == "1977-80", cols[1], cols[2]),
xlab = "Fork Length (mm)", ylab = "Age (yrs)",
main = "Plot 4: Symbol & Color by Era")
#Plot a regression line overlay on Plot 4 and title the new graph
plot(x= k$fl, y = k$age, xlim = c(0,500), ylim = c(0,15), pch= ifelse(
k$era == "1977-80", pchs[1], pchs[2]),
col= ifelse(k$era == "1977-80", cols[1], cols[2]),
xlab = "Fork Length (mm)", ylab = "Age (yrs)",
main = "Plot 5: Regression Overlay")
#Fit age as a linear function of fork length and draw the fitted line
abline(lm(age~fl, data=k),col="blue")
#Place a legend on Plot 5 and call the new graph “Plot 6: Legend Overlay”
plot(x= k$fl, y = k$age, xlim = c(0,500), ylim = c(0,15), pch= ifelse(
k$era == "1977-80", pchs[1], pchs[2]),
col= ifelse(k$era == "1977-80", cols[1], cols[2]),
xlab = "Fork Length (mm)", ylab = "Age (yrs)",
main = "Plot 6: Legend Overlay")
abline(lm(age~fl, data=k),col="blue")
#Reuse pchs and cols so the legend matches the plotted symbols and colors
legend("topleft", legend=c("1977-80","1997-01"),pch=pchs,col=cols)