Module2 R Report
Anuraag K. Macha
06/01/24
Introduction
This analysis focuses on the Iris dataset from the UCI Machine Learning Repository,
using the ggplot2 and psych packages in R to generate descriptive statistics and visualizations.
The dataset includes measurements of sepal length, sepal width, petal length, and petal width for
three species of Iris flowers. The goal is to understand the dataset's overall structure, compare
Data Analysis
To gain an overview of the dataset, descriptive statistics were produced using the
describe function from the psych package. This provided insights into the mean, standard
deviation, minimum, maximum, and sample size (N) for each variable, detailed in the three-line
table below:
The average sepal length of the Iris flowers is 5.84 cm with a standard deviation of 0.83 cm.
Sepal width has an average of 3.05 cm and a standard deviation of 0.43 cm.
Petal length varies the most, with an average of 3.76 cm and a standard deviation of 1.76 cm.
Petal width has an average of 1.20 cm. The wide spread in petal length reflects the varied petal
sizes among the different species.
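The summary statistics above can be reproduced with a short sketch. It assumes R's built-in iris data frame stands in for the UCI file, so values may differ slightly in the second decimal place. The name summary_table is illustrative only, and the psych call runs only if that package is installed.

```r
# Numeric measurement columns of the built-in iris data frame
# (assumption: used here in place of the UCI download)
measurements <- iris[, c("Sepal.Length", "Sepal.Width",
                         "Petal.Length", "Petal.Width")]

# One row per variable: sample size, mean, standard deviation, min, max
summary_table <- data.frame(
  n    = sapply(measurements, length),
  mean = sapply(measurements, mean),
  sd   = sapply(measurements, sd),
  min  = sapply(measurements, min),
  max  = sapply(measurements, max)
)
print(round(summary_table, 2))

# If the psych package is available, describe() reports the same quantities
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(measurements)[, c("n", "mean", "sd", "min", "max")])
}
```

Base R is used for the main table so that the sketch runs without extra packages; describe adds further columns such as the median and skew.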
how these measurements varied across different species of Iris flowers. This allowed an
understanding of the differences in sepal and petal dimensions among Iris setosa, Iris versicolor,
and Iris virginica.
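A per-species comparison of this kind might look like the sketch below. It is only one possible approach, assuming the built-in iris data frame and base R's aggregate and tapply functions; the object names are illustrative and not taken from the submitted script.

```r
# Mean of every measurement within each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# The same idea for a single variable, using tapply()
sepal_length_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
print(sepal_length_by_species)

# Standard deviation of petal length within each species
petal_length_sd <- tapply(iris$Petal.Length, iris$Species, sd)
print(round(petal_length_sd, 2))
```

aggregate returns a data frame with one row per species, while tapply returns a named vector, which is convenient when only one measurement is of interest.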
Three types of visualizations were then created using ggplot2. First, a scatter plot of sepal
length versus petal length was produced, with a linear regression line added via the geom_smooth
function and a straight reference line via geom_abline. This helped visualize the relationship between these
species, using geom_jitter to avoid overplotting and provide a clearer view of data density. As
Figure 3 below shows, the species have distinct petal lengths, with a few outliers.
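The scatter and jitter plots described above can be sketched as follows. Several choices here are assumptions rather than the report's actual settings: sepal length is placed on the x-axis, the geom_abline line takes its intercept and slope from a separate lm fit, and the jitter width, colours, and labels are illustrative.

```r
library(ggplot2)

# Fit the straight line once so its coefficients can be passed to geom_abline()
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Scatter plot: geom_smooth() draws the fitted regression line, and
# geom_abline() draws the same line from explicit intercept and slope values
scatter_plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(colour = Species)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_abline(intercept = coef(fit)[["(Intercept)"]],
              slope = coef(fit)[["Sepal.Length"]],
              linetype = "dashed") +
  labs(x = "Sepal length (cm)", y = "Petal length (cm)")

# Jitter plot: horizontal noise only, so overlapping points are spread out
# without changing the measured petal lengths
jitter_plot <- ggplot(iris, aes(x = Species, y = Petal.Length, colour = Species)) +
  geom_jitter(width = 0.2, height = 0) +
  labs(x = "Species", y = "Petal length (cm)")

print(scatter_plot)
print(jitter_plot)
```

Setting height = 0 in geom_jitter keeps the y-values exact, which matters when the plot is used to judge how well the species separate by petal length.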
Lastly, a boxplot of sepal length by species was generated using geom_boxplot, which
allowed the detection of potential outliers and comparison of the central tendency and spread of
sepal lengths among the different species. Boxplots are useful for detecting outliers, and Figure 4
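The boxplot of sepal length by species can be sketched as below, again assuming the built-in iris data frame. The outlier check uses base R's boxplot.stats, which applies a 1.5 × IQR rule based on hinges, so its cut-offs can differ slightly from the quartile-based whiskers drawn by geom_boxplot. The object names are illustrative.

```r
library(ggplot2)

# Boxplot of sepal length for each species
sepal_boxplot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal length (cm)")
print(sepal_boxplot)

# List the values flagged as outliers within each species
sepal_outliers <- lapply(
  split(iris$Sepal.Length, iris$Species),
  function(x) boxplot.stats(x)$out
)
print(sepal_outliers)
```

Listing the flagged values alongside the plot makes it easier to trace an outlier back to a specific observation.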
The analysis of the Iris dataset provided valuable insights through descriptive statistics
and visualizations. The descriptive statistics revealed substantial variation in sepal and petal
dimensions across different species. Scatter plots, jitter plots, and boxplots created using ggplot2
examination enhances the understanding of the Iris dataset, showcasing the differences and
Kabacoff, R. (2022). R in action: Data analysis and graphics with R and Tidyverse. Manning
Publications.
Kosourova, E. (2023, March 6). Apply functions in R with examples [apply(), sapply(), lapply (),
tapply/
Appendix
The written and executed R commands are included in the R script file that was submitted