
Module 2: R Assignment

Anuraag K. Macha

ALY6010: Probability Theory and Introductory Statistics

Dr. Thomas Goulding

06/01/24

Introduction

This analysis focuses on the Iris dataset from the UCI Machine Learning Repository,

utilizing ggplot2 and psych packages in R to generate descriptive statistics and visualizations.

The dataset includes measurements of sepal length, sepal width, petal length, and petal width for

three species of Iris flowers. The goal is to understand the dataset's overall structure, compare

measurements across species, and visualize key relationships and distributions.

Data Analysis

To gain an overview of the dataset, descriptive statistics were produced using the

describe function from the psych package. This provided insights into the mean, standard

deviation, minimum, maximum, and sample size (N) for each variable, detailed in the three-line

table below:

Variable             Mean   Standard Dev.   Minimum   Maximum   N

Sepal Length (cm)    5.84   0.83            4.3       7.9       150
Sepal Width (cm)     3.05   0.43            2.0       4.4       150
Petal Length (cm)    3.76   1.76            1.0       6.9       150
Petal Width (cm)     1.20   0.76            0.1       2.5       150

• The average sepal length of the Iris flowers is 5.84 cm with a standard deviation of 0.83 cm.

• Sepal width has an average of 3.05 cm and a standard deviation of 0.43 cm.

• Petal length varies the most, with an average of 3.76 cm and a standard deviation of 1.76 cm.

• Petal width has an average of 1.20 cm and a standard deviation of 0.76 cm, a wide spread

relative to its mean that is consistent with petal sizes differing among the species.
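
The overall summary above could be reproduced along the following lines. This is a minimal sketch, not the script submitted with the report: it assumes R's built-in iris data frame in place of the UCI download (the two copies are known to differ in a few values in two setosa rows, so some statistics may differ slightly in the last decimal place), and the names measurements and summary_table are illustrative.

```r
# Assumption: the built-in `iris` data frame stands in for the UCI file.
library(psych)

# Keep only the four numeric measurement columns
measurements <- iris[, c("Sepal.Length", "Sepal.Width",
                         "Petal.Length", "Petal.Width")]

# describe() returns one row of statistics per variable
stats <- as.data.frame(describe(measurements))

# Retain the columns reported in the table: mean, sd, min, max, n
summary_table <- round(stats[, c("mean", "sd", "min", "max", "n")], 2)
print(summary_table)
```

Converting the result with as.data.frame() drops the psych-specific classes, so the subsetting and rounding behave like ordinary data frame operations.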

Next, descriptive statistics were generated by group, specifically by species, to observe

how these measurements varied across the three species of Iris flowers. This made it possible

to compare sepal and petal dimensions among Iris setosa, Iris versicolor,

and Iris virginica.

Figure 1: Descriptive statistics by species
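
A grouped summary of this kind might be produced as sketched below. The sketch assumes the built-in iris data frame and uses describeBy from the psych package; the base R aggregate call is added here only as a cross-check and is not taken from the report.

```r
# Assumption: built-in `iris`; Species is the grouping factor.
library(psych)

# describeBy() returns one describe() table per level of the grouping factor
by_species <- describeBy(iris[, 1:4], group = iris$Species)
print(by_species[["setosa"]][, c("mean", "sd", "min", "max", "n")])

# Cross-check of the per-species means using base R only
group_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(group_means)
```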

Three types of visualizations were then created using ggplot2. First, a scatter plot of sepal

length versus petal length was produced, adding a linear regression line with the geom_smooth

function and an abline using geom_abline. This helped visualize the relationship between these

two variables and displayed a positive correlation.

Figure 2: Scatter Plot of Sepal Length vs Petal Length
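
A plot of this kind could be built as in the sketch below. The report does not state which line geom_abline drew, so the sketch assumes it was the fitted regression line and takes its intercept and slope from lm; the object names fit and scatter, the colouring by species, and the axis labels are illustrative choices.

```r
# Assumption: built-in `iris`; geom_abline is assumed to draw the fitted line.
library(ggplot2)

# Simple linear regression of petal length on sepal length
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

scatter <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(colour = Species)) +
  # Regression line with confidence band
  geom_smooth(method = "lm", formula = y ~ x) +
  # Same line drawn explicitly from the lm coefficients
  geom_abline(intercept = coef(fit)[["(Intercept)"]],
              slope = coef(fit)[["Sepal.Length"]],
              linetype = "dashed") +
  labs(x = "Sepal Length (cm)", y = "Petal Length (cm)")
print(scatter)

# Pearson correlation backing the "positive correlation" reading
r <- cor(iris$Sepal.Length, iris$Petal.Length)
print(r)
```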


Second, a jitter plot was created to show the distribution of petal length across different

species, using geom_jitter to avoid overplotting and provide a clearer view of data density.

Figure 3 below shows that each species occupies a distinct range of petal lengths, with a few outlying points.

Figure 3: Jitter Plot of Petal Length by Species
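
The jitter plot could be sketched as follows, again assuming the built-in iris data frame. The jitter width, transparency, and seed are illustrative values rather than settings taken from the report, and the tapply call is an added numeric check of how far the species' petal length ranges are separated.

```r
# Assumption: built-in `iris`; width, alpha and seed are illustrative.
library(ggplot2)

set.seed(42)  # jitter offsets are random, so fix the seed for repeatability

jitter_plot <- ggplot(iris, aes(x = Species, y = Petal.Length,
                                colour = Species)) +
  # Jitter horizontally only, so the measured values are not altered
  geom_jitter(width = 0.2, height = 0, alpha = 0.7) +
  labs(x = "Species", y = "Petal Length (cm)")
print(jitter_plot)

# Minimum and maximum petal length per species
petal_ranges <- tapply(iris$Petal.Length, iris$Species, range)
print(petal_ranges)
```

Setting height = 0 keeps the jitter to the categorical axis, so the vertical position of each point still equals its recorded petal length.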

Lastly, a boxplot of sepal length by species was generated using geom_boxplot, which

allowed the detection of potential outliers and comparison of the central tendency and spread of

sepal lengths among the different species. Figure 4

below shows that the species Iris virginica has one outlier.

Figure 4: Boxplot of Sepal Length by Species
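
A boxplot of sepal length by species, with a numeric check of the outliers, might look like the sketch below. It assumes the built-in iris data frame; the outlier list uses base R's boxplot.stats, which applies the usual 1.5 × IQR rule with hinges that can differ slightly from the quartiles ggplot2 computes, so it is an approximation of what the figure displays rather than the report's own method.

```r
# Assumption: built-in `iris`; object names are illustrative.
library(ggplot2)

box_plot <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal Length (cm)")
print(box_plot)

# Points beyond 1.5 * IQR from the hinges, listed per species
outliers <- lapply(split(iris$Sepal.Length, iris$Species),
                   function(v) boxplot.stats(v)$out)
print(outliers)
```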


Conclusion

The analysis of the Iris dataset provided valuable insights through descriptive statistics

and visualizations. The descriptive statistics revealed considerable variation in sepal and petal

dimensions across different species. Scatter plots, jitter plots, and boxplots created using ggplot2

effectively illustrated relationships, distributions, and potential outliers. This comprehensive

examination enhances the understanding of the Iris dataset, showcasing the differences and

relationships among its key variables across species.


Works Cited

Kabacoff, R. (2022). R in action: Data analysis and graphics with R and Tidyverse. Manning

Publications.

Bluman, A. G. (2018). Elementary statistics: A step by step approach. McGraw-Hill Education.

R functions. (n.d.). https://www.w3schools.com/r/r_functions.asp

Kosourova, E. (2023, March 6). Apply functions in R with examples [apply(), sapply(), lapply (),

tapply()]. Dataquest.

https://www.dataquest.io/blog/apply-functions-in-r-sapply-lapply-tapply/

Appendix

The written and executed R commands are included in the R script file that was submitted

alongside this file.