
MATH 1280-01 Assignment Unit 2

Uploaded by Julius Owuonda
Copyright © All Rights Reserved

MATH 1280-01 Introduction to Statistics

Written Assignment Unit 2


University of the People
2 February 2022
Tasks

1. Sometimes it is difficult to understand data if you do not know what the numbers represent.
Provide short definitions of two words: sepal, and petal (be sure to cite your sources even if
you paraphrase):

Sepal: A sepal is a leaf-like structure in the outermost part of a flower of an angiosperm (flowering plant) that protects the flower during the bud stage (Zhang et al., 2018).

Petal: A petal is a modified leaf that surrounds the fertile reproductive parts of a flower. Petals are usually soft and brightly colored or shaped in a way that attracts pollinators (Naik, 2019).

2. There is a cumulative relative frequency table printed above for petal lengths (using
rounded values for petal length).  Below the number 3 in that table is the number .35.  What
does .35 represent? (multiple choice)

Answer: d. Of all the flowers measured in this sample 35% had a petal length of 3 or less (after
rounding the petal lengths).

3. Using only the cumulative relative frequency table printed above combined with some
simple paper-and-pencil calculations, which petal length occurs most frequently?

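To answer this, I recover the relative frequencies from the cumulative relative frequencies (both based on petal lengths rounded to whole numbers). The relative frequency of a length equals its cumulative relative frequency minus the cumulative relative frequency of the previous length. For example, the relative frequency of length 2 is 0.33 - 0.16 = 0.17. Length 1 needs no subtraction because its cumulative value, 0.16, is already its relative frequency.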

> rel.rfreq
   1    2    3    4    5    6    7
0.16 0.17 0.02 0.23 0.23 0.16 0.03
> 0.33 - 0.16
[1] 0.17
> 0.35 - 0.33
[1] 0.02
> 0.58 - 0.35
[1] 0.23
> 0.81 - 0.58
[1] 0.23
> 0.97 - 0.81
[1] 0.16
> 1 - 0.97
[1] 0.03

This shows that each relative frequency can be recovered from the cumulative relative frequencies. Therefore, petal lengths 4 and 5 (after rounding) occur most frequently. Each has a relative frequency of 0.23, or 23% of the flowers.

4. Describe how you determined your answer to the previous question (describe the
calculations that you used). Do not show R code for this task--it will not be counted as an
answer.

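Each entry in the cumulative relative frequency table is the running total of the relative frequencies up to and including that petal length. The relative frequency of a given length is therefore its cumulative value minus the cumulative value of the length just before it. Length 1 has nothing before it, so its relative frequency is its cumulative value, 0.16. Working through the table by hand or in a spreadsheet (Excel) gives:

Petal length (rounded)          1     2     3     4     5     6     7
Cumulative relative frequency   0.16  0.33  0.35  0.58  0.81  0.97  1.00
Relative frequency              0.16  0.17  0.02  0.23  0.23  0.16  0.03

The largest relative frequency, 0.23, belongs to petal lengths 4 and 5, so these lengths occur most often.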
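The differencing of the cumulative relative frequencies can be checked with a short Python sketch. Python only stands in for paper and pencil here and is not part of the assignment's R workflow. The sketch assumes the cumulative relative frequencies from the table, rounded to two decimals. It uses exact `Fraction` arithmetic so the differences come out as clean decimals. The helper name `relative_from_cumulative` is hypothetical.

```python
from fractions import Fraction

# Rounded petal lengths and their cumulative relative frequencies
# (copied from the cumulative relative frequency table; assumed exact to 2 decimals).
lengths = [1, 2, 3, 4, 5, 6, 7]
cumulative = [Fraction(s) for s in ["0.16", "0.33", "0.35", "0.58", "0.81", "0.97", "1"]]


def relative_from_cumulative(cum):
    """Return each cumulative value minus the previous one (the first value is kept as is)."""
    previous = Fraction(0)
    result = []
    for c in cum:
        result.append(c - previous)
        previous = c
    return result


relative = relative_from_cumulative(cumulative)
for length, rel in zip(lengths, relative):
    print(length, float(rel))

# The most frequent length(s) are those with the largest relative frequency.
largest = max(relative)
modes = [length for length, rel in zip(lengths, relative) if rel == largest]
print("most frequent:", modes, "relative frequency:", float(largest))
```

Running it lists the relative frequency for each length. It then reports lengths 4 and 5 as most frequent, each at 0.23.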

5. Assuming that you read the flowers.csv file into an R object called flower.data, run the following
R code (do not paste the ">" character into R) and paste both the command and the output into
your answer (you should see five names, each of which should be enclosed in quotes--if you do
not see this, try again or contact your instructor):

> getwd()
[1] "C:/Users/Irene/Documents/UoPeople/MATH 1280-01 Introduction to Statistics"

Command:

> flower.data <- read.csv('flowers.csv')
> names(flower.data)

Output:

[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

6. The number of observations in the "flower.data" data frame is: 150

7. List the variables in the data frame (you can do this by entering the name of the R object
that holds that data that you read using the read.csv command--you should have called it
flower.data).  If you do not see five columns of data, then there was a problem reading the
input file--try again or contact your instructor.  For each variable identify the type of the
variable (factor or numeric).

The name and type of the 1st variable: Sepal.Length – numeric
The name and type of the 2nd variable: Sepal.Width – numeric
The name and type of the 3rd variable: Petal.Length – numeric
The name and type of the 4th variable: Petal.Width – numeric
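The name and type of the 5th variable: Species – factor (in R 4.0.0 and later, read.csv() reads text columns as character unless stringsAsFactors = TRUE is set)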

8. Round the data for the variable Sepal.Length so that it contains integers, then find the
frequency of the value 7 (not the relative frequency): 24.

Explanation

> round.flower <- round(flower.data$Sepal.Length)
> round.flower
  [1] 5 5 5 5 5 5 5 5 4 5 5 5 5 4 6 6 5 5 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 5 5 6
 [38] 5 4 5 5 4 4 5 5 5 5 5 5 5 7 6 7 6 6 6 6 5 7 5 5 6 6 6 6 7 6 6 6 6 6 6 6 6
 [75] 6 7 7 7 6 6 6 6 6 6 5 6 7 6 6 6 6 6 6 5 6 6 6 6 5 6 6 6 7 6 6 8 5 7 7 7 6
[112] 6 7 6 6 6 6 8 8 6 7 6 8 6 7 7 6 6 6 7 7 8 6 6 6 8 6 6 6 7 7 7 6 7 7 7 6 6
[149] 6 6
> table(round.flower)
round.flower
 4  5  6  7  8
 5 47 68 24  6

Thus, the frequency of the value 7 is 24.
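The same frequency table can be rebuilt outside R. This Python sketch is only an illustration. It assumes the 150 rounded values printed by R above, copied verbatim, and uses `collections.Counter` in place of R's `table()`. The rounding itself stays in R, so the sketch never calls Python's `round()`.

```python
from collections import Counter

# The 150 rounded sepal lengths, copied from the R output of round.flower above.
ROUNDED_SEPAL_LENGTHS = """
5 5 5 5 5 5 5 5 4 5 5 5 5 4 6 6 5 5 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 5 5 6
5 4 5 5 4 4 5 5 5 5 5 5 5 7 6 7 6 6 6 6 5 7 5 5 6 6 6 6 7 6 6 6 6 6 6 6 6
6 7 7 7 6 6 6 6 6 6 5 6 7 6 6 6 6 6 6 5 6 6 6 6 5 6 6 6 7 6 6 8 5 7 7 7 6
6 7 6 6 6 6 8 8 6 7 6 8 6 7 7 6 6 6 7 7 8 6 6 6 8 6 6 6 7 7 7 6 7 7 7 6 6
6 6
"""

values = [int(token) for token in ROUNDED_SEPAL_LENGTHS.split()]
freq = Counter(values)  # rounded length -> number of flowers with that length

print("flowers:", len(values))
for value in sorted(freq):
    print(value, freq[value])  # should match table(round.flower)
print("frequency of 7:", freq[7])
```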

Assuming that you read the flowers.csv file into an R object called flower.data, run the following R
code (do not paste the ">" character into R). Note that we are not rounding the numbers here. Use
the output for the next five tasks:

> table(flower.data$Sepal.Width)
> plot(table(flower.data$Sepal.Width))

> table(flower.data$Sepal.Width)

  2 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9   3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9   4
  1   3   4   3   8   5   9  14  10  26  11  13   6  12   6   4   3   6   2   1
4.1 4.2 4.4
  1   1   1
> plot(table(flower.data$Sepal.Width))

9. What is the sum of the first three frequencies in the frequency table for sepal width? 8

Explanation: the first three frequencies (for sepal widths 2, 2.2 and 2.3) are 1, 3 and 4, so 1 + 3 + 4 = 8.

10. What does your answer to the previous question represent (in terms of sepal width and frequency
and the percentage of all sepal measurements)?

It is the number of flowers with a sepal width of 2.3 or less (widths 2, 2.2 and 2.3). That is 8 of the 150 flowers in the sample, or 8/150 ≈ 5.33% of all sepal width measurements.

11. What is the sum of the last three frequencies in the frequency table for sepal width? 3

Explanation: the last three frequencies (for sepal widths 4.1, 4.2 and 4.4) are each 1, so 1 + 1 + 1 = 3.
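The sums in tasks 9 and 11 and the percentage in task 10 can be checked with this minimal Python sketch. The dictionary `sepal_width_freq` is a hypothetical, hand-copied version of the R frequency table above. The sketch assumes that sorting its keys reproduces the column order R uses.

```python
# Hand-copied from the R output of table(flower.data$Sepal.Width):
# sepal width -> number of flowers with exactly that width.
sepal_width_freq = {
    2.0: 1, 2.2: 3, 2.3: 4, 2.4: 3, 2.5: 8, 2.6: 5, 2.7: 9, 2.8: 14,
    2.9: 10, 3.0: 26, 3.1: 11, 3.2: 13, 3.3: 6, 3.4: 12, 3.5: 6, 3.6: 4,
    3.7: 3, 3.8: 6, 3.9: 2, 4.0: 1, 4.1: 1, 4.2: 1, 4.4: 1,
}

widths = sorted(sepal_width_freq)        # same order as the R table
total = sum(sepal_width_freq.values())   # all flowers in the sample

first_three = sum(sepal_width_freq[w] for w in widths[:3])  # task 9
last_three = sum(sepal_width_freq[w] for w in widths[-3:])  # task 11
percent_first_three = 100 * first_three / total             # task 10

print("total:", total)
print("first three:", first_three, f"({percent_first_three:.2f}% of all flowers)")
print("last three:", last_three)
```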

12. How many flowers in the sample had sepal widths less than 4 (do NOT round the sepal
width numbers for this, but you can round your final answer to 3 decimal places)? 146
flowers

Explanation

We add the first 19 frequencies (counts, not relative frequencies). Together they cover every sepal width below 4, from 2 up to 3.9:

1 + 3 + 4 + 3 + 8 + 5 + 9 + 14 + 10 + 26 + 11 + 13 + 6 + 12 + 6 + 4 + 3 + 6 + 2 = 146

Alternatively, R can compute the cumulative frequencies:

> freq1 <- table(flower.data$Sepal.Width)
> cumsum(freq1)
  2 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9   3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9   4
  1   4   8  11  19  24  33  47  57  83  94 107 113 125 131 135 138 144 146 147
4.1 4.2 4.4
148 149 150

The cumulative frequency at 3.9, the largest sepal width below 4, is 146. Thus, 146 flowers had sepal widths less than 4.
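R's `cumsum(freq1)` has a direct counterpart in Python's `itertools.accumulate`. The sketch below again assumes a hand-copied frequency table (`sepal_width_freq` is a hypothetical name). It reads the count from the running totals and, as a cross-check, also counts the widths below 4 directly.

```python
from itertools import accumulate

# Hand-copied from the R output of table(flower.data$Sepal.Width).
sepal_width_freq = {
    2.0: 1, 2.2: 3, 2.3: 4, 2.4: 3, 2.5: 8, 2.6: 5, 2.7: 9, 2.8: 14,
    2.9: 10, 3.0: 26, 3.1: 11, 3.2: 13, 3.3: 6, 3.4: 12, 3.5: 6, 3.6: 4,
    3.7: 3, 3.8: 6, 3.9: 2, 4.0: 1, 4.1: 1, 4.2: 1, 4.4: 1,
}

widths = sorted(sepal_width_freq)
# Running totals, like cumsum(freq1): width -> flowers with a width <= that value.
cumulative = dict(zip(widths, accumulate(sepal_width_freq[w] for w in widths)))

# Route 1: read the running total at the largest width below 4.
largest_below_4 = max(w for w in widths if w < 4)
count_via_cumsum = cumulative[largest_below_4]

# Route 2: add the frequencies of all widths below 4 directly.
count_direct = sum(f for w, f in sepal_width_freq.items() if w < 4)

print(largest_below_4, count_via_cumsum, count_direct)
```

Both routes should agree. The last running total should equal the sample size, 150.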


13. What does the tallest bar in the plot represent?  (multiple choice)

a. mean
b. mode
c. median

Answer: b. mode
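The tallest bar in `plot(table(...))` marks the value with the largest frequency, which is the mode. The minimal Python sketch below finds that value in the sepal width table. It assumes the same hand-copied table (hypothetical name `sepal_width_freq`) and returns every width tied for the highest count, in case there are several modes.

```python
# Hand-copied from the R output of table(flower.data$Sepal.Width).
sepal_width_freq = {
    2.0: 1, 2.2: 3, 2.3: 4, 2.4: 3, 2.5: 8, 2.6: 5, 2.7: 9, 2.8: 14,
    2.9: 10, 3.0: 26, 3.1: 11, 3.2: 13, 3.3: 6, 3.4: 12, 3.5: 6, 3.6: 4,
    3.7: 3, 3.8: 6, 3.9: 2, 4.0: 1, 4.1: 1, 4.2: 1, 4.4: 1,
}

# The height of the tallest bar is the largest frequency in the table.
highest = max(sepal_width_freq.values())
# Every width whose bar reaches that height is a mode.
modes = [w for w, f in sepal_width_freq.items() if f == highest]

print("mode(s):", modes, "frequency:", highest)
```

With this table the only mode is a sepal width of 3, which 26 flowers share.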

14. Create a frequency table that shows the frequencies for each species of flower in the sample.
Paste your R command and output into your answer (do NOT display data from a data
frame, display data using the table() command).

> table(flower.data$Species)

    setosa versicolor  virginica
        50         50         50

15. Explain two things about the table that you created for the previous task:

Why did the frequency table for flower species contain words in the first row as opposed to
numbers?

Species is a qualitative (categorical) variable whose values are words, so R treats it as a factor rather than as numeric data. The first row of the table therefore lists the categories (the species names), not numbers.

What is the meaning of the numbers in the second row of the table?

The numbers in the second row are the frequencies: the number of flowers of each species in the sample. Each species has 50 flowers, so each makes up 1/3 of the sample. Together they account for all 50 + 50 + 50 = 150 observations.
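The 1/3 share per species follows from dividing each frequency by the total. This Python sketch is an illustration only. It assumes the species counts printed by `table(flower.data$Species)` above, and the name `species_freq` is hypothetical. It keeps the relative frequencies as exact fractions.

```python
from fractions import Fraction

# Hand-copied from the R output of table(flower.data$Species).
species_freq = {"setosa": 50, "versicolor": 50, "virginica": 50}

total = sum(species_freq.values())  # number of observations in the sample
# Relative frequency of each species as an exact fraction of the sample.
relative = {name: Fraction(count, total) for name, count in species_freq.items()}

for name, share in relative.items():
    print(name, species_freq[name], share)
print("total:", total)
```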

References

Naik, K. (2019, June 24). Parts of flowers and what they do. Sciencing. https://sciencing.com/parts-flowers-do-8173112.html
Zhang, F. P., Carins Murphy, M. R., Cardoso, A. A., Jordan, G. J., & Brodribb, T. J. (2018). Similar geometric rules govern the distribution of veins and stomata in petals, sepals and leaves. New Phytologist, 219(4), 1224-1234.
