Final Project
Report | Reflection
In this project, our group used a convenience sample of 2.17-ounce bags of Skittles
provided by each of the students in this class and compiled into one data set. We answered the
question of why we would expect each color to make up 20% of the candies, and made both a Pareto
chart and a pie chart of the relative frequencies of each color of Skittles. Next, our team calculated
the mean, standard deviation, and 5-number summary (minimum, Q1, median, Q3, maximum)
of the number of candies per bag. Then our team provided both a 99% and a 90% confidence interval
and interpreted what these intervals mean. You will also find “My Take” at the end of
each section of our project, which briefly covers a related topic for that portion of the project.
Group Part #1
To start this project, we found the relative frequency of each color in the class
data set. We also thought it would be a good idea
to graph these.
1. What proportion (or percentage) of the Skittles do you expect to see of each color?
Why?
If you were to pick one Skittle from a bag of original Skittles, you would expect the same
probability of pulling out a red, orange, yellow, green, or purple Skittle. Here we
are assuming that the colors are distributed completely at random and that the number of Skittles
in the bag is divisible by 5, which gives an even 20% per color (though this is not very probable). Then,
statistically, if you were to pick just one Skittle, there would be a 20% probability of
pulling any one of the 5 colors. We have the results from opening and counting 31 bags
2. Now open the data set and compute the proportions of Red, Orange, Yellow, Green,
and Purple candies in the class data set. Note that the sample size is the total number
Pareto Chart to match our data: Pie chart to match our data:
3. Does the class data represent a random sample? What would the population be?
Collaborate to discuss sampling and our data in a paragraph or two. Look carefully at
the definition of random sample when you work on your group response. This will
The 31 students who each opened a 2.17-ounce bag of original Skittles represent our
somewhat random sample. The population is everyone who has purchased and opened a bag of
original Skittles. As you can see from the graphs (though the pie chart colors may be deceiving),
yellow was the most frequently pulled color at a proportion of 21%. Purple was the least
frequently pulled at 18.64% and came in under our expected proportion, as did orange
(which was pulled at a below-average 19.41%). Red and green
were above average at 20.56% and 20.4%. Though we didn’t have a perfect proportion of each
color, if we were to take a bigger sample size, we may or may not find ourselves getting very close
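As a rough check on the proportions above, the short Python sketch below compares each reported class percentage with the 20% we expected. The percentages are the ones reported in this section; the variable names are only illustrative.

```python
# Class percentages by color, as reported in this section.
observed_pct = {
    "red": 20.56,
    "orange": 19.41,
    "yellow": 21.00,
    "green": 20.40,
    "purple": 18.64,
}
expected_pct = 100 / len(observed_pct)  # 20% if all 5 colors are equally likely

# Difference from the expected share, in percentage points.
deviation = {color: round(pct - expected_pct, 2) for color, pct in observed_pct.items()}

# Color whose share is farthest from 20%, in either direction.
farthest = max(deviation, key=lambda color: abs(deviation[color]))

for color, diff in deviation.items():
    print(f"{color:>6}: {diff:+.2f} points")
print("farthest from 20%:", farthest)
```

Every color lands within about 1.4 percentage points of 20%, with purple the farthest away.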
My take:
Creating a table that displays the proportions by color and the total count from your own
bag of candies together with the proportions by color and total count for the entire class sample:
There are 5 colors of Skittles and they are randomly put into bags, so you would
expect each color to make up about 20% if we surveyed all bags of Skittles. The class
count graphs represent this better than my bag did. This illustrates that a sample from one
or a few experiments may or may not represent the population as a whole. It does appear
that I got shortchanged on orange candies (my bag: 8.2%) while the class average (19.41%) was close to the
expected proportion. If you were to look only at my bag, you would think that most or all Skittles
bags have under 10% orange, though this is not the case, as the average bag contained about 20% orange
Skittles. However, I did have an above-average amount of green, which offset the low orange
proportion. Overall, the class count represented what I thought I would see, since we averaged
over many bags, even though some of my proportions were way off the class counts. My
orange Skittle count was low but my green Skittle count was high, thus my total class data does
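One way to see why a single bag can be so far from 20% while the class total is close is the standard error of a sample proportion, sqrt(p(1 - p)/n). The sketch below is only an illustration: it assumes each candy is independently and equally likely to be any of the 5 colors (p = 0.20), and it uses my bag's 61 candies and the class total of 1829 candies as the sample sizes.

```python
from math import sqrt


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion for n independent draws."""
    return sqrt(p * (1 - p) / n)


p = 0.20          # assumed true share of each color (all colors equally likely)
n_my_bag = 61     # candies in my bag
n_class = 1829    # candies in the whole class data set

se_my_bag = proportion_standard_error(p, n_my_bag)
se_class = proportion_standard_error(p, n_class)

print(f"one bag: about {se_my_bag:.3f}")
print(f"class:   about {se_class:.3f}")
```

Under these assumptions, a single bag's proportion typically varies by about 5 percentage points, while the class proportion varies by about 1 point. That is consistent with my bag being much farther from 20% than the class data.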
Now that we have some visual aids and the relative frequencies, we are going to find the
mean number of candies per bag, standard deviation, and a 5-number summary to better describe
our data. We also thought it would be a good idea to add some more visual aids in the form of a box
i. Minimum: 53
ii. Q1: 57
iii. Median: 59
iv. Q3: 61
v. Maximum: 63
2. Histogram: 3. Box Plot:
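The five-number summary above is enough to sketch the box plot by hand. The Python sketch below computes the interquartile range and checks the common 1.5 x IQR rule for outliers; that rule is a convention we are assuming here, not something the project required.

```python
# Five-number summary reported above (candies per bag).
minimum, q1, median, q3, maximum = 53, 57, 59, 61, 63

iqr = q3 - q1                      # width of the box in the box plot
lower_fence = q1 - 1.5 * iqr       # values below this would be flagged as outliers
upper_fence = q3 + 1.5 * iqr       # values above this would be flagged as outliers

has_outliers = minimum < lower_fence or maximum > upper_fence

print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers by the 1.5 x IQR rule:", has_outliers)
```

With these values the fences are 51 and 67, so the smallest bag (53) and the largest bag (63) both fall inside them.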
My Take:
In our findings for the number of candies in each bag, the histogram is roughly
bell-shaped (there is a gap at 54, but overall it is still relatively bell-shaped).
This indicates that the data are roughly symmetric, which is also supported by the fact that the mean is equal
to the median. As each bag contains 2.17 ounces of Skittles, you would expect that most of these
bags would have roughly the same number of whole Skittles, and the graph reflects this with its
bell shape. My bag contained 61 Skittles, which was above the mean (59) and median (59)
of the 31 bags sampled. This tells us that my bag was above average but still consistent with the
whole class’s data, as any data set has values above and below the average.
Along with discussing how the number of Skittles in my bag
compares to the class average and the graph shapes, I am also going to discuss the differences between
categorical data and quantitative data and their graphs. Categorical data is data that is broken into
categories, as its name implies, and no real math can be done with the category values themselves. Gender, for
instance, would be a categorical variable, and the values that could fill it would be “Male”,
“Female”, or “Other”; it wouldn’t make much sense to add “Male” to “Male”.
Quantitative data, on the other hand, is data about numerical variables such as the number of males or
the number of Skittles. Adding up the Skittles in each bag makes sense, and with multiple bags
we can compute meaningful statistics, unlike with categorical data. Pie
charts and bar graphs are very good for graphing categorical data, since you wish to display
percentages or counts in the categories; it wouldn’t make sense to use graphs like a box plot or
scatter plot, since those represent numerical data. With quantitative data, you
do want to use graphs such as a scatter plot or box plot, as they summarize the data
visually. Pie charts are strongly discouraged for quantitative data, as it can
be hard to see whether numbers are close in value to each other. In summary, categorical data
describes categories such as gender or eye color, is typically graphed with
pie charts or bar graphs, and no real math can be done with the category values themselves. Quantitative data is
numerical data such as height that is graphed using scatter plots, box plots, and other graphs; real
math can be done with this data, making it very useful to statisticians.
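To make the distinction concrete, the short Python sketch below treats color as categorical data (we can only count how often each label appears) and candies per bag as quantitative data (we can average the values). The two small lists are made-up illustrations, not the class data.

```python
from collections import Counter
from statistics import mean

# Hypothetical values for illustration only (not the class data).
colors = ["red", "green", "red", "yellow", "purple", "red"]   # categorical
candies_per_bag = [57, 59, 61]                                 # quantitative

# Categorical data: counting labels is meaningful; adding "red" + "green" is not.
color_counts = Counter(colors)
color_share = {c: k / len(colors) for c, k in color_counts.items()}

# Quantitative data: arithmetic such as a mean is meaningful.
mean_candies = mean(candies_per_bag)

print(color_counts.most_common(1))
print(color_share)
print(mean_candies)
```

The counts and shares are what a bar graph or pie chart would display, while the numeric list is what a histogram or box plot would summarize.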
Now that we have a mean, standard deviation, and some other useful statistics, we want
to find out how reliable these statistics truly are. For this, we are going to use 2
different confidence intervals as well as find a margin of error for these confidence intervals.
1. Our 99% confidence interval estimate for the population proportion of yellow candies:
(0.1854, 0.2345).
a. Our yellow Skittles had a sample proportion of 0.21 (where x = 384 and n = 1829).
b. We also verified that we need to use a z interval rather than a t interval by:
2. Interpret with a complete sentence the confidence interval estimate for the population
We are 99% confident that the population proportion of yellow Skittles lies between 0.1854
and 0.2345.
3. Our 90% confidence interval estimate for the population mean number of candies per
4. Interpret with a complete sentence the confidence interval estimate for the population
mean number of candies per bag.
We are 90% confident that the actual value
of the population mean number of Skittles in each bag is between 58.279 and 59.721 with a
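The 99% interval for the proportion of yellow candies reported above can be reproduced with the usual large-sample z interval, p-hat ± z* · sqrt(p-hat(1 - p-hat)/n). The Python sketch below is one way to do the arithmetic, assuming that formula is the one we used; it takes x = 384 and n = 1829 from the report and gets the critical value from the standard library.

```python
from math import sqrt
from statistics import NormalDist

x, n = 384, 1829          # yellow candies and total candies, from the report
confidence = 0.99

p_hat = x / n             # sample proportion of yellow candies

# Two-sided critical value z* for the chosen confidence level.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error for a large-sample z interval for a proportion.
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"99% CI for yellow: ({lower:.4f}, {upper:.4f})")
```

Rounded to four decimals this gives 0.1854 and 0.2345, matching the interval reported above.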
My Take:
In statistics we take random samples and run computations on these samples, such as
computing the mean of a given data set. We run into a problem when we try to use these
computations to describe populations or compare them with other results. This is where confidence intervals come into play.
A confidence interval gives a margin of error and a range of plausible values for a
population parameter such as a mean. For example, a 95% confidence interval consists of a
point estimate and a margin of error, which together give a lower bound and an upper bound
for the parameter. Overall, a confidence interval is a range of values that you can be 95%
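The link between an interval's bounds, its point estimate, and its margin of error can be shown with our 90% interval for the mean number of candies per bag. The Python sketch below assumes the interval is symmetric about the sample mean, as z and t intervals are.

```python
# 90% confidence interval for the mean candies per bag, as reported above.
lower, upper = 58.279, 59.721

point_estimate = (lower + upper) / 2     # center of the interval (the sample mean)
margin_of_error = (upper - lower) / 2    # half the interval's width

print(f"point estimate:  {point_estimate:.3f}")
print(f"margin of error: {margin_of_error:.3f}")
```

This gives a center of about 59, which agrees with the class mean, and a margin of error of about 0.721 candies.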
Summary:
After seeing everything we have done with just a convenience sample of Skittles,
it’s easy to see how important stats is. We used many of the basic but essential calculations in
statistics to describe our data and put it to further use. As you’ve seen above, our group ran
calculations such as the mean, confidence intervals, margins of error, and standard deviation, as well
as graphed our results. Even with just a bag of Skittles, statistics can be applied.
At the beginning of the semester we each sought out a 2.17-ounce bag of Skittles.
Our goal was to count how many Skittles of each color were in this bag. We then
submitted our results to Professor Maw to be compiled and given back to us for further
instructions. Over the semester, we calculated frequencies, relative frequencies, the mean,
standard deviation, and 5-number summary, created confidence intervals, and graphed some of
these results.
Along with all of the statistical calculations listed above, we learned about z and t
intervals and when we should use one over the other. We also discussed what a margin of error is
and why it is important. All of the things we learned in statistics have a real-world use. In
computer science, we often use statistics to monitor how efficient an algorithm is. More
often than not we use standard deviation to measure batches of processors to determine how many are
expected to be nonfunctional as well as a certain batch’s clock. Statistics constantly proves its
Through this course I was reminded how important it is to know and understand
statistics. Everywhere you go there are data banks filled with data from millions of people just
waiting to be gone through with statistical analysis. If there is anything I have learned from my
computer science classes, it is that data matters, and the more efficient you are at analyzing data
with statistics, the more successful the company that you work for or own will grow to be.
Throughout the semester, statistics showed me all of the things you can do with data and how