Skittles Project 2
We are hoping to learn how to apply what we learned throughout this semester to a real-life scenario. Each student in our class was to buy one regular-size bag of Skittles, separate the candies by color, and count how many red, orange, yellow, green, and purple Skittles the bag contained.
The second part of this project was to make predictions about the number of each color per bag. My hypothesis was that the color ratio would be roughly the same in every bag of Skittles, and that each bag would also contain roughly the same number of Skittles because the bags were all the same size.
We were also tasked with making a table that compared my expectation (hypothesis) for the colors with what was actually observed, as well as a pie chart and a bar graph showing the total number and the percentage of each color for the class as a whole. The last table for this section of the project allowed us to compare the ratio of colors in my bag with the average for the whole class.
Color           Red  Orange  Yellow  Green  Purple  Total
My Bag (Count)   10      10      13     17      10     60
The graphs show what I expected. In the pie chart, all the sections are very similar in size. In the Pareto chart, there is not a big difference between the bars; the biggest difference, between red and yellow, is only about 140 Skittles. There do not appear to be any outliers among the color totals. My bag of Skittles, however, was clearly different from the class averages: my percentages range from about 16.7% to 28.3%, whereas the class averages range only between 19% and 20%.
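As a rough check on the percentages above, the sketch below converts my bag's counts into relative frequencies (count divided by the bag total). It assumes the counts in the table follow the color order red, orange, yellow, green, purple used earlier in this report; the names `my_bag` and `relative_frequencies` are only for illustration.

```python
# Counts from my bag. The color labels are an assumption: they follow the
# order the colors are listed in this report (red, orange, yellow, green, purple).
my_bag = {"red": 10, "orange": 10, "yellow": 13, "green": 17, "purple": 10}


def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {color: 100 * n / total for color, n in counts.items()}


percentages = relative_frequencies(my_bag)
for color, pct in percentages.items():
    print(f"{color:>7}: {pct:5.1f}%")

low, high = min(percentages.values()), max(percentages.values())
print(f"range: {low:.1f}% to {high:.1f}%")
```

The smallest and largest shares depend only on the counts 10 and 17 out of 60, so the range holds regardless of which color each column belongs to.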
Our next task was to find the statistical summary of the class counts, which meant finding the mean, the standard deviation, and the five-number summary (minimum, first quartile, median, third quartile, and maximum). These were my findings:
1. Mean = 112.6
2. Standard deviation = 378.1
3. Min = 53, Q1 = 58, Median = 59, Q3 = 60, Max = 2,839
Next, we used this information to make a histogram and a box plot.
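StatCrunch produced the numbers above, but the same kind of summary can be sketched with Python's standard `statistics` module. The bag counts below are made up for illustration (they are not the class data), and `statistics.quantiles` uses its default "exclusive" method, which may place Q1 and Q3 slightly differently from StatCrunch.

```python
import statistics

# Hypothetical bag counts (NOT the class data), with one large value
# included to mimic an outlier.
bags = [55, 57, 58, 59, 59, 60, 60, 61, 62, 300]

mean = statistics.mean(bags)
sd = statistics.stdev(bags)                     # sample standard deviation (n - 1)
q1, med, q3 = statistics.quantiles(bags, n=4)   # default method="exclusive"

print(f"mean = {mean:.1f}, sd = {sd:.1f}")
print(f"min = {min(bags)}, Q1 = {q1}, median = {med}, Q3 = {q3}, max = {max(bags)}")
```

Just as in the class data, the single large value drags the mean (83.1) well above the median (59.5) and inflates the standard deviation, while the quartiles barely move.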
The statistical summary did turn out similar to what I was expecting, except that we have one extreme outlier (the maximum of 2,839) that really throws everything off course. Without that outlier, the histogram would have looked like a roughly normal distribution, but because of it, our data and graph are skewed right, and the mean (112.6) is pulled far above the median (59). Other than the one extreme outlier, the data the rest of the class collected seemed fairly normal and close to the results for my own bag of Skittles.
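One common way to make "extreme outlier" precise is Tukey's fence rule: values more than 1.5 × IQR beyond the quartiles are flagged as outliers, and values more than 3 × IQR beyond them as extreme outliers. The sketch below applies that rule to the quartiles reported above (Q1 = 58, Q3 = 60); the rule is a standard convention rather than something the assignment specified, and `tukey_fences` is an illustrative name.

```python
def tukey_fences(q1, q3, k):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles, minimum, and maximum from the class summary above.
q1, q3 = 58, 60
minimum, maximum = 53, 2839

mild = tukey_fences(q1, q3, 1.5)      # usual outlier fences
extreme = tukey_fences(q1, q3, 3.0)   # "extreme" outlier fences

print("1.5 x IQR fences:", mild)
print("3 x IQR fences:  ", extreme)
print("max beyond extreme fence:", maximum > extreme[1])
print("min beyond extreme fence:", minimum < extreme[0])
```

By this rule the maximum of 2,839 lies far past even the extreme fence of 66, while the minimum of 53 sits just below the 1.5 × IQR fence of 55 but inside the extreme fence of 52, which fits the description of a single extreme outlier.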
Quantitative data is information that can be described by numbers, whether whole numbers (discrete) or continuous values. Quantitative data works best in box plots, histograms, or even regular line graphs, because these graphs display numerical values clearly: you can see the center, spread, quartiles, and outliers. Since quantitative data is number-based, calculations such as the mean, standard deviation, and quartiles also work best for it.
Categorical data is data that can be described by categories, such as gender, ethnicity, hair color, or even education level. Since most categorical data has a limited number of outcomes, it is easy to picture visually, so it is best shown in bar graphs with the categories on the x-axis, or in pie charts that show the percentage in each category. For the same reasons, calculations like percentages or relative frequencies work best for categorical data.
Finally, we needed to find confidence intervals: a 99% confidence interval for the proportion of Skittles that are yellow, and a 95% confidence interval for the mean number of Skittles per bag.
99% level of confidence
P-hat = 0.2079908
Lower bound = 0.19459
Upper bound = 0.22140
Interval: (0.19459, 0.22140)
Based on the sample data of yellow Skittles, we can say with 99% confidence that the true proportion of all Skittles that are yellow is between about 19.5% and 22.1% (the sample proportion is about 21%).
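The reported interval appears consistent with the usual one-proportion z-interval, p-hat ± z* × sqrt(p-hat × (1 - p-hat) / n), which is centered on p-hat: (0.19459 + 0.22140) / 2 ≈ 0.20800. The report does not list the total number of candies n, so the sketch below uses hypothetical counts; `proportion_ci` is an illustrative name, and the normal-approximation formula is an assumption about how the interval was computed.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """One-proportion z-interval (normal approximation) for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical counts, not the class totals: 200 yellow out of 1,000 candies.
low, high = proportion_ci(200, 1000, 0.99)
print(f"99% CI: ({low:.4f}, {high:.4f})")
```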
95% level of confidence
Lower bound = 103.1
Upper bound = 122.1
Based on the mean of the data, we can say with 95% confidence that the mean number of candies per bag is between 103 and 122.
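A 95% interval for a mean is usually computed as a one-sample t-interval, x-bar ± t* × s / sqrt(n). Python's standard library has no t distribution, so the sketch below takes the critical value t* as an argument (2.262 is the two-sided 95% value for 9 degrees of freedom from a t table). The bag counts are hypothetical, and `mean_t_interval` is an illustrative name.

```python
import math
import statistics


def mean_t_interval(data, t_crit):
    """One-sample t-interval: mean +/- t* * s / sqrt(n).

    t_crit must match the confidence level and df = len(data) - 1; it is
    passed in because the standard library has no t quantile function.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    return x_bar - t_crit * se, x_bar + t_crit * se


# Hypothetical counts for 10 bags (not the class data).
bags = [58, 59, 60, 57, 61, 59, 60, 58, 62, 56]
low, high = mean_t_interval(bags, t_crit=2.262)  # 95%, df = 9
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The margin of error grows with the sample standard deviation s, so a single extreme bag widens the interval considerably.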
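A confidence interval is a range of plausible values for a population parameter, computed from sample data by a method that captures the true parameter a set percentage of the time: if we repeated the sampling many times, about 95% of the 95% intervals built this way would contain the true mean. In this case, we are 95% confident that the true mean number of Skittles per bag is between 103 and 122; this does not mean that 95% of individual bags contain between 103 and 122 Skittles.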
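The repeated-sampling meaning of "95% confidence" can be checked with a small simulation. The sketch assumes a made-up population of bag counts that is normal with mean 60 and standard deviation 2 and, to keep it short, uses a z-interval with that known standard deviation; roughly 95% of the intervals should contain the true mean.

```python
import math
import random
from statistics import NormalDist, fmean

TRUE_MEAN, TRUE_SD = 60, 2   # hypothetical population of bag counts
N, TRIALS = 25, 2000         # bags per sample, number of repeated samples

rng = random.Random(1)                    # fixed seed so the run is reproducible
z = NormalDist().inv_cdf(0.975)           # about 1.96 for 95%
margin = z * TRUE_SD / math.sqrt(N)       # known-sigma z-interval margin

hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    x_bar = fmean(sample)
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        hits += 1                         # this interval captured the true mean

coverage = hits / TRIALS
print(f"{coverage:.1%} of intervals contained the true mean")
```

Each individual interval either contains 60 or it does not; the 95% describes how often the method succeeds over many samples, not the share of individual bags that fall inside one interval.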
In conclusion, this project has shown me that statistics and math can be used in regular day-to-day life; it is difficult, but possible. My initial hypothesis was that the colors in each bag would vary from bag to bag, but that the average of each color would still be very similar overall, and that all the bags would have roughly the same number of Skittles since they all had the same net weight. The tests we did supported my hypothesis to a certain point, but unfortunately we had an extreme outlier that I believe really skewed the rest of the data. Another thing I learned throughout this project was how to use StatCrunch to create graphs and charts for data sets. It made it very easy to color-code my charts and graphs so that my results were easier to visualize. Overall, I would like to retry this experiment to see whether we get another extreme outlier, or what the results would look like without it.