Stats Project
In my Statistics 1040 math course in fall 2019, I was required to complete a data-gathering project. This project required each student to work alone and also to hold group discussions with fellow students. For the first part of the assignment, each student in the class was required to buy a bag of Skittles and count the number of candies of each color in the bag. The class data was compiled and used for a number of different exercises involving methods a statistician might use. For the second part of the project, I determined the proportion of each color of candy and created different charts for the total number of each color of candy in the entire class. These charts compared the class data to my own personal data and made it easier for me and others to notice any differences and similarities between my bag of candy and those of the other students in the class. For the third part of the project, I used the Skittles data from the class to create statistical summaries: the mean, the standard deviation, and a five-number summary. I was also able to make a frequency histogram of the total number of candies as well as a box plot. Under each chart, the reader can find a description of what they are looking at. The last part of the project involved confidence intervals. I found three different confidence intervals: for the population proportion, mean, and standard deviation. Each of the confidence intervals had a written analysis describing what that confidence interval meant.
Hypothesis: I believe that the data collection will show significantly more green and yellow Skittles than any other color.
The class compiled their data together so that each student could compare their own bag to everyone else's. In the chart of the class data, the x-axis represents the different colors of Skittles found in each student's bag, and the y-axis represents the relative frequency (proportion) of each color across the bags in the class.
Before I counted what was in my Skittles bag, I predicted that green and yellow would make up the majority of the count, but I was wrong. It turned out that, in my particular bag, green and red had the smallest counts. As for the rest of the class, their bags seemed to bear out my prediction that green and yellow would make up the majority of the count. I was surprised that red had an 18% proportion, making it the smallest count next to purple. Overall, the class data appears very uniform, with all the proportions close to one another.
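Each proportion discussed above is a color's count divided by the total number of candies. A minimal Python sketch of that calculation follows; the counts in `my_bag` are hypothetical placeholders, not the class data or my actual bag.

```python
def color_proportions(counts):
    """Return each color's share of the total candy count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no candies counted")
    return {color: n / total for color, n in counts.items()}


# Hypothetical counts for a single bag (illustration only, not real data).
my_bag = {"red": 10, "orange": 12, "yellow": 13, "green": 11, "purple": 14}
props = color_proportions(my_bag)
for color, p in props.items():
    print(f"{color:>6}: {p:.1%}")
```

The same function works for the class totals: pass the summed counts for each color instead of a single bag.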
Group Discussion:
Yes, the class data does represent a random sample. Although each student bought their own bag of Skittles, and not every bag of Skittles in the region had an equal chance of being selected, the distribution of Skittles from the central plant or warehouse was most likely random. The Skittles company most likely does not count colors as it fills the bags and simply fills them by weight. Assuming students did not make any biased decisions about which bag to grab off the shelf, every bag produced had an equal chance of being shipped to any location in the country.
In this study, the sample is the class data. Since not everyone in the class currently lives in the same state, the population would be all 2.17-ounce Skittles bags in the United States. Because different manufacturing plants currently operate overseas, the population can only include bags sold in the United States.
This table was made from the values of the total Skittles in the class using a program called StatCrunch. It shows the mean, standard deviation, and five-number summary for each color of Skittles, compiled across every student's bag in the class. Var2 shows all the data for the red Skittles, Var3 for the orange Skittles, Var4 for the yellow Skittles, Var5 for the green Skittles, and Var6 for the purple Skittles. Comparing all the colors, one can see that yellow and green have the highest average per student's bag.
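The per-color summaries in the StatCrunch table can also be reproduced by hand. Below is a minimal Python sketch, assuming one color's counts are available as a list. The yellow counts are hypothetical, and the quartile convention (median of each half, excluding the overall median when n is odd) is an assumption that may not match StatCrunch's rule exactly.

```python
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'median of each half' rule.

    For odd n the overall median is left out of both halves. This is one
    common textbook convention; StatCrunch or a calculator may differ slightly.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


def describe(values):
    """Mean, sample standard deviation, and five-number summary."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample SD (divides by n - 1)
        "five_number": five_number_summary(values),
    }


# Hypothetical yellow counts from eight bags (illustration only).
yellow = [12, 14, 11, 13, 15, 12, 10, 16]
print(describe(yellow))
```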
This image represents a histogram of the Skittles colors in each student's bag. The histogram shows the frequency distribution, and one can see that it is slightly bell-shaped. Upon analyzing the box plot, one can see that the distribution is skewed to the left.
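Reading skew from a box plot comes down to comparing the lower and upper parts: a longer lower half of the box and a longer lower whisker point to a left skew. The sketch below encodes that rule of thumb. It is a visual heuristic rather than a formal test, and the five-number summary in the example is hypothetical.

```python
def _sign(x):
    return (x > 0) - (x < 0)


def skew_direction(minimum, q1, median, q3, maximum):
    """Rough skew direction read off a box plot (a rule of thumb, not a test).

    A longer lower box half and lower whisker point to a left (negative) skew;
    longer upper parts point to a right (positive) skew.
    """
    box = _sign((median - q1) - (q3 - median))
    whisker = _sign((q1 - minimum) - (maximum - q3))
    score = box + whisker
    if score > 0:
        return "left-skewed"
    if score < 0:
        return "right-skewed"
    return "roughly symmetric or mixed"


# Hypothetical five-number summary for candies per bag (illustration only).
print(skew_direction(50, 57, 59, 60, 62))
```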
4. Number of Candies:
Using the program StatCrunch, one was able to analyze the average, standard deviation, five-number summary, and frequency of each color of Skittles in each student's bag, as well as the box plot. From these images, one can tell that the distribution is left-skewed. One can also determine that yellow and green are the most common colors in each student's bag, because their averages are the highest. I was surprised by yellow and green being the most common colors because I thought that red would be the most common. I can also see that the count of each color does not differ much from bag to bag. I would have assumed that the factory's machines put Skittles into bags at random without making sure the colors were evenly distributed, but upon analysis, the distribution appears fairly even.
Group Discussion:
Categorical variables are also known as qualitative variables. These variables place observations into categories, such as car model, color, or gender. Quantitative data consists of numerical values that can be counted or measured. The number of candies in a bag of Skittles is quantitative, whereas the color of the candies is categorical.
Quantitative data is best graphed with histograms, stem-and-leaf plots, dot plots, bar graphs, and box plots. All of these types of graphs can be used to display the distribution of a quantitative variable. Categorical data is best graphed using a method that lets you compare the groups to one another. A bar graph can work for both quantitative and categorical data, but a pie chart doesn't make sense for quantitative data because it compares parts to a whole. A pie chart would effectively show the percentage of each color of Skittles in a bag (categorical data), but it cannot effectively show the number of Skittles in a bag (quantitative data).
When it comes to calculations, the mean and median only make sense for quantitative data. The mean is the average value of a variable across an entire sample, so it is meaningful only for quantitative data. The median is the middle value of the ordered data and, likewise, makes sense only for quantitative data. The best measure of central tendency for categorical data is the mode. When looking at the colors of candy in a Skittles bag, you cannot find the average color or the median color, but you can establish which color occurs most often. Likewise, when looking at the number of candies in a Skittles bag, the most useful measures of center are the mean and median number of Skittles.
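The contrast between categorical and quantitative summaries shows up in a few lines of Python's standard `statistics` module. The color list and per-bag counts below are hypothetical illustrations, not the class data.

```python
import statistics
from collections import Counter

# Hypothetical data (illustration only, not the class results).
candy_colors = ["yellow", "green", "red", "yellow", "purple", "green", "yellow"]
candies_per_bag = [58, 60, 57, 59, 61, 56, 59]

# Categorical variable: only the mode (most frequent category) is meaningful.
print("Most common color:", statistics.mode(candy_colors))
print("Color counts:", Counter(candy_colors))

# Quantitative variable: mean and median summarize the center.
print("Mean candies per bag:", statistics.mean(candies_per_bag))
print("Median candies per bag:", statistics.median(candies_per_bag))
```

Calling `statistics.mean` on `candy_colors` would raise an error, which mirrors the point above: there is no "average color".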
Part #4: The last part of the project involved confidence intervals. I found three different confidence intervals: for the population proportion, mean, and standard deviation. Each confidence interval has a written analysis describing what it means.
99% Confidence Interval estimate for the population proportion of yellow candies
x = 410
n = 1874
A confidence interval for a population proportion gives a range of plausible values for the true proportion. In relation to the Skittles, we are 99% confident that the proportion of yellow Skittles in any bag of …
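The usual large-sample interval for a proportion is p̂ ± z*·√(p̂(1 − p̂)/n). The sketch below applies that formula to the x = 410 and n = 1874 given above, taking the critical value from the standard library's `NormalDist`. Whether StatCrunch used exactly this (Wald) method is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Large-sample (Wald) z-interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(410, 1874, 0.99)
# roughly (0.1942, 0.2434)
print(f"99% CI for the proportion of yellow candies: ({low:.4f}, {high:.4f})")
```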
95% Confidence Interval estimate for the population mean number of Skittles per bag
Sx = 2.422
n = 32
A confidence interval estimate of a population mean uses sample data to give an interval that, with a specified degree of confidence, contains the population mean. In this case, we are 95% confident that the mean number of Skittles per bag is between 57.876 and 59.258.
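A mean interval like this one is normally built from the t-based formula x̄ ± t*·s/√n. The sketch below makes two assumptions: the sample mean is the 1874 candies from the proportion interval divided by the 32 bags (58.5625), and t* ≈ 2.040 is the table value for 31 degrees of freedom at 95%, since Python's standard library has no t-distribution. With these inputs the margin of error is about 0.873. Comparing it with the half-width of the interval reported above is a quick way to check which inputs produced that interval.

```python
from math import sqrt


def mean_ci(x_bar, s, n, t_crit):
    """t-interval for a population mean: x_bar +/- t* * s / sqrt(n).

    t_crit must be looked up for n - 1 degrees of freedom (e.g. a t-table).
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Assumption: x_bar = 1874 candies / 32 bags (same bags as the proportion
# interval); t* = 2.040 is the 95% table value for 31 degrees of freedom.
low, high, margin = mean_ci(1874 / 32, 2.422, 32, 2.040)
print(f"margin of error = {margin:.3f}; 95% CI = ({low:.3f}, {high:.3f})")
```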
The purpose of taking sample data and calculating statistics from it is to apply those statistics to a larger population. Because we observe only a sample rather than the whole population, how well a sample statistic estimates a population parameter is an open question. A confidence interval addresses that question by providing a range of values within which the population parameter is likely to fall. Each interval is constructed at a stated level of confidence, expressed as a percentage such as 95% or 99%. This means that if we repeatedly drew samples from the same population and calculated an interval each time, about that percentage of the intervals (for example, 95% of them) would contain the true parameter.
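The repeated-sampling interpretation can be checked by simulation. The sketch below draws many samples from a made-up normal population (mean 58 and standard deviation 2.5, both hypothetical) and builds a 95% interval from each sample. It then reports how often the interval captures the true mean. The population standard deviation is treated as known so that a z-interval applies exactly.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(true_mean=58.0, sigma=2.5, n=32, confidence=0.95,
             trials=2000, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land near 0.95; raising `confidence` to 0.99 widens each interval and pushes the share toward 0.99.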
Conclusion:
From taking this course, I became much more familiar not only with my calculator but also with how to collect data and present it properly. I gained a better understanding of proportions and variables, the value of knowing how to calculate means and standard deviations, how to read graphs such as box plots and histograms, and how to better communicate the data I found. This project was challenging at times, as was the class. I am grateful for everything that I have learned, and I know it will help reinforce my learning within the …