
Project Final

- The class analyzed data from bags of Skittles to determine the distribution of colors.
- Students found that individual bags varied greatly in color distribution, while the overall class data was more evenly distributed, ranging from 17.7% to 22.2% per color.
- The student's bag differed from the class averages, with only the red proportion being similar, indicating that individual bags can be outliers compared to the overall data.

Uploaded by

api-300399896
Copyright © All Rights Reserved

We started the project by buying a bag of Skittles and counting the number of Skittles of each color in our bags. We then submitted our counts to the professor, who compiled the information for the whole class. From the class data we completed the following first portion:
Create a table that displays the proportions by color and the total count from your own bag of
candies together with the proportions by color and total count for the entire class sample.

Count (proportion) by color

                  Red          Orange       Yellow       Green        Purple       Total Count
My Bag            12 (21.1%)   14 (24.6%)   8 (14.0%)    14 (24.6%)   9 (15.8%)    57
Class             600 (21.7%)  541 (19.6%)  613 (22.2%)  490 (17.7%)  517 (18.7%)  2761
Average Counts    13           12           13           10           11           59
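As a minimal sketch (in Python, which the project itself did not use), the percentages in the table can be reproduced from the raw counts. The number of bags, 47, is taken from the class discussion later in this report; the dictionary and function names are illustrative only.

```python
# Counts copied from the table above (my bag and the whole class).
COLORS = ["Red", "Orange", "Yellow", "Green", "Purple"]
my_bag = {"Red": 12, "Orange": 14, "Yellow": 8, "Green": 14, "Purple": 9}
class_counts = {"Red": 600, "Orange": 541, "Yellow": 613, "Green": 490, "Purple": 517}
NUM_BAGS = 47  # bags in the class sample (2761 candies / 47 bags = 58.74)


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each color's share of the total count, in percent."""
    total = sum(counts.values())
    return {color: 100 * n / total for color, n in counts.items()}


my_pct = percentages(my_bag)
class_pct = percentages(class_counts)
# Average number of each color per bag across the class sample.
avg_per_bag = {color: class_counts[color] / NUM_BAGS for color in COLORS}

for color in COLORS:
    print(f"{color:<7} my bag {my_pct[color]:5.1f}%  class {class_pct[color]:5.1f}%  "
          f"class avg/bag {avg_per_bag[color]:5.1f}")
print(f"Total per bag (class average): {sum(class_counts.values()) / NUM_BAGS:.2f}")
```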

[Figure: SKITTLE DATA COMPARISON. Bar chart comparing the proportion of each color for the class and for my bag. x-axis: colors of Skittles (red, orange, yellow, green, purple); y-axis: proportions in percentage.]

Write a paragraph discussing your observations of this data. Respond to the following prompts:

- Do the graphs reflect what you expected to see? Are there any surprises?
- Are there any observations that appear to be outliers? If so, what impact might they have on graphics and summary statistics?
- Does the distribution of colors in the total class data match your own data from your single bag of candies, or are they different?

I hadn’t thought about this before, but it seems possible that each bag of Skittles would have the same number of candies of each color. Since every bag has the same net weight, I would also expect each bag to hold the same total number of Skittles. That is not how the results turned out: the number of Skittles differed from bag to bag, with an average of 59 per bag. With 59 candies there should be about 11 or 12 of each color, but that is not what happened either. Taking the data as a whole, there was an average of about 11 of each color per bag, but in the individual reports some colors had 8 or 15. For example, my bag had 14 of one color and 8 of another. Before I opened my bag I recalled rumors that there is never enough red but a surplus of yellow, and it was funny to see that my bag had the opposite: 12 red and only 8 yellow. With a small sample of just one bag you can see large differences, but with the bigger sample of 47 bags the numbers are closer to the averages than in any individual bag. The class proportions show that, although the colors are not at exactly 20% each, they do not vary as much as they would if one color had only 8%; instead the results range from 17.7% to 22.2%. With this in mind, the class results are roughly what I would expect.
There were two outliers in the data, but one was removed. Outliers such as these could change the proportions. I redid the math with the removed outlier added back in, and the difference was minimal: the proportions changed by only 1 or 2. I think that if the class sample were smaller, the outlier would change the proportions more drastically. Outliers are also more visible in graphs such as the dot plot supplied, because you can easily see the values that don’t belong, and this could mislead people into thinking those values are typical.
Compared with the class, my bag's totals are different. The only color whose proportion is similar to the class data is red, with a difference of only 0.6 percentage points (21.7% vs. 21.1%). The biggest difference is yellow, at 8.2 percentage points (22.2% vs. 14.0%). The proportions of the other colors in my bag are not close to the class results either, as the comparison graph shows.
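A short Python sketch of the comparison above, again using only the counts from the table: it computes the class-minus-my-bag difference for each color in percentage points and lists the colors from most to least similar. The names are illustrative.

```python
# Class counts and my bag's counts, copied from the table.
my_bag = {"Red": 12, "Orange": 14, "Yellow": 8, "Green": 14, "Purple": 9}
class_counts = {"Red": 600, "Orange": 541, "Yellow": 613, "Green": 490, "Purple": 517}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each color's share of the total count, in percent."""
    total = sum(counts.values())
    return {color: 100 * n / total for color, n in counts.items()}


my_pct = percentages(my_bag)
class_pct = percentages(class_counts)

# Difference in percentage points (class minus my bag).
diff = {color: class_pct[color] - my_pct[color] for color in my_bag}

# Sort from the most similar color to the least similar one.
for color in sorted(diff, key=lambda c: abs(diff[c])):
    print(f"{color:<7} {diff[color]:+5.2f} percentage points")
```

Working from unrounded proportions gives a red difference of about 0.68 points; the 0.6 quoted in the text comes from subtracting the rounded values 21.7% and 21.1%.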

For this next part we worked as a group. We discussed the proportions of the Skittles in the class sample and in the population. We also made charts to represent the information so that we could see it visually.
1. What proportion (or percentage) of the Skittles do you expect to see of each color?
Why?
Now open the data set and compute the proportions of Red, Orange, Yellow, Green, and
Purple candies in the class data set. Note that the sample size is the total number of
candies collected by the class.
Report the proportion of each color within the overall sample gathered by the class.
We decided that it would make sense to expect close to 20% of each color, because that would mean an equal amount of each of the five colors in a bag of Skittles.
The observed proportions in the class sample were:
Red: 21.7%, Orange: 19.6%, Yellow: 22.2%, Green: 17.7%, Purple: 18.7%
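To make the 20% expectation concrete, the Python sketch below compares each color's observed class count with the 0.20 × 2761 ≈ 552.2 candies it would have if all five colors were equally common. This is only descriptive arithmetic, not a formal test.

```python
class_counts = {"Red": 600, "Orange": 541, "Yellow": 613, "Green": 490, "Purple": 517}

total = sum(class_counts.values())  # 2761 candies in the class sample
expected = 0.20 * total             # count per color if each color were 20%

differences = {}
for color, observed in class_counts.items():
    differences[color] = observed - expected
    print(f"{color:<7} observed {observed:4d} ({100 * observed / total:4.1f}%)  "
          f"expected {expected:.1f}  difference {differences[color]:+.1f}")
```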

2. In StatCrunch, create a pie chart and a Pareto chart for the total number of candies of
each color in our class data set. Submit copies of your graphs in this report.
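The graphs themselves were made in StatCrunch. As a hedged sketch of what they display, the Python below builds the table behind both charts from the class counts: a pie chart gives each color a slice of 360° proportional to its share, and a Pareto chart lists the colors from most to least frequent with a running cumulative percentage.

```python
class_counts = {"Red": 600, "Orange": 541, "Yellow": 613, "Green": 490, "Purple": 517}
total = sum(class_counts.values())

# Pareto order: categories sorted from the most to the least frequent.
pareto = sorted(class_counts.items(), key=lambda item: item[1], reverse=True)

rows = []
cumulative = 0
for color, count in pareto:
    cumulative += count
    share = 100 * count / total        # bar height / pie share, in percent
    pie_degrees = 360 * count / total  # size of this color's pie slice
    rows.append((color, count, share, 100 * cumulative / total, pie_degrees))

for color, count, share, cum_share, degrees in rows:
    print(f"{color:<7} {count:4d}  {share:5.1f}%  cumulative {cum_share:5.1f}%  "
          f"pie slice {degrees:5.1f} degrees")
```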
3. Does the class data represent a random sample? What would the population be?
Collaborate to discuss sampling and our data in a paragraph or two. Think carefully
about the definition of random sample when you work on your group response.
We decided that the sample would be the candies collected by the whole class, whereas the population would be all the Skittles in the world packaged in the 2.17 oz. bag. Because the Skittles were purchased from random locations, we decided that this could be considered a random sample. We didn't just pick one store, one area, one box, or one rack. The only thing we had to make sure of was that all of our bags were the same size (2.17 oz.).

For this next section we worked individually. We took the data and described its distribution using what we learned in the recent section on distributions. We also described the important difference between qualitative (categorical) and quantitative data and how each is applied. Through that thorough explanation we were able to clearly understand this concept.

1. Write a paragraph discussing your findings about the variable “Total candies in each bag”.
Address the following in your writing: What is the shape of the distribution? Do the graphs
reflect what you expected to see? Does the overall data collected by the whole class agree with
your own data from a single bag of candies? Include the number of candies from your own bag
and the total number of bags in the class sample in your discussion.
In my opinion the distribution is mostly skewed right, because it tapers off to the right after the 56-59 category, while the two categories before 56-59 are much lower. When I changed the histogram to a category width of one (histogram included below), it looked almost uniform, but not quite: it still looked skewed right. The boxplot and the five-number summary from question 1 of the group part also show that the data skew to the right. My bag fits in with the mode, because it had 57 candies; in the histogram in the group project, my bag falls in the largest category. To me the data show what I would expect to see: most bags are in or near the middle, and the data taper off to the sides, although the distribution is slightly skewed to the right.
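The skew can also be checked roughly from the numbers alone. The Python sketch below uses the five-number summary and mean that the group reported for “Total candies in each bag” (minimum 51, Q1 57, median 58, Q3 60, maximum 73, mean 58.7) and applies three informal rules of thumb for right skew; these are heuristics, not a formal measure of skewness.

```python
# Summary statistics reported by the group for "Total candies in each bag".
minimum, q1, median, q3, maximum = 51, 57, 58, 60, 73
mean = 58.7

checks = {
    # A mean pulled above the median suggests a longer right tail.
    "mean > median": mean > median,
    # The upper half of the box is wider than the lower half.
    "Q3 - median > median - Q1": (q3 - median) > (median - q1),
    # The upper tail (max to Q3) is longer than the lower tail (Q1 to min).
    "max - Q3 > Q1 - min": (maximum - q3) > (q1 - minimum),
}

for rule, holds in checks.items():
    print(f"{rule:<28} {holds}")
```

All three checks point to the right, although the long upper tail is driven largely by the 73-candy bag, which the box plot flags as an outlier.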

2. In a half page, explain the difference between categorical and quantitative data. Address the
following in your writing: What types of graphs make sense and what types of graphs do not
make sense for categorical data? For quantitative data? Explain why. What types of
calculations make sense and what types of calculations do not make sense for categorical
data? For quantitative data? Explain why.

Categorical data place each observation into a group according to some characteristic, and the easiest way to recognize them is that the values are usually not numbers. Examples of categorical variables are eye color, favorite color, and city of birth. We saw this in part 2 of our project, when we used the candy colors. Quantitative data are numerical measurements, which is what we used in the third part of our project. I had trouble telling the two apart for some variables. I find it easier to take the variable in question and see whether it fits the phrase “the quality of this…”; if it does, it is a qualitative (categorical) variable. For categorical variables the most useful graphs are pie charts and clustered bar graphs. These graphs make it easy to compare the sizes of the categories and to see what percentage of the whole each category makes up. For quantitative variables you would use stem-and-leaf plots, histograms, boxplots, or dot plots. These graphs show the shape and spread of a distribution and make it easy to compare distributions. For example, in part 2 of the project we could not make a five-number summary, but we could in part 3 because we were using quantitative data. We could use other graphs for quantitative data, such as pie charts, but they are not as effective because they do not show key features such as the shape of the distribution or the spread. For quantitative data you can find the five-number summary, the mean, and the standard deviation. For categorical data you can’t do that; instead you can sort the data into categories, count the frequencies, and find the mode.
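As a small illustration of which calculations fit which type of variable, the Python sketch below treats candy color (categorical) and candies per bag (quantitative) separately. It uses the class color counts and the reported five-number summary; since individual bag totals are not listed in this report, the mean is computed as total candies divided by the 47 bags.

```python
# Categorical variable: candy color. Frequencies, proportions and the mode
# make sense; an "average color" does not.
class_counts = {"Red": 600, "Orange": 541, "Yellow": 613, "Green": 490, "Purple": 517}
total_candies = sum(class_counts.values())
proportions = {color: n / total_candies for color, n in class_counts.items()}
mode_color = max(class_counts, key=class_counts.get)

# Quantitative variable: candies per bag. Means, ranges and five-number
# summaries make sense.
NUM_BAGS = 47
mean_per_bag = total_candies / NUM_BAGS
minimum, q1, median, q3, maximum = 51, 57, 58, 60, 73  # reported five-number summary
data_range = maximum - minimum

print("Most common color (mode):", mode_color)
print("Proportion of that color:", round(proportions[mode_color], 4))
print("Mean candies per bag:", round(mean_per_bag, 2))
print("Range of candies per bag:", data_range)
```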

The following was completed as a group. We took the data, prepared a five-number summary, and created a histogram and a box plot to represent the data:

1. Using the total number of candies in each bag in our class sample, compute the following measures
for the variable “Total candies in each bag”:

(a) mean number of candies per bag

Mean= 58.7

(b) standard deviation of the number of candies per bag

Standard Deviation=3.6

(c) 5-number summary for the number of candies per bag

Min=51 Q1=57 Q2=58 Q3=60 Max=73

Frequency histogram for the variable “Total candies in each bag”

Box plot for the variable “Total candies in each bag”

Outliers: 51, 52, 65, 73
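The listed outliers agree with the common 1.5 × IQR fence rule. Assuming the box plot used that rule (an assumption; the report does not say how outliers were flagged), the fences follow directly from Q1 = 57 and Q3 = 60, and every listed outlier lies outside them:

```python
q1, q3 = 57, 60                # quartiles from the five-number summary
iqr = q3 - q1                  # interquartile range: 3
lower_fence = q1 - 1.5 * iqr   # 52.5
upper_fence = q3 + 1.5 * iqr   # 64.5

reported_outliers = [51, 52, 65, 73]
flagged = [x for x in reported_outliers if x < lower_fence or x > upper_fence]

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("Values outside the fences:", flagged)
```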

In the following we describe confidence intervals in a paragraph as part of the individual portion of
the project. Through this paragraph we were able to clearly understand this concept.

A confidence interval gives a range of values within which the true population value (such as a population proportion or mean) is likely to fall. The confidence level (the percentage) tells how confident we are that the interval contains that true value: if we repeated the sampling many times and built an interval each time, about that percentage of the intervals would contain it. The confidence level also reflects a trade-off with precision. For the same data, a higher confidence level, such as 99%, produces a wider interval, while a lower confidence level, such as 90%, produces a narrower, more precise interval that is less likely to contain the true value.
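The trade-off between confidence level and interval width can be seen with the yellow-candy data used later in this report (613 yellow out of 2761). The Python sketch below uses the standard normal-approximation interval for a proportion and the standard library's `statistics.NormalDist` to get the critical value z for each level.

```python
from math import sqrt
from statistics import NormalDist

x, n = 613, 2761            # yellow candies and total candies in the class sample
p_hat = x / n
std_error = sqrt(p_hat * (1 - p_hat) / n)

widths = {}
for level in (0.90, 0.95, 0.99):
    # Two-sided critical value: leave (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    widths[level] = 2 * z * std_error
    print(f"{level:.0%} confidence: z = {z:.3f}, interval width = {widths[level]:.4f}")
```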

We worked as a group on the following portion. We took the data several steps further and constructed confidence intervals. This section was a bit difficult for me because the concept of confidence intervals was hard for me to understand, but through this work I was able to grasp it.

1. Construct a 99% confidence interval estimate for the population proportion of yellow candies.
In your response, include the following:

o report the sample proportion of yellow candies

There were a total of 613 yellow candies and 2761 total candies. Therefore, the sample proportion of yellow candies is $\hat{p} = 613/2761 \approx 0.2220$.

$n = 2761,\quad x = 613,\quad \alpha = 0.01,\quad z_{\alpha/2} = z_{0.005} = 2.576$
o what type of interval will you construct?

We will construct a one-proportion z-interval based on the normal approximation, because $n\hat{p}(1-\hat{p}) \ge 10$:

$n\hat{p}(1-\hat{p}) = 2761 \times 0.2220 \times (1 - 0.2220) \approx 476.87 \ge 10,$

so the sampling distribution of $\hat{p}$ is approximately normal.

o verify the requirements for constructing this confidence interval

The data satisfy the requirements for constructing a confidence interval: the data come from a random sample; the sample is less than 5% of the population (2761 Skittles, or 613 yellow Skittles, are certainly far fewer than 5% of all the Skittles in the world); and, by the calculation above, $n\hat{p}(1-\hat{p}) \ge 10$, so the sampling distribution of $\hat{p}$ is approximately normal.

o report the confidence interval estimate in the form (lower limit, upper limit)

The mean number of yellow candies was 13.04 per bag, with a standard deviation of 3.56, across a total of 47 bags. The interval for the proportion itself uses $\hat{p} = 0.2220$, $n = 2761$, and $z_{0.005} = 2.576$, which gives (0.2016, 0.2424).

o report the margin of error

The margin of error=0.0204
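The calculation above can be reproduced with a short Python sketch of the one-proportion interval (normal approximation), using the values from this section: x = 613, n = 2761 and 99% confidence. `statistics.NormalDist` is used only to look up z; the rest is the formula p̂ ± z·√(p̂(1 − p̂)/n).

```python
from math import sqrt
from statistics import NormalDist

x, n = 613, 2761
confidence = 0.99

p_hat = x / n
# Requirement used in this report for the normal approximation.
requirement = n * p_hat * (1 - p_hat)
assert requirement >= 10, "normal approximation not justified"

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.4f}, n*p*(1-p) = {requirement:.2f}")
print(f"margin of error = {margin:.4f}")
print(f"{confidence:.0%} CI = ({lower:.4f}, {upper:.4f})")
```

With the unrounded p̂ the requirement value is about 476.90 rather than 476.87; the interval itself still rounds to (0.2016, 0.2424).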

2. Interpret with a complete sentence the confidence interval estimate for the population
proportion of yellow candies.

We are 99% confident that the population proportion of yellow Skittles in 2.17 oz. bags lies between 0.2016 and 0.2424.

3. Construct a 90% confidence interval estimate for the population mean number of candies per
bag. In your response, include the following:

o report the sample mean number of candies per bag

The sample mean for the total number of candies per bag was 58.74
$n = 46,\quad s = 3.59,\quad \bar{x} = 58.74,\quad \alpha = 0.10,\quad t_{\alpha/2} = t_{0.05} = 1.679,\quad \text{degrees of freedom} = 45$

o what type of interval will you construct?

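We will construct a 90% t-interval for the population mean number of Skittles per bag, because the population standard deviation is unknown and is estimated by the sample standard deviation $s$.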

o verify the requirements for constructing this confidence interval

any extra candies therefore making it random. The total number of Skittles
(2761) would be less than 5% of the total population of Skittles in the world.
o report the confidence interval estimate in the form (lower limit, upper limit)

With a sample mean of 58.74 and a sample standard deviation of 3.59, the 90% confidence interval for the population mean number of candies per bag is (57.85, 59.63).

o report the margin of error

The margin of error=0.89
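A matching sketch for the t-interval, using the values reported in this section (x̄ = 58.74, s = 3.59, n = 46, t = 1.679 with 45 degrees of freedom). Python's standard library has no t distribution, so the critical value is hard-coded from the report; with SciPy installed, `scipy.stats.t.ppf(0.95, 45)` returns about 1.679.

```python
from math import sqrt

x_bar = 58.74   # sample mean number of candies per bag
s = 3.59        # sample standard deviation
n = 46          # sample size used for this interval in the report
t_star = 1.679  # t critical value for 90% confidence, 45 degrees of freedom

margin = t_star * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin of error = {margin:.2f}")
print(f"90% CI for the mean = ({lower:.2f}, {upper:.2f})")
```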

4. Interpret with a complete sentence the confidence interval estimate for the population mean
number of candies per bag.

We are 90% confident that the population mean number of Skittles per 2.17 oz. bag is between 57.85 and 59.63. This describes the average over all bags, not any single bag: an individual bag bought at a store can easily contain fewer or more candies, as the bags in our own sample (from 51 to 73 candies) show.

The following is my reflection on this group project and what I learned:

It is no secret that group projects are not the most popular among students, but this was an interesting project. Through it we were able to apply the mathematics we were learning along the way and see how it could be used. It was also a good way to work through the math from beginning to end: we started with the raw data and finished with confidence intervals. The project seemed simple and easy when we started, but as we progressed it got more difficult and we had to think harder. I struggled most with the last portion of the group project, which involved confidence intervals and their interpretation. Even so, it was good practice. In the homework and interactive assignments, the interpretations were usually given to us as multiple-choice options that were already written out; here the challenge was to write one ourselves based on the results.
Another important lesson from this project is the importance of double- or triple-checking results. From what we later learned about hypothesis testing, which tied everything together, we further learned that changing or misplacing even a few numbers can change the results completely and lead to completely different interpretations. Many of us are going into nursing, where we already know the importance of making precise calculations, and this project further demonstrated it.
Group projects are always a bit difficult, because often some team members don’t participate or not everyone agrees. This happens because we are all unique individuals who think differently. It was even more difficult because we are all in an online class. Ultimately, I have learned in the past that each person has to take charge of their own education. It may seem unfair when others don’t participate, but in the end, who loses out? When we put in the work, we learn from each step along the way. For example, I had a hard time grasping the idea of confidence intervals, but helping with the team project helped me understand them, and I was able to pass my test on this subject.
Another thing I found interesting is that in statistics either the mean or the median may be used, depending on the purpose. This means results can be presented in whichever way best supports what someone wants to show. In conclusion, I learned a few things from this project, and testing candy is fun when you get to eat it afterward.
