Skittles Term Project

Meghan Wadsworth analyzed data from a class project where students counted Skittles in bags. The data was organized by color and number of candies. Yellow Skittles made up the highest proportion at 22%, while purple was the lowest at 19%. Most bags contained between 57-60 candies. However, a few outliers of 53, 65, and 73 candies skewed the data right. Wadsworth explained how to construct a 99% confidence interval for the true proportion of yellow Skittles.

Uploaded by

api-429606575

Wadsworth 1

Meghan Wadsworth
Nelson
Math 1040
04/09/2019
Math 1040 Skittles Term Project

Report Introduction
The students in Miss Nelson’s fourth period Math 1040 class were all asked to purchase a
2.17-ounce bag of Original Skittles and to record the different colors of Skittles they had in their
bag. Students were to only count whole candies, and to disregard any partial candies in the bag.
The instructor will compile the data from the whole class. This data includes the total number of
candies in each bag, the total number of bags, and the total number of candies. Throughout this
project the students will be using many of the concepts they are studying this semester. These
concepts include organizing and analyzing data, drawing conclusions using confidence intervals
and hypothesis tests, and presenting the work in a well-organized paper.

Organizing and Displaying Categorical Data: Colors


According to these two charts, the different colors of Skittles appear in almost even amounts.
Yellow, however, is the most common with a proportion of .2197, and purple is the least common with a
proportion of .188. I expected that the darker the color, the fewer candies there would be,
because the dye is more expensive for the company. These graphs do reflect what I expected,
with the lighter colors being more common and the darker colors less common. My personal data was
similar to the data of the whole class with regard to the lighter colors being more
common. Orange and Green each had a proportion of .2419, and Yellow had a proportion of
.1935. Red and Purple each had a proportion of .1613.

Totals and proportions of Skittles in my personal bag:

            Red     Orange  Yellow  Green   Purple
Count       10      15      12      15      10
Proportion  .1613   .2419   .1935   .2419   .1613

Totals and proportions of Skittles in the class sample:

            Red     Orange  Yellow  Green   Purple  Total
            11      8       13      7       15      54
            12      10      13      13      9       57
            10      15      12      15      10      62
            6       15      12      8       9       60
            14      11      16      10      7       58
            4       16      15      11      13      59
            14      9       15      10      10      58
            8       9       14      17      17      65
            5       13      14      12      13      57
            10      11      8       11      13      53
            13      3       20      15      9       60
            11      8       21      9       12      61
            7       7       10      19      16      59
            13      11      13      14      7       58
            8       12      19      7       13      59
            10      15      13      8       12      58
            12      19      15      17      10      73
            13      14      12      8       12      59
            8       13      13      12      11      57
            9       12      13      11      13      58
            14      13      9       15      9       60
            19      9       10      14      9       61
            14      9       13      12      11      59
Total       283     293     325     300     278     1479
Proportion  .191    .198    .2197   .203    .188
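
The proportions in the last row of the table are each color's count divided by the overall total. A minimal sketch of that arithmetic is below; the dictionary and function names are illustrative, and the counts are copied from the totals row of the class table.

```python
# Class totals by color, copied from the totals row of the class table.
class_counts = {"Red": 283, "Orange": 293, "Yellow": 325, "Green": 300, "Purple": 278}


def proportions(counts):
    """Return each category's share of the overall total."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


class_props = proportions(class_counts)
for color, p in class_props.items():
    print(f"{color:<8}{class_counts[color]:>5}{p:>8.4f}")

# Identify the most and least common colors by proportion.
most_common = max(class_props, key=class_props.get)
least_common = min(class_props, key=class_props.get)
print("Most common: ", most_common)
print("Least common:", least_common)
```

The same function can be applied to a single bag, for example the counts 10, 15, 12, 15, 10 from the personal bag.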

Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Using the total number of candies in each bag in the class sample, I have calculated the
mean, standard deviation, and the 5-number-summary below.

Mean: 59.2 candies per bag
Standard Deviation: 3.79
Five Number Summary:
● Minimum: 53
● Quartile 1: 57.5
● Median: 59
● Quartile 3: 60
● Maximum: 73

Number of Skittles per bag:


Based on the graph, the data seems fairly uniform. However, the box plot makes it seem
as though the data is skewed right. This impression is misleading because the box plot is taking
the outliers into account. To find the outliers you must find the fences. To find the fences you calculate
Q1 - 1.5 x IQR for the lower fence and Q3 + 1.5 x IQR for the upper fence. Below are the
calculations to figure out the outliers.
IQR = Q3 -Q1 = 60 - 57.5 = 2.5 
Lower = Q1 - 1.5 x IQR = 57.5 - 1.5 x 2.5 = 53.75 
Upper = Q3 + 1.5 x IQR = 60 + 1.5 x 2.5 = 63.75 
Based on our fences, there are 3 outliers: 53, 65, and 73. Because two of the outliers
lie past the upper fence, the box plot appears skewed right.
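
The fence calculation above can be sketched in a few lines of code. This is only an illustration: the function name is hypothetical, Q1 = 57.5 and Q3 = 60 are taken from the five-number summary in this report, and the list holds the 23 bag totals that appear in the class table.

```python
# Bag totals as they appear in the class table (23 rows are listed there).
bag_totals = [54, 57, 62, 60, 58, 59, 58, 65, 57, 53, 60, 61,
              59, 58, 59, 58, 73, 59, 57, 58, 60, 61, 59]


def fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported in the five-number summary of this report.
lower, upper = fences(57.5, 60)

# Any value strictly outside the fences is flagged as an outlier.
outliers = sorted(x for x in bag_totals if x < lower or x > upper)

print("Lower fence:", lower)
print("Upper fence:", upper)
print("Outliers:", outliers)
```

A bag with 54 candies is not flagged because 54 is above the lower fence of 53.75, and a bag with 62 candies is below the upper fence of 63.75.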
I did expect to see the data as uniform because all the bags were 2.17-ounce Skittles bags.
If the data were skewed or bell-shaped, the company would be spending too much money, or the
customers would be cheated of their money. My bag consisted of 62 candies in the 2.17-ounce
Skittles bag. Even though this is close to the upper fence, it is still within the boundaries and is not
considered an outlier compared to the other 24 bags in the sample.

Reflection
The difference between categorical and quantitative data is that categorical data includes
categories. This would include any words or topics, some examples include: colors, names, days
of the week, etc. These are things that you cannot do math with. However, quantitative data
includes numbers that you are able to do math with. The charts that make sense to use for
categorical data are: frequency tables, bar graphs, Pareto charts, and pie charts. Charts that don’t
make sense are histograms or line graphs. The charts that make sense to use for quantitative data
are: histograms, line graphs, box plots, and dot plots. Charts that don’t make sense are pie charts
and pictographs. Calculations that make sense for categorical data are finding frequencies and
totals. Calculations that do not make sense for categorical data are mean, standard deviation, and
the 5-number-summary. Calculations that make sense for quantitative data are mean, standard
deviation, 5-number-summary, etc. Calculations that don’t make sense for quantitative data are
frequencies. This is because categorical includes categories that don’t use heavy math, while you
can calculate a lot more with quantitative data.

Confidence Interval Estimates

The general purpose of a confidence interval is to provide a range of
values for the estimated population parameter instead of a single point estimate or value. At the stated
level of confidence, the interval is expected to contain the true value of the unknown population parameter, which helps us
continue research and complete the necessary calculations and hypotheses.

First we are going to calculate a 99% confidence interval estimate for the true proportion
of yellow candies. To do this, we must first figure out the point estimate of the yellow candies.

p̂ = x ÷ n

p̂ is the point estimate and is read as “p-hat.” ‘x’ is the number of yellow candies,
and ‘n’ is the total sample size.

p̂ = 325 ÷ 1479
p̂ = .2197

To determine the confidence interval we use the following formula: the point estimate ±
the margin of error. The margin of error depends on 3 factors:

● Level of confidence: as it increases, the margin of error also increases.
● Sample size: as the random sample increases, the margin of error decreases.
● Standard deviation of the population: the more spread there is, the wider our interval will
be for a given level of confidence.

After taking into consideration the above three factors, one must also take into
consideration the below rules.

● The distribution must be approximately normal. You can check this by making sure
the below statement is true.
○ np(1-p) ≥ 10
● The sampled values must be independent. This can be checked by making sure the below statement is
true.
○ n ≤ .05N, where ‘n’ is the sample size and ‘N’ is the population size.
● After the above rules are met, you can then calculate the mean and standard deviation of p̂.
○ Mean: μp̂ = p, estimated by p̂ = x ÷ n
○ Standard Deviation: σp̂ = √[p(1-p) ÷ n]

To construct a confidence interval you need to find the lower and upper bound. You can
do this either manually or by using a calculator. To find manually follow the below steps.
● Lower bound: p̂ - zα/2 × √[p̂(1-p̂) ÷ n]
● Upper bound: p̂ + zα/2 × √[p̂(1-p̂) ÷ n]
 
● Or you can follow these steps on a calculator and fill in the proper information:
○ Stat → Tests → 1-PropZInt

Regarding our problem of a 99% confidence interval for the true proportion of yellow
candies we will enter the following:
■ X: 325
■ N: 1479
■ C-Level: 99
■ Calculate
After hitting calculate we receive the following information:

Confidence Interval: ( .19201 , .24748 )


p̂ = .2197
n = 1479
 
Using this information we can then write the conclusion. We are 99% confident that
the true proportion of yellow candies falls in the interval (.19201, .24748).
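
For readers without a graphing calculator, the same interval can be reproduced by hand. The sketch below is an illustration under stated assumptions, not the calculator's internal routine: it uses the standard formula p̂ ± z × √[p̂(1-p̂) ÷ n], the function name is hypothetical, and the critical value comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(x, n, confidence):
    """Confidence interval for a proportion: p_hat +/- z * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = x / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


x, n = 325, 1479  # yellow candies and total candies in the class sample
p_hat = x / n

# Normality condition from the rules listed earlier: n * p_hat * (1 - p_hat) >= 10.
normal_ok = n * p_hat * (1 - p_hat) >= 10

low, high = one_prop_z_interval(x, n, 0.99)
print("Normality condition met:", normal_ok)
print(f"99% interval: ({low:.5f}, {high:.5f})")
```

Rounded to five decimal places, the bounds agree with the interval reported above.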

The next confidence interval we will be constructing is a 95% confidence interval
estimate for the true mean number of candies per bag. To find the answer to this, we have some
new steps while still having some similar steps. First we will want to find the mean, or x̅, of our
data. We can find this by using the calculator.
● Stat → Edit → Enter information into L1 → Stat → Calc → 1-Var Stats
○ List: L1
○ FreqList:
○ Calculate
After calculating, we receive the following information:
● X̅ = 59.3478
● Sx = 3.8566
● σx = 3.7718
● n = 23
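
The 1-Var Stats values can be checked against the bag totals with Python's standard-library `statistics` module. This is a minimal sketch that assumes the list entered into L1 is the 23 bag totals shown in the class table; the variable names are illustrative.

```python
from statistics import mean, pstdev, stdev

# The 23 bag totals shown in the class table (assumed to be the contents of L1).
bag_totals = [54, 57, 62, 60, 58, 59, 58, 65, 57, 53, 60, 61,
              59, 58, 59, 58, 73, 59, 57, 58, 60, 61, 59]

n = len(bag_totals)
x_bar = mean(bag_totals)       # sample mean
s_x = stdev(bag_totals)        # sample standard deviation (divides by n - 1)
sigma_x = pstdev(bag_totals)   # population standard deviation (divides by n)

print(f"n = {n}")
print(f"x-bar = {x_bar:.4f}")
print(f"Sx = {s_x:.4f}")
print(f"sigma-x = {sigma_x:.4f}")
```

Sx is the value to carry into the TInterval and T-Test steps, since the population standard deviation is not known.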

We can then use this information to calculate the confidence interval. On the calculator,
follow these steps:
● Stat → Tests → TInterval

From there you can either click Data and fill in accordingly:
● List: L1
● Freq: 1
● C-Level: 95%
● Calculate
Or you can click Stats and fill in the information manually:
● X̅: 59.3478
● Sx: 3.8566
● n = 23
● C-Level: 95
● Calculate

After calculating we then receive the following confidence interval: (57.68, 61.016) and
can then draw up the conclusion: We are 95% confident that the true mean of the number of
candies per bag falls between 57.68 and 61.016.
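
The interval can also be computed by hand as x̅ ± t × (Sx ÷ √n). The sketch below is illustrative only. Python's standard library has no t-distribution, so the critical value 2.0739 (two-sided 95%, 22 degrees of freedom) is an assumption read from a standard t-table rather than computed, and the function name is hypothetical.

```python
from math import sqrt

# Assumption: two-sided 95% critical value for df = n - 1 = 22, read from a t-table.
T_CRIT_95_DF22 = 2.0739


def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for a mean: x_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Summary statistics from the 1-Var Stats output in this report.
low, high = t_interval(59.3478, 3.8566, 23, T_CRIT_95_DF22)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

A different confidence level or sample size needs a different critical value from the table.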

Hypothesis Tests

The general purpose and meaning of a hypothesis test is to make a statement regarding
the nature of the population by using Ho (null hypothesis) and H1 (alternative hypothesis). We
then analyze the data to assess the plausibility of the statement. Each statement is different
depending on the problem.

We are first going to use a .05 significance level to test the claim that 20% of all Skittles
candies are red. We need to first figure out what our null hypothesis and alternative hypothesis
will be.
Ho: p = .20 (20% of all Skittles candies are red)
H1: p ≠ .20 (the proportion of red Skittles candies is not 20%)

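Then we need to make sure the following condition holds, using the claimed proportion for p:
np(1-p) ≥ 10
For our problem, 1479 × .20 × .80 = 236.64, which is greater than 10.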
Once that is found to be true, we need to find the p-value. We can find the p-value via
calculator by following these steps:

● Stat → Tests → 1-PropZTest

Below is the information that we will enter for our specific problem.
● Po = .20
● x = 283
● n = 1479
● prop ≠ Po

We will receive the below information after hitting calculate:


● z = -.8321
● p = .4054
● p̂ = .1913
● n = 1479

With this we can then compare the p-value to the level of significance (.05). The
p-value must be less than the level of significance to reject Ho.
.4054 > .05
Because the p-value is not less than the level of significance, we cannot reject our null hypothesis. After
concluding this, we can then draw up the conclusion: There is not sufficient evidence to reject the claim that 20% of all
Skittles candies are red.
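
The test statistic and p-value can be reproduced without the calculator. This is a hedged sketch of the standard one-proportion z-test, z = (p̂ - p₀) ÷ √[p₀(1-p₀) ÷ n], with a two-tailed p-value; the function name is hypothetical and the normal distribution comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0):
    """Two-tailed one-proportion z-test; returns (z, p_value)."""
    p_hat = x / n
    # The standard error uses the claimed proportion p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


alpha = 0.05
z, p_value = one_prop_z_test(283, 1479, 0.20)  # red candies, total candies, claim
reject_h0 = p_value < alpha

print(f"z = {z:.4f}")
print(f"p-value = {p_value:.4f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

Because the p-value is larger than the significance level, the decision is to fail to reject Ho.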

The next hypothesis test we will be calculating is to use a .01 significance level to test the
claim that the mean number of candies in a bag of Skittles is 55.

The first step is to determine the null hypothesis and the alternative hypothesis.
Ho: μ = 55 (the mean number of candies in a bag of Skittles is 55)
H1: μ ≠ 55 (the mean number of candies in a bag of Skittles is not 55)

The next step is to recognize the level of significance. In this specific problem, it is a .01
significance level. The third step is to compute the test statistic. You can do this manually with
the equation below, or you can use the calculator.
to = ( x̅ - μo ) ÷ ( s ÷ √n )
Calculator:
● Stat → Tests → T-Test (#2)
Enter the following:
● μo: 55
● x̅: 59.3478
● Sx: 3.8566
● n: 23
● μ: ≠ μo
After hitting calculate, we receive the following information:
● t = 5.4067
● p = .00001

Using this information, we can then move onto the fourth step, which is comparing the
p-value with the level of significance (.01). Once again, the p-value must be less than the level of
significance to reject Ho.
.00001 < .01
Because the above statement is true, we can reject the null hypothesis. After we
acknowledge that, we can write up the conclusion: We have strong evidence that 55 is not the
mean number of candies in a bag of Skittles.
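
The test statistic can be checked by hand with t = (x̅ - μo) ÷ (s ÷ √n). The sketch below swaps in the critical-value method for the decision instead of the p-value, because Python's standard library has no t-distribution; the two methods lead to the same decision. The critical value 2.8188 (two-tailed, .01 significance, 22 degrees of freedom) is an assumption read from a standard t-table, and the function name is hypothetical.

```python
from math import sqrt

# Assumption: two-tailed critical value for alpha = .01 and df = 22, read from a t-table.
T_CRIT_01_DF22 = 2.8188


def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))


# Summary statistics from the 1-Var Stats output, with the claimed mean of 55.
t = t_statistic(59.3478, 55, 3.8566, 23)

# Reject Ho when |t| is beyond the critical value.
reject_h0 = abs(t) > T_CRIT_01_DF22

print(f"t = {t:.4f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

The statistic is far beyond the critical value, which matches the very small p-value reported by the calculator.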

Reflection

There are several conditions that must be met when doing interval estimates and
hypothesis tests. They are as follows:
1. It must be a simple random sample. Our samples met this condition because all the
Skittles were randomly purchased by students in the class.
2. The sample size must be small relative to the population size. Our samples also met
this condition because there are a lot of bags of Skittles that were not involved in our
sample.
3. The sample must be independent, which the inequality below checks. Our samples met this condition because
they are independent, not dependent, samples.
a. n ≤ 0.05N
4. The sample must be normally distributed. Our sample meets this condition.
5. n ≥ 30 (central limit theorem). We only had 23 bags; however, we can still meet this
condition by graphing the data as a boxplot and making sure it is approximately normally distributed.

It is always possible that errors could have been made by using this data. One possible
error could have been counting wrong. A student may have miscounted by accident, or even made a
typing error while submitting their totals. Another possible error could be that a student just
made up numbers, thus possibly creating an outlier or skewing the data.

The sampling method could be improved by having a larger class size, or finding a way
to guarantee that all the students purchased the correct bag of Skittles. These are hard things to
achieve, but if possible it would help reduce possible errors in the research.

During this statistical research project, I have drawn several conclusions in different areas:
● The different colored skittles have almost even amounts. Yellow, however, has the most
with a proportion of .2197, and purple has the least with a proportion of .188.
● There were three outliers in our collected data: 53, 65, and 73.
● We are 99% confident that the true proportion of yellow candies falls in the interval
(.19201, .24748).
● We are 95% confident that the true mean of the number of candies per bag falls between
57.68 and 61.016.
● There is not sufficient evidence to reject the claim that 20% of all Skittles candies are red.
● We have strong evidence that 55 is not the mean number of candies in a bag of Skittles.
