Statsfinalproject
The graphs in some ways reflected what I expected to see. I had heard
a rumor that yellows and oranges were by far the most common colors in a
bag of Skittles, but based on our data, the colors are pretty evenly matched.
Orange appears to be the second least common color, which didn't match
my bag's data, although yellow is the most common color, as I predicted. It
was also a surprise that red Skittles were the second most common color; I
expected fewer, since I feel I never get a lot of red Skittles in a bag.
There were a few outliers, like my abnormally high orange Skittles
count, and I did see that someone else's data had a high purple Skittles
count. Outliers add to the class total for a color, which can make that color
look more common than it would be in a typical bag.
I think the class distribution mostly matches up with my data,
besides the orange count. Yellow Skittles were not my highest count, as the
class data suggests they would be, nor did I have a large number of red
Skittles.
The next part of the project I worked on involved compiling the data
into graphs of the numerical data. To build these graphs, we had to
calculate the mean and standard deviation of the number of candies per
bag. The results of our calculations and graphs are included.
Mean number of candies per bag: 59.5
Standard deviation of the number of candies per bag: 2.9
5-number summary for the number of candies per bag: 53, 58, 59.5, 61, 66
The shape of the distribution is symmetrical, or bell shaped. I was a
little surprised, because glancing over the data, everyone's number of
Skittles in each bag seemed completely random. However, seeing the data in
a graph made it apparent that the most common number of Skittles in a bag
was around 60. Both the frequency histogram and the box-and-whisker plot
agreed with this amount. I had 56 Skittles in my bag, so out of the class's
38 bags, my bag agreed with the rest of the data. My bag would fall in the
first quartile, since 56 is below the first quartile value of 58.
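As a rough check on where one bag sits in the class data, the sketch below
uses only the summary values reported above. The function name describe_bag
is made up for illustration, and treating a count at or below a quartile
value as belonging to the lower group is an assumption.

```python
# Class summary values reported above
mean, sd = 59.5, 2.9
minimum, q1, median, q3, maximum = 53, 58, 59.5, 61, 66

def describe_bag(count):
    """Return the z-score and quartile group for one bag's candy count."""
    z = (count - mean) / sd  # distance from the mean in standard deviations
    if count <= q1:
        group = "first quartile"
    elif count <= median:
        group = "second quartile"
    elif count <= q3:
        group = "third quartile"
    else:
        group = "fourth quartile"
    return z, group

z, group = describe_bag(56)
print(f"z = {z:.2f}, {group}")  # z = -1.21, first quartile
```

A bag of 56 is a little more than one standard deviation below the mean,
which is low but not unusual for a roughly bell-shaped distribution.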
Categorical data is qualitative data that can be sorted into groups, such as
the color of each Skittle in a bag, as we demonstrated in this project.
Categorical data is organized into these groups for organizational and
statistical purposes. In our project, we individually recorded how many
Skittles of each color we had in a bag and combined our data into a class
sample. We grouped the data into categories because we wanted to see how
the colors compared to each other in the sample. For categorical data, using
a pie chart makes sense because a pie chart shows each category's share of
the whole. A Pareto chart is also useful for categorical data because the
categories are arranged in descending order of frequency. Pareto charts put
the focus on the most significant part of the data, for instance, if we want
to know the frequency of the most common Skittles color. For categorical
data, which focuses on frequency, we would want to calculate a frequency
distribution and a relative frequency distribution. The relative frequencies
always add up to one, which is ideal for pie chart representations.
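The frequency and relative frequency calculations described above can be
sketched as follows. The color counts are made-up numbers for a single
imaginary bag, not the class data, and the function names are only
illustrative.

```python
from collections import Counter

# Hypothetical color counts for one bag (illustrative only, not the class data)
counts = Counter({"red": 12, "orange": 10, "yellow": 14, "green": 11, "purple": 13})

def relative_frequencies(freq):
    """Divide each category count by the total so the shares sum to one."""
    total = sum(freq.values())
    return {color: n / total for color, n in freq.items()}

def pareto_order(freq):
    """Sort categories from most to least frequent, as in a Pareto chart."""
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)

rel = relative_frequencies(counts)
for color, n in pareto_order(counts):
    print(f"{color:<7}{n:>3}  {rel[color]:.3f}")
print("sum of relative frequencies:", round(sum(rel.values()), 10))
```

The sorted list is what a Pareto chart plots from left to right, and the
relative frequencies are the slice sizes a pie chart would use.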
Quantitative data is numerical data that can be ordered and measured.
We would want to use this data for comparison of measurements. Graphs
that are used for quantitative data include frequency histograms and
box-and-whisker plots. These graphs are useful because they group
quantitative data into numerical intervals which can be easily organized.
These graphs also make it easy to see whether the data is skewed or
symmetrical. Common calculations for quantitative data include
determining the mean, mode, range, standard deviation, quartiles, and
five-number summary. We can also use lower and upper fence calculations to
determine whether there is an outlier in the data set.
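The fence calculation can be sketched with the five-number summary reported
earlier. This assumes the common 1.5 x IQR rule; the function name fences is
made up for illustration, and different textbooks compute quartiles slightly
differently, so the result is only a rough check.

```python
def fences(q1, q3, k=1.5):
    """Return the lower and upper fences using the k * IQR rule."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

# Five-number summary for candies per bag, as reported above
minimum, q1, median, q3, maximum = 53, 58, 59.5, 61, 66

lower, upper = fences(q1, q3)
print(lower, upper)                      # 53.5 65.5
print(minimum < lower, maximum > upper)  # True True
```

Under this rule the smallest bag (53) and the largest bag (66) both fall just
outside the fences, so each would be flagged as a possible outlier.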
The final part of the project I worked on involved calculating
proportions, margins of error, and confidence intervals for different
proportions of Skittles. The following calculations found confidence
intervals for the proportion of yellow Skittles and for the true value of the
population mean.
A confidence interval provides a range of values that is likely to contain
the population parameter of interest, and it expresses the degree of
uncertainty associated with a statistic. In statistics it is important to know
how well a sample statistic estimates the population value. Specifically, a
confidence interval is an interval estimate combined with a probability
statement. Typically confidence intervals are preferred to point estimates
because confidence intervals convey both the uncertainty and the precision of
the estimate.
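A confidence interval for a proportion can be sketched as below. The counts
(450 yellow out of 2000 candies) are invented for illustration and are not
the class totals. The sketch assumes the usual large-sample normal
approximation, and proportion_ci is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Return the sample proportion, margin of error, and interval."""
    p_hat = successes / n
    # Critical value z* for a two-sided interval at the given confidence level
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Hypothetical counts, not the class data
p_hat, margin, (low, high) = proportion_ci(450, 2000)
print(f"p-hat = {p_hat:.3f}, margin of error = {margin:.3f}")
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

The point estimate alone would only report p-hat, while the interval also
shows how far the true proportion could reasonably be from it.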
During my time in Math 1040 I learned exactly how data is gathered,
organized, and turned into readable information that is simple to
scan and understand. This class helped expand my understanding of
statistics as a whole and why gathering data about a population is important
in real-world applications. To summarize, I learned that statistics is about
gathering information in a relatively simple and reliable way and using the
results in a way that will improve existing conditions.
Prior to beginning this class I had no idea what statistics entailed. I had
an image in my head of a call center gathering data for personal interest, but
I didn't know how that data was organized or used. I learned in this class
that there are diverse ways of sampling data from a population, such as
simple random or systematic sampling. Statistics is also not a science
utilized only by call centers, as I had previously believed, but can be used
in very real-world settings such as analyzing customer responses to improve a
company, or optimizing standardized testing in schools. One of my favorite
problems in this class involved a teacher missing an exam score but being
able to find out what it was using the mean and the number of exam scores. It
seemed simple, but it had never occurred to me previously that there could be
a way to figure that out.
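The missing exam score problem works because the mean times the number of
scores gives the total, and the missing score is whatever is left after
subtracting the known scores. The numbers below are invented for
illustration and are not from the original problem.

```python
def missing_score(known_scores, mean, n):
    """Recover one missing score from the mean of all n scores."""
    total = mean * n  # the sum of all n scores, including the missing one
    return total - sum(known_scores)

# Hypothetical example: five exams with a mean of 85, four scores known
known = [82, 90, 75, 88]
print(missing_score(known, mean=85, n=5))  # 90
```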