Case 1

Uploaded by

singularjity
Copyright © All Rights Reserved

Case 1: Graphing Data to Communicate Ideas

A Case Study
Presented to the Department of Industrial and Systems Engineering
De La Salle University-Manila
1st Term, A.Y. 2023-2024

In partial fulfillment
of the requirements for the course
Advanced Quantitative Methods Laboratory (LBYIE2C)

Carosus, Christian Jeric D.


Esguerra, Julio Fernando G.
Gambong, Roddick Eumer V.
Yabut, Brian Christian G.
Zaragoza, Leigh Dominique C.
Section ER2

Sir Eric Siy


September 19, 2023
LBYIE2C Case 1

Situation 1: M&M weights


The weights (in grams) of 100 plain M&M candies are shown on the “M&M”
worksheet.
a) Provide descriptive statistics from the M and M data as a whole, and by color.
Descriptive Statistics: As a Whole

Variable      Mean     SE Mean   StDev     Variance  Q1       Median  Q3     Min    Max    N
M&M Weights   0.85649  0.005179  0.051794  0.002683  0.82625  0.858   0.881  0.696  1.015  100

The standard error of the mean (0.0052 g) is small relative to the mean (0.856 g),
indicating that the sample mean is estimated with high precision and is likely a good
estimate of the population mean. The standard deviation (0.0518 g) measures the spread
of the data; it is about 6% of the mean, so M&M weights in the sample are fairly
consistent, with most candies clustered around the mean weight. The variance
(0.0027 g²) is the square of the standard deviation, so it conveys the same conclusion:
M&M weights in the sample are not highly variable.
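
The SE Mean and Variance columns follow directly from the StDev and N columns. The short check below is a sketch that uses only the values reported in the table; the coefficient of variation is an extra summary added here for illustration and is not part of the original output.

```python
import math

# Values reported in the table for all 100 M&M weights (grams).
n = 100
mean = 0.85649
stdev = 0.051794

# Standard error of the mean: s / sqrt(n).
se_mean = stdev / math.sqrt(n)

# Variance is the square of the standard deviation.
variance = stdev ** 2

# Coefficient of variation (added for illustration): spread relative to the mean.
cv = stdev / mean

print(f"SE mean  = {se_mean:.6f}")
print(f"Variance = {variance:.6f}")
print(f"CV       = {cv:.1%}")
```

Rounded to six decimals, the recomputed standard error and variance agree with the tabulated 0.005179 and 0.002683.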

Descriptive Statistics: By color


Variable  Mean      SE Mean   StDev     Variance  Q1      Median  Q3       Min    Max    N
Red       0.863538  0.015974  0.057594  0.003317  0.825   0.859   0.8975   0.751  0.966  13
Orange    0.8578    0.010021  0.050104  0.00251   0.84    0.863   0.881    0.735  0.977  25
Yellow    0.8345    0.013958  0.039479  0.001559  0.794   0.8495  0.85875  0.769  0.883  8
Brown     0.84775   0.028107  0.0795    0.00632   0.8145  0.857   0.874    0.696  0.982  8
Blue      0.856037  0.008082  0.041995  0.001764  0.825   0.855   0.881    0.775  0.942  27
Green     0.863526  0.013068  0.056963  0.003245  0.814   0.865   0.881    0.778  1.015  19

Smaller standard errors indicate more precise estimates. Blue M&Ms have the smallest
standard error (0.008082), so their mean weight is estimated with the highest precision,
largely because Blue has the largest sample (N = 27). Brown M&Ms have the largest
standard deviation (0.0795) and variance (0.00632), indicating that their weights are
more variable than those of the other colors. Conversely, Yellow M&Ms have the smallest
standard deviation (0.039479), indicating the least variability in their weights. Brown
and Yellow each have only 8 observations, so their estimates are the least reliable.
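
The standard errors per color can be recomputed from the StDev and N columns and ranked. This is a minimal sketch that uses only values copied from the by-color table; the variable names are illustrative.

```python
import math

# (StDev, N) per color, copied from the by-color table.
by_color = {
    "Red": (0.057594, 13),
    "Orange": (0.050104, 25),
    "Yellow": (0.039479, 8),
    "Brown": (0.0795, 8),
    "Blue": (0.041995, 27),
    "Green": (0.056963, 19),
}

# Standard error of the mean for each color: s / sqrt(n).
se = {color: sd / math.sqrt(n) for color, (sd, n) in by_color.items()}

# Print colors from most precise (smallest SE) to least precise.
for color, value in sorted(se.items(), key=lambda item: item[1]):
    print(f"{color:<7} SE = {value:.6f}")

smallest = min(se, key=se.get)
largest = max(se, key=se.get)
```

Because the standard error divides by the square root of N, a color with a moderate standard deviation but many observations can still have the most precise mean.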

b) Provide a pie chart of the distribution of colors if we can say that the data provides a
representative proportion of each color from the manufacturing production counts.
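
The slices of such a pie chart are the color shares of the 100 candies. A minimal sketch of that arithmetic, assuming the N column of the by-color table gives the color counts:

```python
# Color counts taken from the N column of the by-color table.
counts = {
    "Red": 13,
    "Orange": 25,
    "Yellow": 8,
    "Brown": 8,
    "Blue": 27,
    "Green": 19,
}

total = sum(counts.values())

# Percentage share of each color (these are the pie-chart slice sizes).
shares = {color: 100 * n / total for color, n in counts.items()}

for color, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{color:<7} {pct:.1f}%")
```

Since the sample contains exactly 100 candies, each percentage equals the count itself.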

c) Give a histogram of weights for all the observations with a normal curve for all
observations and then by each color. Does it seem to show a normally distributed fit if a
normal curve is superimposed against the histograms?

With a normal curve superimposed, the histogram of all M&M weights appears
approximately normally distributed. An approximately normal distribution of weights is
useful for quality control and manufacturing because it suggests that the production
process is consistent and predictable. In the future, deviations from normality may
indicate issues with the production process that need to be addressed.
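
As a rough numerical complement to the visual check, the quartiles of a normal distribution with the sample mean and standard deviation can be compared with the observed quartiles. This is a sketch, not a formal normality test, and it uses only the whole-sample values from the descriptive statistics table.

```python
from statistics import NormalDist

# Whole-sample statistics from the descriptive statistics table (grams).
mean, stdev = 0.85649, 0.051794
observed = {"Q1": 0.82625, "Median": 0.858, "Q3": 0.881}

# Normal distribution with the same mean and standard deviation.
fitted = NormalDist(mu=mean, sigma=stdev)
expected = {
    "Q1": fitted.inv_cdf(0.25),
    "Median": fitted.inv_cdf(0.50),
    "Q3": fitted.inv_cdf(0.75),
}

for name in observed:
    print(f"{name:<6} observed = {observed[name]:.4f}  normal = {expected[name]:.4f}")

# Distance of the extremes from the mean, in standard deviations.
z_min = (0.696 - mean) / stdev
z_max = (1.015 - mean) / stdev
print(f"z(min) = {z_min:.2f}, z(max) = {z_max:.2f}")
```

Large gaps between the observed and normal quartiles, or extremes many standard deviations from the mean, would be reasons to look more closely at the fit.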

Situation 2: Garbage Weights


Weights (in pounds) of discarded garbage in different categories from 62 households
can be found in the “Garbage Weight” tab. HH SIZE is household size. Data
provided by Masakazu Tani, the Garbage Project, University of Arizona.
a) Provide a percentage distribution of each category of discarded garbage from the 62
households.
b) Prove that the bigger the household, the more garbage it produces, and vice versa.
As seen in the bar chart, smaller households generate less garbage than larger
households.

c) Which category of garbage is most correlated with the total weight? Which categories are
just random and cannot be predicted if the total weight is known from a sample household?

The box plot suggests that plastics account for the majority of the overall weight of
garbage. Textile, Yard, and Others have many outlying values (values at an abnormal
distance from the rest), which makes their relationship with total weight difficult to predict.
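
Correlation with total weight can also be measured directly with the Pearson coefficient, computed once per category. The sketch below assumes each category is a list of weights aligned by household; the numbers are HYPOTHETICAL placeholders, not the Garbage Project data, and the names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL weights (pounds) for five households; replace with worksheet columns.
total = [20.0, 30.0, 40.0, 50.0, 60.0]
categories = {
    "Plastic": [1.0, 1.5, 2.1, 2.4, 3.0],
    "Textile": [0.5, 0.1, 2.0, 0.2, 0.4],
}

# Correlation of each category with the total weight.
r_with_total = {name: pearson_r(values, total) for name, values in categories.items()}

for name, r in sorted(r_with_total.items(), key=lambda item: -abs(item[1])):
    print(f"{name:<8} r = {r:+.3f}")
```

A coefficient near +1 or -1 means the category tracks the total closely; a coefficient near 0 means the category cannot be predicted linearly from the total.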

Situation 3: Words Spoken by Men and Women


Refer to “Word Counts” datasheet in Case 1 T1 Sept 2023.xlsx Excel file, which includes
counts of words spoken by males and females. That data set includes 12 columns of data,
but first stack all of the male word counts in one column and stack all of the female word
counts in another column. Then proceed to generate histograms, any other suitable
graphs, and find appropriate statistics that allow you to compare the two sets of data. Are
there any outliers? Do both data sets have properties that are basically the same? Are there
any significant differences? Are women really more talkative than men? What would be a
consequence of having significant differences? Write a brief summary paragraph
including your conclusions and supporting graphs.
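
The stacking step and the outlier check can be sketched as follows. The column labels and word counts here are HYPOTHETICAL placeholders (the real sheet has 12 columns), and the 1.5 × IQR fence is one common outlier rule, assumed here rather than taken from the assignment.

```python
from itertools import chain
from statistics import quantiles

# HYPOTHETICAL sheet: column label -> word counts. Labels starting with "M"
# are assumed to be male columns and labels starting with "F" female columns.
sheet = {
    "M1": [12000, 15500, 9800],
    "F1": [14200, 16900, 11000],
    "M2": [18300, 7600],
    "F2": [20100, 15800],
}

def stack(columns, prefix):
    """Concatenate every column whose label starts with the given prefix."""
    return list(chain.from_iterable(
        values for label, values in columns.items() if label.startswith(prefix)
    ))

def iqr_outliers(values):
    """Return values outside the 1.5 * IQR fences around the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

male = stack(sheet, "M")
female = stack(sheet, "F")
print(len(male), len(female))
print(iqr_outliers(male), iqr_outliers(female))
```

Note that `statistics.quantiles` may compute quartiles slightly differently from the statistical package used for the box plots, so borderline points can be classified differently.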

The pie chart above shows a noticeable difference in word counts between men and
women: women contributed 53.9% of the total word count, whereas men contributed
46.1%. This suggests that, in this context, women are more talkative than men.
The bar chart provides a clear visual representation of the data, highlighting that women
surpass men when it comes to word count.
The box plot shows that women have a higher median word count than men. Women's
word counts also have a wider spread, covering a broader range of values. Men have a
lower median and a narrower spread, although their maximum word count is higher than
the women's. Outliers are present in both groups, indicating some extreme word counts.
Together, these observations give a fuller picture of the distribution of word counts.

The histogram shows the word count distributions for men and women. Women have a
higher mean word count (16,215) than men (15,669), indicating that, as a group, women
tend to speak slightly more. Both distributions are approximately normal in shape with
no strong skew, so the means are reasonable summaries of each group, which supports
the conclusion that women tend to have a higher word count than men.
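
A rough worked check of the size of the gap, using only the two means reported above. The equal-group-size share is a hypothetical comparison point added here; the numbers of men and women in the sample are not stated in this report.

```python
# Mean word counts reported above.
mean_women = 16215
mean_men = 15669

# Absolute and relative difference between the group means.
difference = mean_women - mean_men
relative_difference = difference / mean_men

# Women's share of all words IF both groups had the same number of speakers
# (an assumption made only for comparison with the pie chart).
share_if_equal_groups = mean_women / (mean_women + mean_men)

print(f"Difference in means: {difference} words")
print(f"Relative difference: {relative_difference:.1%}")
print(f"Women's share with equal group sizes: {share_if_equal_groups:.1%}")
```

The means differ by 546 words, or about 3.5%. With equal group sizes the women's share would be about 50.9%, so the 53.9% share in the pie chart may partly reflect a larger number of women in the sample; this is an inference, not something the data sheet description confirms.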

Situation 4: Speed Dating

Data are from 199 speed-dating encounters. DEC BY FEM is decision (1 = yes) of female to
date again, AGE FEM is age of female, LIKE BY FEM is “like” rating by female of male
(scale of 1-10), ATTRACT BY FEM is “attractive” rating by female of male (scale of 1-10),
ATTRIB BY FEM is sum of ratings of five attributes (sincerity, intelligence, fun,
ambitious, shared interests) by female of male. Data for males use corresponding
descriptors. Higher scale ratings correspond to more positive impressions.
Answer each of the following questions by showing a graph and making a conclusion based
on the graph.
a) Is a woman more likely to decide to date again if the man is older than her, or less likely to
decide to date again if the man is younger than her? Is there a threshold on age difference
that affects a woman’s decision to date again?
The dotplots of age difference are approximately normal in shape, suggesting that there
is no threshold on age difference that affects a woman’s decision.

b) Is a man more likely to decide to date again if the woman is younger than him, or less
likely to date again if the woman is older than him? Is there a limit on how much older a
woman can be before a man will not pursue her?

Both dotplots are approximately normal in shape and show limits: a man would decide
not to date a woman again if she is more than 10 years younger or more than 14 years
older than him.
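
The relationship between age difference and the decision can also be tabulated as a yes-rate per age-gap bin. This is a minimal sketch: the encounters below are HYPOTHETICAL placeholders, not the 199 recorded encounters, and the tuple layout and 5-year bin width are assumptions made for illustration.

```python
from collections import defaultdict

# HYPOTHETICAL encounters as (age of male, age of female, decision by female).
encounters = [
    (27, 24, 1),
    (22, 25, 0),
    (30, 23, 1),
    (24, 24, 1),
    (23, 29, 0),
    (35, 24, 0),
]

def yes_rate_by_age_gap(rows, bin_width=5):
    """Share of 'yes' decisions per age-gap bin (gap = male age - female age)."""
    tallies = defaultdict(lambda: [0, 0])  # bin start -> [yes count, total]
    for age_male, age_fem, decision in rows:
        gap = age_male - age_fem
        bin_start = (gap // bin_width) * bin_width  # floor to the bin's lower edge
        tallies[bin_start][0] += decision
        tallies[bin_start][1] += 1
    return {start: yes / total for start, (yes, total) in sorted(tallies.items())}

rates = yes_rate_by_age_gap(encounters)
for start, rate in rates.items():
    print(f"gap in [{start}, {start + 5}): yes rate = {rate:.0%}")
```

A threshold would show up as a bin beyond which the yes-rate drops to zero or close to it. For the man's decision, the same function can be applied with the male decision column in the third position.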

c) Are attraction and liking strongly related? Are the attraction score and the liking score
both high or else both low, and less likely to be high-low or low-high? Create different
graphs for men and for women.
In both scatterplots, there is a linear correlation between attraction and liking, for men
and for women alike.

d) Compare and contrast the sample observations when both decide to pursue dating (dec
by male=1 and dec by fem=1) versus the observations when both decide to not pursue
dating (dec=0, dec=0 for both). Is liking, attraction, or attribution the factor that decides
this mutual decision?

The boxplot shows that higher average attribution, like, and attraction scores are all
associated with a higher probability of a mutual decision to date.

e) Take the observations when the woman wants to pursue but the man says no (dec by
fem=1 and dec by male=0). Is liking, attraction or attribution a factor in predicting this
outcome? Can you identify a cause-and-effect statement?
Average like, attraction, and attribution scores do not affect this outcome.

f) Take the observations when the man wants to pursue but the woman says no (dec by
fem=0 and dec by male=1). Is liking, attraction or attribution a factor in predicting this
outcome? Can you identify a cause-and-effect statement?
Average like, attraction, and attribution scores do not affect this outcome.
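
The comparisons in (d) to (f) amount to splitting the encounters by the two decisions and averaging the scores within each group. The sketch below shows that split; the rows and field names are HYPOTHETICAL placeholders, not the recorded data or the worksheet's column names.

```python
from statistics import mean

# HYPOTHETICAL encounters; each row holds both decisions and three scores.
rows = [
    {"dec_fem": 1, "dec_male": 1, "like": 8, "attract": 7, "attrib": 38},
    {"dec_fem": 1, "dec_male": 1, "like": 9, "attract": 8, "attrib": 40},
    {"dec_fem": 0, "dec_male": 0, "like": 4, "attract": 3, "attrib": 25},
    {"dec_fem": 1, "dec_male": 0, "like": 7, "attract": 6, "attrib": 33},
    {"dec_fem": 0, "dec_male": 1, "like": 5, "attract": 5, "attrib": 30},
]

def group_means(data, dec_fem, dec_male, fields=("like", "attract", "attrib")):
    """Mean of each score for encounters with the given pair of decisions."""
    subset = [r for r in data if r["dec_fem"] == dec_fem and r["dec_male"] == dec_male]
    if not subset:
        return None  # no encounters with this combination of decisions
    return {field: mean(r[field] for r in subset) for field in fields}

# The four outcome groups: both yes, both no, and the two mismatched cases.
for dec_fem, dec_male in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    print(f"dec by fem={dec_fem}, dec by male={dec_male}:",
          group_means(rows, dec_fem, dec_male))
```

Differences in group means describe association only. A cause-and-effect statement would need more than this comparison, since the ratings and the decisions were recorded for the same encounters.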
