Homework 1
Please place a box around a final answer. Also, make sure your answer fits within the
allotted space.
d. What percent of patients in the treatment group were pain free 24 hours after
receiving acupuncture? (1 point)
Total number of people receiving treatment is 43. Number of people feeling
pain free after treatment is 10. This means:
(10/43) x 100 = 23.26%
Total number of people in the control group is 46. Number of people feeling
pain free after treatment is 2. This means:
(2/46) x 100 = 4.35%
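As a quick check of the arithmetic above, the two percentages can be recomputed in a few lines of Python. This is only a sketch: the helper name `percent` and the variable names are made up for illustration, and the counts are the ones quoted in the answer.

```python
def percent(part, total):
    """Return part/total expressed as a percentage."""
    return 100 * part / total

# Counts quoted in the answer: 10 of 43 and 2 of 46 were pain free.
group_of_43 = percent(10, 43)
group_of_46 = percent(2, 46)

print(f"{group_of_43:.2f}%")
print(f"{group_of_46:.2f}%")
```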
f. In which group did a higher percent of patients become pain free 24 hours after
receiving acupuncture? (1 point)
In the treatment group, a higher percent of patients were pain free after 24 hours.
g. Do the data provide evidence that there is a real pain reduction for those patients
in the treatment group? Or do you think that the observed difference might just
be due to chance? (1 point)
The data provide support for a real pain reduction effect in the treatment group
receiving acupuncture specifically designed to treat migraines. There is strong
evidence to suggest that the observed difference between the treatment and
control groups is not due to chance alone.
2. Researchers studying the relationship between honesty, age and self-control conducted
an experiment on 160 children between the ages of 5 and 15. Participants reported their
age, sex, and whether they were an only child or not. The researchers asked each child
to toss a fair coin in private and to record the outcome (white or black) on a paper sheet,
and said they would only reward children who report white. Half the students were
explicitly told not to cheat and the others were not given any explicit instructions. In the
no instruction group probability of cheating was found to be uniform across groups
based on child’s characteristics. In the group that was explicitly told to not cheat, girls
were less likely to cheat, and while rate of cheating didn’t vary by age for boys, it
decreased with age for girls. In this study, identify:
a. The population (1 point)
In this case, the population is:
Children between the ages of 5 and 15
c. The variables and their types (4 points)
2. Controlled variable
a. Coin toss outcome
3. Dependent variable
a. Child's decision to cheat or not
3. Sir Ronald Aylmer Fisher was an English statistician, evolutionary biologist, and
geneticist who worked on a data set that contained sepal length and width, and petal
length and width from three species of iris flowers (setosa, versicolor and virginica).
There were 50 flowers from each species in the data set.
a. How many samples are there? (1 point)
There are 50 flowers from each of the 3 species, so the total number of samples is:
50 x 3 = 150
b. How many quantitative variables are included in the data? Indicate what they are,
and if they are continuous or discrete. (2 points)
There are four quantitative variables:
1. Sepal length, 2. Sepal width
3. Petal length, 4. Petal width
These variables are continuous since they can be any value within a
given range.
c. How many qualitative variables are included in the data, and what are they? List
the corresponding levels (categories). (2 points)
1. Species:
a. Setosa
b. Versicolor
c. Virginica
4. A survey was conducted on 193 Duke University undergraduates who took an
introductory statistics course in 2012. Among many other questions, this survey asked
them about their GPA, which can range between 0 and 4 points, and the number of
hours they spent studying per week. The scatterplot below displays the relationship
between these two variables.
a. What is the independent variable and what is the dependent variable? (2 points)
The independent variable is the study hours per week.
The dependent variable is the GPA.
b. Describe the relationship between the two variables. Make sure to discuss un-
usual observations, if any. (1 point)
The scatterplot shows the relationship between total study hours per week
and GPA for 193 Duke University undergraduates.
General Trend:
While we expect that more study hours would lead to higher GPAs,
the data do not show a strong, clear trend.
The majority of data points cluster in the 0-20 study hours/week range, with
GPAs ranging from 2.5 all the way up to 4.0, indicating significant variety
in GPA even among those who study relatively little.
Unusual Observations:
There are a few students who report studying over 40 hours per week, and one
who studies close to 60 hours per week. Interestingly, these students don’t have the
highest GPAs. Some students with very high study hours still have GPAs below 3.5.
c. Can we conclude that studying longer hours leads to higher GPAs? (1 point)
No. As mentioned in the answer above, the scatterplot shows that studying more
hours doesn't necessarily mean higher grades. The data also come from a survey
rather than an experiment, so they cannot establish cause and effect.
5. The first histogram below shows the distribution of the yearly incomes of 40 patrons at a
college coffee shop. Suppose two new people walk into the coffee shop: one making
$225,000 and the other $250,000. The second histogram shows the new income
distribution. Summary statistics are also provided.
a. Would the mean or the median best represent what we might think of as a typical
income for the 42 patrons at this coffee shop? What does this say about the
robustness of the two measures? (2 points)
In this scenario, the median best represents the typical income for the 42
patrons at the coffee shop, because the median is less sensitive to extreme
values. The incomes of the two new patrons, $225,000 and $250,000, are much
higher than the incomes of the original 40 patrons, which are mostly around
$60,000 to $70,000.
The median remains relatively unchanged, moving from $65,240 to $65,350.
Median is robust: it is not influenced much by extreme values.
Mean is not robust: it can be distorted by outliers, making it less reliable.
b. Would the standard deviation or the IQR best represent the amount of variability
in the incomes of the 42 patrons at this coffee shop? What does this say about
the robustness of the two measures? (2 points)
In this case, the Interquartile Range (IQR) would best represent the amount of
variability in the incomes of the 42 patrons at the coffee shop. Here's why:
IQR measures the range of the middle 50% of the data. It is less affected by
extreme values or outliers. The IQR reflects the spread of the central portion of
the distribution, providing a more reliable representation of the typical variability
in this situation, where outliers are present.
IQR is robust: Since it focuses on the middle 50% of the data and ignores outliers,
the IQR remains stable and is more robust with extreme values.
Standard deviation is not robust: The standard deviation is highly sensitive to outliers,
which can distort the true variability of the dataset when extreme values are present.
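The robustness argument in parts (a) and (b) can be illustrated with a small sketch. The ten base incomes below are hypothetical values chosen for illustration only (the actual 40 incomes are not listed in this homework); only the two added incomes, $225,000 and $250,000, come from the question. The helper `summary` is also an assumption made for this example.

```python
import statistics

# Hypothetical incomes for illustration only (NOT the coffee shop data).
incomes = [60_000, 61_500, 62_000, 63_000, 64_500,
           65_000, 66_000, 67_500, 68_000, 69_500]
# The two new patrons described in the question.
with_outliers = incomes + [225_000, 250_000]

def summary(values):
    """Return mean, median, standard deviation, and IQR of a list."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }

before = summary(incomes)
after = summary(with_outliers)
for name in before:
    print(f"{name:>6}: {before[name]:>10.0f} -> {after[name]:>10.0f}")
```

With these made-up numbers, the median and IQR shift only slightly when the two high incomes are added, while the mean and standard deviation change by tens of thousands of dollars.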
6. A company that dumps its industrial waste into a river has to meet certain restrictions.
One particular constraint involves the minimum amount of dissolved oxygen that is
needed to support aquatic life. A random sample of 10 specimens taken from a given
location gives the following results of dissolved oxygen.
a. Find the mean, median, and mode of the dissolved oxygen amounts. Interpret
the differences between them. (3 points)
The dissolved oxygen amounts from the sample have the following measures:
Mean: 8.9
Median: 9.0
Mode: 9.0
b. Compare the range, variance, and standard deviation of the dissolved oxygen
amounts and interpret them. (3 points)
Range: 1.4
Variance: 0.224
Standard Deviation: 0.474
Interpretation:
The relatively small range, variance, and standard deviation suggest that the
dissolved oxygen levels are tightly packed around the mean, indicating
low variability in the measurements. This consistency is a positive sign of
stable water quality in terms of dissolved oxygen levels.
7. Determine if the statements below are true or false, and explain your reasoning.
a. If a fair coin is tossed many times and the last eight tosses are all heads, then
the chance that the next toss will be heads is somewhat less than 50%. (1 point)
False.
Every coin toss is independent of the ones that came before it.
This implies that the outcome of any previous toss, heads or tails, has no bearing
on subsequent tosses.
Regardless of previous results, the probability of heads or tails for a fair coin is always
50% on a single toss. The following throw, which still has a 50% probability of ending
in heads, is unaffected by the previous eight tosses being all heads.
b. Drawing a face card (jack, queen, or king) and drawing a red card from a full
deck of playing cards are mutually exclusive events. (1 point)
False.
If two occurrences are not possible to occur simultaneously, they are said to be
mutually exclusive. Drawing a face card and drawing a red card are not mutually
exclusive in this instance since it is possible to draw a card that meets both
requirements at the same time.
The jack, queen, and king of hearts and the jack, queen, and king of diamonds
are six red face cards out of the twelve face cards in the deck.
As a result, drawing a card that is both a face card and a red card is feasible:
a single draw can satisfy both events at once.
c. Drawing a face card and drawing an ace from a full deck of playing cards are
mutually exclusive events. (1 point)
True.
If two occurrences are not possible to occur simultaneously, they are mutually
exclusive. Since no card can be both an ace and a face card at the same time,
drawing an ace and a face card (jack, queen, or king) in this scenario are mutually
exclusive.
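The reasoning in parts (b) and (c) can be checked by listing a standard 52-card deck and counting the overlap between the events. This is a minimal sketch; the rank and suit labels and the set names are chosen here for illustration.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# Every card is a (rank, suit) pair.
deck = set(product(ranks, suits))

face_cards = {card for card in deck if card[0] in {"J", "Q", "K"}}
red_cards = {card for card in deck if card[1] in {"hearts", "diamonds"}}
aces = {card for card in deck if card[0] == "A"}

# Two events are mutually exclusive when no card belongs to both.
print("face and red:", len(face_cards & red_cards))
print("face and ace:", len(face_cards & aces))
```

A nonzero overlap means the events can happen on the same draw; an empty overlap means they are mutually exclusive.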
8. The game of roulette involves spinning a wheel with 38 slots: 18 red, 18 black, and 2
green. A ball is spun onto the wheel and will eventually land in a slot, where each slot
has an equal chance of capturing the ball.
a. You watch a roulette wheel spin 3 consecutive times and the ball lands on a red
slot each time. What is the probability that the ball will land on a red slot on the
next spin? (1 point)
18/38 or 47.4%
b. You watch a roulette wheel spin 300 consecutive times and the ball lands on a
red slot each time. What is the probability that the ball will land on a red slot on
the next spin? (1 point)
18/38 or 47.4%
c. Are you equally confident of your answers to parts (a) and (b)? Why or why not?
(1 point)
Not entirely at first. Because both answers are the same, it felt like a
recalculation was needed. After reworking them, I am confident in my answers:
assuming the wheel is fair, each spin is independent of the previous ones.
9. Below are four versions of the same game. Your archnemesis gets to pick the version of
the game, and then you get to choose how many times to flip a coin: 10 times or 100
times. Identify how many coin flips you should choose for each version of the game. It
costs $1 to play each game. Explain your reasoning.
a. If the proportion of heads is larger than 0.60, you win $1. (1 point)
To win this version of the game you need more than 60% heads, which means
you must flip at least 7 heads out of 10 or 61 heads out of 100.
Because of the law of large numbers, the likelihood of getting more than 60%
heads decreases as the number of flips increases. It is more difficult to obtain a
proportion of heads higher than 0.60 over a larger number of flips, as the proportion
of heads will typically be closer to 0.5.
With just 10 flips, however, there is more variability, which
increases the likelihood of a higher proportion of heads owing to random fluctuation.
Selecting 10 flips thus gives you a better chance of exceeding 60% heads
and winning the game.
b. If the proportion of heads is larger than 0.40, you win $1. (1 point)
To win this version, you need more than 40% heads, which means at least 5 heads
out of 10 flips or 41 heads out of 100 flips.
For a fair coin, the expected percentage of heads is 50%. Because of the law of
large numbers, the result will tend to stabilize around this expected value as the
number of flips grows. This increases the likelihood that, with more flips, the
proportion of heads is higher than 0.40.
The percentage of heads will probably be close to 50% after 100 flips, which is
comfortably higher than 0.40. There is little likelihood of getting fewer than 41
heads (40% or less).
There's more uncertainty with 10 flips, and while you will still probably get more than
40% heads, the increased variability may cause a smaller percentage of heads
to occur by chance. Selecting 100 flips is therefore the better choice.
c. If the proportion of heads is between 0.40 and 0.60, you win $1. (1 point)
In this variation, you are the winner if the percentage of heads is between 0.40
and 0.60, which is a fairly small range. A head percentage that hovers around 0.50,
neither too high nor too low, is the aim.
Because of the law of large numbers, the proportion of heads is more likely to
stabilize at or near the predicted 50% in a hundred flips. This increases the
likelihood that the outcome will be between 0.40 and 0.60, which is the desired range.
There is much more variability with 10 flips. Extreme outcomes (either too few
heads, like three or fewer, or too many heads, like seven or more) are easier to
get, so the proportion is less likely to land in the 0.40 to 0.60 range.
You want your proportion to be as near to 50% as possible, so selecting 100 flips
increases your chances of staying in the 0.40 to 0.60 range that you want.
d. If the proportion of heads is smaller than 0.30, you win $1. (1 point)
For 10 flips, a proportion smaller than 0.30 means 2 or fewer heads, and the chance
of that is about 5.5%, which is low. (The chance of 3 or fewer heads, a proportion
of at most 0.30, is about 17.2%.)
For 100 flips, you would need 29 or fewer heads, and the chance of that is far
lower, well under 0.01%, because the proportion of heads concentrates near 0.50.
It would be best to choose to flip the coin 10 times in order to increase your chances
of winning. With fewer flips there is more variability, so an extreme proportion of
heads below 0.30 is much more likely with 10 flips than with 100 flips.
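The four versions of the game can be compared exactly using the binomial distribution for a fair coin. The sketch below is illustrative: the function `win_probability` and the `games` dictionary are names introduced here, and version (c) is assumed to mean strictly between 0.40 and 0.60. Proportions are handled as exact fractions to avoid rounding problems at the thresholds.

```python
from fractions import Fraction
from math import comb

def win_probability(n, wins):
    """Exact P(win) for n fair flips; wins(p) tests the proportion of heads p."""
    favorable = sum(comb(n, k) for k in range(n + 1) if wins(Fraction(k, n)))
    return Fraction(favorable, 2 ** n)

# Winning rule for each version, as a test on the proportion of heads.
games = {
    "a": lambda p: p > Fraction(3, 5),
    "b": lambda p: p > Fraction(2, 5),
    "c": lambda p: Fraction(2, 5) < p < Fraction(3, 5),  # assumed strict
    "d": lambda p: p < Fraction(3, 10),
}

for name, wins in games.items():
    p10 = float(win_probability(10, wins))
    p100 = float(win_probability(100, wins))
    better = 10 if p10 > p100 else 100
    print(f"({name}) 10 flips: {p10:.4f}  100 flips: {p100:.6f}  choose {better}")
```

Under these assumptions, the extreme targets in versions (a) and (d) favor 10 flips, while the targets that contain 0.50 in versions (b) and (c) favor 100 flips.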
10. The American Community Survey is an ongoing survey that provides data every year to
give communities the current information they need to plan investments and services.
The 2010 American Community Survey estimates that 14.6% of Americans live below
the poverty line, 20.7% speak a language other than English (foreign language) at home,
and 4.2% fall into both categories.
a. Are living below the poverty line and speaking a foreign language at home
disjoint? (1 point)
c. What percent of Americans live below the poverty line and only speak English at
home? (1 point)
d. What percent of Americans live below the poverty line or speak a foreign
language at home? (1 point)
The percentage of Americans who either live below the poverty line or speak a foreign
language at home is 31.1%
e. What percent of Americans live above the poverty line and only speak English at
home? (1 point)
The percentage of Americans who live above the poverty line and only speak
English at home is 68.9%.
f. Is the event that someone lives below the poverty line independent of the event
that the person speaks a foreign language at home? (1 point)
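The two events are not independent. If they were, the probability of both would
equal the product of the individual probabilities, 0.146 x 0.207 = 0.030, but
the observed probability of both is 0.042, which is not equal to 0.030.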
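The calculations in parts (d), (e), and (f) follow from the general addition rule, the complement rule, and the product test for independence. A minimal sketch using the three percentages given in the problem (the variable names are chosen here for illustration):

```python
# Probabilities from the 2010 American Community Survey figures in the problem.
p_poverty = 0.146   # live below the poverty line
p_foreign = 0.207   # speak a foreign language at home
p_both = 0.042      # fall into both categories

# (d) General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_either = p_poverty + p_foreign - p_both

# (e) Complement rule: neither event happens.
p_neither = 1 - p_either

# (f) Independence would require P(A and B) = P(A) * P(B).
p_product = p_poverty * p_foreign
independent = abs(p_both - p_product) < 1e-9

print(f"either:  {p_either:.3f}")
print(f"neither: {p_neither:.3f}")
print(f"product: {p_product:.3f}, independent: {independent}")
```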