Stats Lab 2

This document summarizes the results of a statistics lab assignment. It includes details about the group members, missing deadlines, and responses to 3 questions. For question 1, the document examines how the standard deviation affects the percentage of underfilled bottles in a cola filling process. For question 2, it analyzes data from 400 bottle samples and describes the shape of the volume distribution as right skewed. For question 3, it looks at data from 200 samples of 6 bottles each and compares the distribution of sample means to the distribution of individual bottle volumes.

Uploaded by

Sneha

COURSE: Statistics-151

SECTION: D1

LAB #: 2

GROUP #: 44___

GROUP LEADER:

GROUP MEMBER(S):

PLEASE FILL OUT THIS SECTION ONLY IF YOUR GROUP MEMBER(S) MISSED
ANY OF THE DEADLINES:

MEMBER(S) EXCLUDED FROM THIS LAB: Isaac Cook

WHICH DEADLINE IS MISSED: Both the deadline to communicate and actively
participate.

SUMMARY OF ISSUES (FURTHER EXPLANATION OF WHY THE ABOVE GROUP
MEMBER(S) IS(ARE) EXCLUDED FROM THIS LAB REPORT):

LAB 2 ASSIGNMENT: SAMPLING DISTRIBUTIONS, CENTRAL LIMIT THEOREM

1. Suppose the amount of cola dispensed by a filling machine follows a normal
distribution with a mean (μ) and a standard deviation (σ). Select the Distributions
option in the R Commander menu and then the Normal distribution among
continuous distributions options. This allows you to obtain a graph of the normal
density function, and to calculate normal probabilities when the parameters (μ
and σ) are provided. Use R Commander to answer the following questions. (Hint:
Numerical answers for parts (b) and (c) should be rounded to three decimal
places.)

(a) Assume that the mean amount dispensed by the machine is set at μ = 8 oz. Describe what
happens to the percentage of underfilled bottles (the bottles containing less than 8 oz) when σ
decreases or increases? In general, how does the magnitude of the standard deviation affect
the filling process?

The standard deviation measures the spread of the data about the mean.
The total area under the density curve always equals 1.

With a mean of 8 oz and a standard deviation of 0.2, the density curve peaks at a height of
about 2.0, and the areas to the left and right of the mean are the same.

When we decrease the standard deviation to 0.1 with the mean still at 8 oz, the density curve
peaks at about 4.0, and the areas to the left and right of the mean are the same.

When we increase the standard deviation to 0.4 with the mean still at 8 oz, the density curve
peaks at about 1.0, and the areas to the left and right of the mean are the same.

From these curves we can conclude that when the standard deviation increases or decreases,
the percentage of underfilled bottles (the area under the curve to the left of the mean) does
not change: it stays at 50%.
The magnitude of the standard deviation controls the spread of the distribution, and with it
the height of the peak, but not the percentage of underfilled bottles when the mean is set at
exactly 8 oz. A smaller standard deviation makes the fill volumes more consistent.
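
The peak heights and the 50% figure in part (a) can be checked numerically. The following is a minimal sketch using Python's standard library rather than R Commander; the function name `peak_and_underfill` is illustrative and not part of the lab.

```python
from statistics import NormalDist

MU = 8.0  # mean fill in oz, as set in part (a)

def peak_and_underfill(sigma, mu=MU, limit=8.0):
    """Return (peak density, P(X < limit)) for a Normal(mu, sigma) fill volume."""
    dist = NormalDist(mu=mu, sigma=sigma)
    # The normal density is highest at the mean; the cdf gives the area to the left.
    return dist.pdf(mu), dist.cdf(limit)

for sigma in (0.1, 0.2, 0.4):
    peak, p_under = peak_and_underfill(sigma)
    print(f"sigma={sigma}: peak density={peak:.3f}, underfilled={p_under:.1%}")
```

The peak height equals 1/(σ·sqrt(2π)), so halving σ doubles the peak while the area to the left of the mean stays at one half.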
(b) Now assume that the mean amount dispensed by the machine is set at μ = 8.1 oz. Enter the
value of σ as 0.1 oz. Calculate the percentage of underfilled bottles (the bottles containing less
than 8 oz) in this case. What is the percentage of underfilled bottles if σ were 0.05 oz and 0.04
oz? In general, what is the effect of decreasing σ on the percentage of underfilled bottles?

When the mean is 8.1 and the standard deviation is 0.1, the percentage of underfilled bottles is
15.866%.

When the mean is 8.1 and the standard deviation is 0.05, the percentage of underfilled bottles is
2.275%.
When the mean is 8.1 and the standard deviation is 0.04, the percentage of underfilled bottles is
0.621%.
As we can see from above, keeping the mean constant (μ = 8.1 oz), decreasing the value of
the standard deviation (σ) also decreases the percentage of underfilled bottles.
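
As a cross-check on the percentages in part (b), here is a short sketch that evaluates the normal cumulative probability below 8 oz for each σ. It uses Python's `statistics.NormalDist` instead of R Commander, and the helper name `percent_underfilled` is an assumption made for illustration.

```python
from statistics import NormalDist

def percent_underfilled(mu, sigma, limit=8.0):
    """Percentage of bottles below `limit` oz when fills are Normal(mu, sigma)."""
    return 100 * NormalDist(mu=mu, sigma=sigma).cdf(limit)

# Mean fixed at 8.1 oz, standard deviation varied as in part (b)
for sigma in (0.1, 0.05, 0.04):
    print(f"sigma={sigma}: {percent_underfilled(8.1, sigma):.3f}% underfilled")
```

For σ = 0.04 the 8 oz limit sits 2.5 standard deviations below the mean, which corresponds to about 0.621% of bottles.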

(c) Now set the standard deviation to 0.05 oz and change the mean. Enter the value of μ as 8,
then 8.05, and eventually 8.1 oz. Calculate the percentage of underfilled bottles in each case.
Describe briefly how the shape of the corresponding curve changes. How does changing the
value of μ affect the filling process? Does the percentage of underfilled bottles increase or
decrease? Do not print the density curves.
When the standard deviation is 0.05 oz and the mean is 8 oz, the percentage of underfilled
bottles is 50%.
When the standard deviation is 0.05 oz and the mean is 8.05 oz, the percentage of underfilled
bottles is 15.866%.
When the standard deviation is 0.05 oz and the mean is 8.1 oz, the percentage of underfilled
bottles is 2.275%.
The shape of the curve does not change: it is a symmetric, bell-shaped curve in all three
cases, because the volumes are normally distributed and the standard deviation is held
constant. Only its position changes, shifting to the right as the mean increases. Increasing
the mean while keeping the standard deviation constant decreases the percentage of
underfilled bottles, because more of the area under the curve lies above 8 oz.
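
The three percentages in part (c) follow from standardizing the 8 oz limit. The worked lines below are a sketch of that calculation; the symbol Φ for the standard normal cumulative distribution function is notation introduced here, not used in the lab handout.

```latex
\begin{aligned}
P(X < 8) &= \Phi\!\left(\frac{8 - \mu}{\sigma}\right), \qquad \sigma = 0.05 \\
\mu = 8.00:&\quad z = \frac{8 - 8.00}{0.05} = 0,  &\Phi(0)  &= 0.5 \\
\mu = 8.05:&\quad z = \frac{8 - 8.05}{0.05} = -1, &\Phi(-1) &\approx 0.15866 \\
\mu = 8.10:&\quad z = \frac{8 - 8.10}{0.05} = -2, &\Phi(-2) &\approx 0.02275
\end{aligned}
```

Each 0.05 oz increase in the mean moves the limit one more standard deviation below it, which is why the underfilled share drops so quickly.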
2. Consider a random sample of 400 bottles obtained from the population of all
bottles filled by the machine over a specific short time period. The volume of cola
in each bottle is determined. The 400 observations recorded in the first column
Volume are available in the data file Lab2-Q2.txt on eClass. Given the very large
sample size, we may assume that the distribution of the volume of cola in bottles
in the sample (data file) is close enough to the population distribution while its
mean and standard deviation are close to the population parameters (μ and σ).

(a) Obtain a frequency histogram of the 400 observations with the bins starting at 8.07, ending
at 8.18, and using a width of 0.01. (Hint: R assumes that the right endpoint of each interval is
included. Your histogram should include the left endpoints.) Paste the histogram into your
report. The format of the histogram should be the same as the format of the histogram in Lab 1
Instructions (labels at the axes, title).
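
The hint about endpoints describes left-closed bins, [a, b), instead of R's default right-closed bins, (a, b]. In R's `hist()` this corresponds to the argument `right = FALSE`. The sketch below illustrates the same counting rule in Python; the function name and the sample values are hypothetical and are not taken from Lab2-Q2.txt.

```python
from bisect import bisect_right
from collections import Counter

def left_closed_counts(values, start, stop, width, digits=2):
    """Count values in bins [a, b): the left endpoint is included, the right is not."""
    n_bins = round((stop - start) / width)
    # Round each edge so it equals the decimal value as written (8.08, 8.09, ...).
    edges = [round(start + i * width, digits) for i in range(n_bins + 1)]
    counts = Counter()
    for v in values:
        if edges[0] <= v < edges[-1]:
            i = bisect_right(edges, v) - 1  # bin whose left edge is <= v
            counts[(edges[i], edges[i + 1])] += 1
    return dict(counts)

# Hypothetical volumes for illustration only
sample = [8.07, 8.08, 8.085, 8.11, 8.11, 8.17]
print(left_closed_counts(sample, 8.07, 8.18, 0.01))
```

With this rule a volume of exactly 8.08 is counted in the bin starting at 8.08, not in the bin ending there. A value equal to the last edge (8.18) would fall outside every bin in this sketch.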
(b) Describe the shape of the histogram obtained in part (a). Does the histogram support the
claim of the company that the bottles are slightly overfilled?

The histogram is unimodal, with a single peak around 8.11, and slightly right skewed, which
means the mean is greater than the mode. It also supports the company's claim that the
bottles are slightly overfilled, as the volumes are centered around 8.11, which is greater
than 8.

(c) Obtain a Q-Q plot and a boxplot for the 400 observations. Add a title to each plot. Paste both
plots into your report. (TIP: Click “Options” and select Outliers “(Interactively) with mouse” when
you make the boxplot in R commander to see to which observation the outlier(s) corresponds.)
Is (are) there any outlier(s)? Based upon the Q-Q plot, does the distribution of volume of cola in
the bottles appear to be normal? What conclusions can be made about the shape of the
distribution from the Q-Q plot and boxplot? What does the relationship between the whiskers tell
us about the shape of the distribution? Do the plots collectively confirm your findings in part (b)
about the shape of the distribution?
The Q-Q plot indicates that the volumes are not normally distributed: the points do not all
fall on the line, and the pattern is consistent with a right-skewed distribution. There are a
lot of outliers, and the distribution is not symmetric.

The boxplot is not symmetrical: it has a tail to the right and is right skewed, as the upper
whisker is longer than the lower whisker.
From both plots we can conclude that the data is right skewed, so the plots collectively
confirm our findings in part (b) about the shape of the distribution.

(d) Obtain the summary statistics (mean, standard deviation, IQR, min, Q1, median, Q3, max,
and n) of the 400 observations. Paste the summary statistics into your report. Briefly describe
the relationship between the mean and median, as well as the relationship between the three
quartiles. Are the relationships consistent with the observed shape of the histogram in part (b)?
Here we can see that the IQR (Q3 − Q1) is very small compared to the values of Q1 and Q3,
so the spread is very small. Relative to this small spread and the large value of n, the mean
and median are far apart, with median < mean, which tells us that the data is skewed towards
the right. Hence the relationships are consistent with the observed shape of the histogram in
part (b).
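
The mean-versus-median and quartile comparisons in part (d) can be expressed as a small check. This is a sketch on hypothetical right-skewed volumes, not the Lab2-Q2.txt data, and the function name `skew_indicators` is illustrative. Quartile conventions differ between tools, so the values need not match R Commander's output exactly.

```python
from statistics import mean, median, quantiles

def skew_indicators(values):
    """Summaries that hint at right skew: mean > median and Q3 - Q2 > Q2 - Q1."""
    q1, q2, q3 = quantiles(values, n=4)
    return {
        "mean_minus_median": mean(values) - median(values),
        "lower_gap": q2 - q1,  # distance from Q1 up to the median
        "upper_gap": q3 - q2,  # distance from the median up to Q3
        "iqr": q3 - q1,
    }

# Hypothetical right-skewed volumes for illustration only
demo = [8.09, 8.10, 8.10, 8.11, 8.11, 8.12, 8.14, 8.17]
print(skew_indicators(demo))
```

A positive `mean_minus_median` together with an `upper_gap` larger than the `lower_gap` is the pattern expected for a right-skewed distribution.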

3. Suppose that 200 packs are randomly selected, each consisting of 6 bottles of
cola obtained from the population of all bottles filled over a certain short time
period. The amount of cola in each bottle is determined. The measurements are
saved in a table consisting of 6 rows (sample size) and 200 columns (number of
random samples) that occupies the columns Sample1 – Sample200 in the
lab2-Q3.txt file.
Obtain the mean amount of cola for each sample consisting of 6 bottles. Make
sure that all 200 columns are included in the panel of the “Numerical Summaries”
dialog box.

(a) Obtain a frequency histogram of the 200 means with the bins starting at 8.08, ending at 8.15,
and using a width of 0.005. (Hint: R assumes that the right endpoint of each interval is included.
Your histogram should include the left endpoints.) Paste the histogram into your report. The
format of the histogram should be the same as the format of the histogram in Lab 1
Instructions (labels at the axes, title).
(b) Refer to the histogram obtained in part (a). Does the data appear to be normally distributed?
Compare the distribution of the means to the distribution of individual observations studied in
Question 2 in terms of their degree of skewness and spread.

The data from the histogram in part (a) appears to be approximately normally distributed,
though slightly right skewed. Neither this histogram nor the one in Question 2 is exactly
normal; however, the distribution of the means is less skewed: the histogram in Question 2 is
obviously skewed to the right, while the one in part (a) is only slightly skewed to the right.
The histogram in Question 2 is also more spread out, as its standard deviation is larger and
it covers a wider range of values than the histogram in part (a).

(c) Obtain a Q-Q plot and a boxplot for the 200 means. Add a title to each plot. Paste the plots
into your report. Is (are) there any outlier(s)? Do the plots collectively confirm your findings in
part (b)? Compare the plots with the ones in Question 2, part (c).
There are many outliers in the Q-Q plot, but only 2 outliers in the boxplot, which are
observation numbers 138 and 165.
Both the Q-Q plot and the boxplot are skewed slightly to the right, so we can conclude that
they confirm the findings of part (b).
The Q-Q plots and boxplots from 2(c) and 3(c) are all skewed to the right; however,
there is a significant decrease in skewness in the plots from 3(c).

(d) Obtain the sample size, mean, and standard deviation of the 200 means. Paste the
summaries into your report. Compare the values with the mean and the standard deviation of
the sampling distribution of the sample mean predicted by the theory of sampling distributions.
What does the standard deviation mean here?

By the theory of sampling distributions, the sample mean has mean μ and standard deviation
σ/sqrt(n), where n = 6.
The mean of the 200 sample means is 8.111008 and their standard deviation is 0.01078817.
Here we can see that the mean of the sample means is essentially the same as the mean of
the individual bottles; however, the standard deviations differ, with the sample means
varying less than the individual bottles. Here, the standard deviation represents how far the
mean volume of a pack of 6 bottles typically falls from the overall mean (i.e. how overfilled
or underfilled a pack is on average).
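
The theoretical values for part (d) can be written out as follows. This is a sketch that assumes the Question 2 standard deviation of 0.0261 (quoted in the answer to Question 4(d)) is a good stand-in for the population σ, as the Question 2 prompt allows.

```latex
\mu_{\bar{x}} = \mu, \qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
  = \frac{0.0261}{\sqrt{6}} \approx 0.0107
```

Under that assumption, the reported standard deviation of the 200 sample means, 0.01078817, is close to the predicted value.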
4. Now suppose 200 boxes are randomly selected, each consisting of 30 bottles of cola
obtained from the population of all bottles filled over the same short time period. The
amount of cola in each bottle is determined. The measurements are saved in a table
consisting of 30 rows (sample size) and 200 columns (number of random samples) that
occupies the columns Sample1 – Sample200 in the lab2-Q4.txt file.
Obtain the mean amount of cola for each sample consisting of 30 observations. Make
sure that all 200 columns are included in the panel of the “Numerical Summaries” dialog
box.
(a) Obtain a frequency histogram of the 200 means with the bins starting at 8.09, ending at 8.13,
and using a width of 0.003. Paste the histogram into your report. (Hint: R assumes that the right
endpoint of each interval is included. Your histogram should include the left endpoints.) The
format of the histogram should be the same as the format of the histogram in Lab 1 Instructions
(labels at the axes, title).

(b) Describe the shape of the histogram in part (a). Does the data appear to be approximately
normally distributed? Compare the histogram with the histogram obtained in Question 2, part (a)
and the one in Question 3, part (a). In particular, comment about differences in degree of
skewness and spread between each pair of graphs.

The histogram is unimodal, as it has only one prominent peak, and it is roughly symmetric,
so the data appear to be approximately normally distributed.
When we compare the histogram in Question 2 with the histogram obtained above, we find
that the histogram in Question 2 is skewed slightly towards the right, whereas the histogram
above is symmetric and unimodal.
Similarly, when we compare the histogram in 3(a) with the one above in terms of skewness
and spread, we can see that the graph in 3(a) is more skewed to the right and more spread
out, as it has a higher standard deviation.

(c) Obtain a Q-Q plot and a boxplot for the 200 means. Add a title to each plot. Paste the plots
into your report. Is (are) there any outlier(s)? Does it appear that the sample means come from
a normal distribution? Explain. Do the plots collectively confirm your findings in part (b)?
Compare the plots with the plots obtained in part (c) of Questions 2 and 3. What do you
conclude?
When we observe the boxplot and the Q-Q plot, we see that there are no outliers in the boxplot
and only very few points that stray from the line in the Q-Q plot.
Because the points on the Q-Q plot are not far from the line, the sample means appear to come
from a normal distribution, which confirms our finding from 4(b).
Compared with Q2 and Q3, the plots in Q4 are the closest to normal: the plots in Q2 are more
right skewed, while the plots in Q3 are only slightly skewed to the right. This is likely due to
the increase in the sample size.
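
The pattern across Questions 2, 3 and 4 (less skew and less spread as the sample size grows) is what the Central Limit Theorem predicts. The simulation below is a minimal sketch of that effect. It uses an Exponential(1) population as a stand-in right-skewed distribution, which is an assumption for illustration and not the cola-volume distribution, and the function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def simulate_sample_means(n, n_samples=2000, seed=0):
    """Means of n_samples random samples of size n from a right-skewed population."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(n_samples)]

def skewness(xs):
    """Average cubed standardized deviation; positive values indicate right skew."""
    m, s = mean(xs), pstdev(xs)
    return mean([((x - m) / s) ** 3 for x in xs])

# n = 1 plays the role of individual observations; 6 and 30 match the lab's sample sizes
for n in (1, 6, 30):
    means = simulate_sample_means(n)
    print(f"n={n:>2}: sd={pstdev(means):.3f}, skewness={skewness(means):.2f}")
```

Both the standard deviation and the skewness of the simulated means shrink as n increases, mirroring the comparison of the Q2, Q3 and Q4 plots.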

(d) Use the Summary Statistics (Columns) feature to obtain the sample size, mean, and
standard deviation of the 200 means. Paste the summaries into your report. Compare the value
of the standard deviation of the sample mean for n = 30 with the standard deviation of the
sample mean in Question 3, part (d) (for n = 6). Compare the values with the mean and the
standard deviation of the sampling distribution of the sample mean predicted by the theory of
sampling distributions. Which sample mean tends to be a more accurate estimate of the
population mean?

The mean is 8.1111 and the standard deviation is 0.0261 for the individual bottles (as per Q2).
From 3(d): for the sample means in Q3, the mean is 8.111008 and the standard deviation is
0.0107.
From 4(d): for the sample means in Q4, the mean is 8.111145 and the standard deviation is
0.0048.
The standard deviation in 3(d) is greater than in 4(d) (0.0107 > 0.0048): the standard
deviation of the volume means with 30 bottles per box is about half of that with 6 bottles per
pack.
Comparing the two means with the Q2 mean of 8.1111, the Q4 mean is slightly closer:
8.111145 differs from it by about 0.00005, while 8.111008 differs by about 0.00009.
Therefore, it can be concluded that the sample mean with the greater sample size is the one
that tends to be a more accurate estimate of the population mean, in this example the
volume means with 30 bottles per box.
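
The comparison with the theory of sampling distributions can be made explicit with a short calculation. This sketch assumes the Question 2 standard deviation (0.0261) stands in for the population σ; the helper name `theoretical_sd` is illustrative, and the reported values are the ones quoted in the answers to 3(d) and 4(d).

```python
from math import sqrt

SIGMA = 0.0261  # Question 2 standard deviation, assumed to approximate sigma

def theoretical_sd(sigma, n):
    """Standard deviation of the sample mean predicted by theory: sigma / sqrt(n)."""
    return sigma / sqrt(n)

reported = {6: 0.01078817, 30: 0.0048}  # values quoted in 3(d) and 4(d)

for n, sd_reported in reported.items():
    print(f"n={n:>2}: theory={theoretical_sd(SIGMA, n):.4f}, reported={sd_reported:.4f}")

# Theory predicts the sd shrinks by a factor of sqrt(6/30) when n goes from 6 to 30
print(f"predicted sd ratio: {sqrt(6 / 30):.3f}")
```

The predicted ratio is a little under one half, in line with the observation that the standard deviation for 30 bottles is about half of that for 6.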
