Wilkinson PS2 ECON560
Problem Set 2
10 Points
All answers to problem set questions must be typed so they can be reviewed by Turnitin. Please
submit your completed problem set through Canvas using the instructions found on the syllabus.
Problem 1 (3 points)
Suppose that you currently work in the IT department for a company that manufactures and sells
grills and grilling accessories. While working remotely one day, you receive an email from your
department head about the results of a stress test of the two new potential member registration
systems for new customers. The two registration systems both have the goal of having new
customers create an account with the company and verify their email address, but they use
different backends to accomplish this task.
Your department head, who does not have a background in statistics, tells you in the email that
the two systems had practically the same average delivery time of the verification email: 30
seconds. They excitedly remark that both systems seem to be performing well and that maybe we
should just “flip a coin!?” to determine which one to use, followed by an unsettling amount of
smiley emojis.
Craft a roughly 5-8 sentence email response to your department head explaining, in as non-
technical language as possible, why just relying on the average in this instance might not tell the
whole story. Feel free to make up a fitting name for your boss if you desire.
Dear Mr. Williams,
I appreciate you sharing the results of the two systems with me. The matching average delivery
time of 30 seconds is encouraging, but the average alone might not tell us the whole story. For
example, even if both systems average 30 seconds, one system could usually send the email in
about 25 seconds but occasionally take a minute or two, while the other might always deliver
within a range of 28 to 32 seconds. A more consistent system will give our new customers a
better, more predictable experience. Before we flip any coins, I suggest we also look at the range
and standard deviation of the delivery times, which tell us how consistent each system is. That
extra information will help us make a more informed decision between the two.
Sincerely,
Ben Wilkinson
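For illustration only, here is a small sketch with made-up delivery times (not the real stress-test numbers) showing how two systems can share the same 30-second average while one is far less consistent:

import statistics

# Hypothetical delivery times in seconds; both lists average 30 seconds.
system_a = [10, 15, 25, 30, 30, 35, 45, 50]   # erratic system
system_b = [29, 30, 31, 30, 29, 31, 30, 30]   # consistent system

for name, times in [("System A", system_a), ("System B", system_b)]:
    print(name,
          "mean:", round(statistics.mean(times), 1),
          "std dev:", round(statistics.stdev(times), 1),
          "range:", max(times) - min(times))

Both systems print a mean of 30 seconds, but System A has a standard deviation of roughly 14 seconds and a range of 40, while System B's standard deviation is under 1 second, which is the distinction the email is making.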
Problem 2 (7 points)
This problem uses the Housing.xlsx data posted on Canvas and requires Excel’s Data
Analysis Toolpak. Note that you must copy the output when asked for, but if it’s not asked for
then no output is required and you can just write the answer.
After taking a minute to familiarize yourself with the data, answer the following questions:
A. (1 point) Use the Data Analysis Toolpak to find descriptive statistics for the variable price
and copy your output below. What is the minimum, maximum, and range of the variable price?
Price

Mean                 520482.8
Standard Error       10740.93
Median               435000
Mode                 650000
Standard Deviation   339658
Sample Variance      1.15E+11
Kurtosis             13.9581
Skewness             2.976073
Range                3000000
Minimum              80000
Maximum              3080000
Sum                  5.2E+08
Count                1000
The minimum is $80,000, the maximum is $3,080,000, and the range is $3,000,000.
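As a cross-check outside Excel, a minimal sketch like the following could reproduce the minimum, maximum, and range, assuming Housing.xlsx has a column named "Price" (the exact column name is an assumption):

import pandas as pd

df = pd.read_excel("Housing.xlsx")   # reading .xlsx requires the openpyxl engine
price = df["Price"]

print("Minimum:", price.min())
print("Maximum:", price.max())
print("Range:  ", price.max() - price.min())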
B. (1 point) What price is the 60th percentile using the greater than or equal to method? What
price is the 60th percentile using the interpolation method?
The price in the 60th percentile using the greater than or equal to method is $488,000. The price
in the 60th percentile using the interpolation method is $487,600.
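A sketch of how the two percentile conventions could be computed, again assuming a "Price" column in Housing.xlsx:

import math
import numpy as np
import pandas as pd

prices = np.sort(pd.read_excel("Housing.xlsx")["Price"].to_numpy())
n = len(prices)

# Greater than or equal to method: the smallest value with at least 60% of
# observations at or below it, i.e. the ceil(0.60 * n)-th ordered value.
p60_ge = prices[math.ceil(0.60 * n) - 1]

# Interpolation method: linear interpolation between the two nearest ordered
# values (this matches Excel's PERCENTILE.INC and numpy's default behavior).
p60_interp = np.percentile(prices, 60)

print(p60_ge, p60_interp)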
C. (1 point) What is the z-score for a house worth $1 million? Report this number and explain
how to interpret it.
The z-score for a house worth $1 million is about 1.4. This means that a $1 million house is
roughly 1.4 standard deviations above the mean price; the positive sign tells us it sits above,
rather than below, the mean.
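The calculation behind this answer is z = (x - mean) / standard deviation; a minimal sketch using the figures from Part A:

# Mean and standard deviation taken from the descriptive statistics output above.
mean_price = 520482.8
sd_price = 339658.0

z = (1_000_000 - mean_price) / sd_price
print(round(z, 2))   # about 1.4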
D. (2 points) Use the Data Analysis Toolpak to create a 3 variable correlation matrix using Price,
Sqft Living, and Condition. Excel will use the appropriate correlation coefficient automatically
and the interpretation is the same.
Copy the output below and briefly comment on the strength and direction of the correlations you
found, and if they make logical sense (a few sentences in total are fine here).
             Price     Sqft Living  Condition
Price        1
Sqft Living  0.70492   1
Condition    0.073839  0.006292     1
All of the correlations are positive. The correlation between Price and Sqft Living is fairly
strong at about 0.70, while the correlations involving Condition are very weak (about 0.07 with
Price and about 0.006 with Sqft Living). This makes logical sense: bigger houses usually sell for
higher prices, while the condition rating on its own tells us much less about a house's price or
size.
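The same matrix could be reproduced with pandas, assuming the columns are named exactly "Price", "Sqft Living", and "Condition" (the spelling in the file is an assumption based on the output above):

import pandas as pd

df = pd.read_excel("Housing.xlsx")
corr = df[["Price", "Sqft Living", "Condition"]].corr()   # pairwise Pearson correlations
print(corr.round(4))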
E. (2 points) Create a graph of your choice using the housing data provided and copy it to your
problem set. When copying the graph, use the dropdown paste menu to paste it as a picture
rather than an Excel object, otherwise the data in the graph will be deleted if you delete it in
Excel.
The graph can be of any style that was covered in Module 2, so you have plenty to choose from.
However, if that makes you feel overwhelmed then I recommend sticking with either a pie chart,
bar graph, or histogram.
Make sure that whatever type of graph you choose, it has a clear title and the axes are labeled.
You should also include a brief 2-3 sentence summary of what the graph shows us about the
data.
Note: you don’t have to use all the variables, just pick whatever one or more would fit the graph
you want to create. Remember that you may need to manipulate some variables a bit to get them
to play nice with Excel’s graphing features.
I chose to compare the price of each house to the year it was built to see if there was any
relationship. The graph shows a large cluster of data points for houses built between roughly
1970 and 1995, most of them priced below about $500,000. Looking at the outliers, some of the
most expensive houses were built between 1905 and 1920 and still sold for around $2-2.5 million.
The highest-priced houses, at roughly $3 million, include one built in the 1950s and another
built right around 2000. My takeaway from the graph is that most of the houses built between
1970 and 1995 appear to be more affordable housing aimed at families.
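A sketch of how a scatter plot like this could be built outside Excel; the "Yr Built" column name is an assumption, since the exact variable names in Housing.xlsx are not shown here:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel("Housing.xlsx")

plt.scatter(df["Yr Built"], df["Price"], alpha=0.4)   # one point per house
plt.title("House Price by Year Built")
plt.xlabel("Year Built")
plt.ylabel("Price ($)")
plt.show()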