Cases

The document describes 5 business statistics cases involving the analysis of: 1) NFL player salary data to describe distributions and compare teams. 2) Exchange rate data to model changes over time and assess risks. 3) NFL salary data to estimate average salary and test hypotheses. 4) Real estate data to build a regression model for house prices. 5) Nuclear energy generation data to fit trends and seasonal patterns and forecast values.

Uploaded by

muzammilbabar91

BUSINESS STATISTICS

CASES

Case 1 - Football Players' Salaries (descriptive statistics)

Much football player salary information is public. National Football League
data can be accessed, for example, at http://www.spotrac.com/nfl/, which offers
year-by-year listings of salaries for NFL players. Ignore players with salaries
below 300000, since they are not full-time players. The file
NFL.xlsx includes salaries for 5 randomly picked teams in 2015.
1. Consider just the first available team; report the highest, the lowest,
the mean, and the median salary for the team. What can you say about
the shape of the distribution of players' salaries for this team based on
these statistics?
2. Graph the salaries of all players of the team you have chosen. Does the
shape of the data correspond to your answer to question one above?
3. Which measure of central tendency is preferred for this type of data, the
mean or the median, and why? What other summary statistics would you
want in order to have a clear picture of the distribution of salaries?
4. Do other teams appear to have similar distributions?
5. Which of the available teams seems to have the highest paid players?
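
The summary statistics asked for in questions 1 and 3 can be sketched as follows. This is a minimal illustration, not a prescribed solution: the function name `salary_summary` and the salary figures are hypothetical, and reading NFL.xlsx is left out. A mean well above the median is one simple hint of a right-skewed distribution.

```python
from statistics import mean, median


def salary_summary(salaries, floor=300_000):
    """Summarize one team's salaries, ignoring salaries below `floor`."""
    # The case asks to neglect players earning less than 300000.
    kept = [s for s in salaries if s >= floor]
    stats = {
        "n": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": mean(kept),
        "median": median(kept),
    }
    # A positive gap (mean above median) suggests a right-skewed distribution.
    stats["mean_minus_median"] = stats["mean"] - stats["median"]
    return stats


# Hypothetical salaries in US$, for illustration only (not taken from NFL.xlsx).
team = [250_000, 500_000, 700_000, 900_000, 5_000_000]
print(salary_summary(team))
```

Running the same function on each of the five teams gives a quick basis for the comparisons in questions 4 and 5.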

Case 2 - Foreign Exchange Rates (using the normal distribution)

Most countries of the world have their own currency, and when the buyer of a
product deals in a different currency than the seller, some exchange has to be
made. Markets set exchange rates for most major currencies, but these market
levels vary over time. Therein lies the problem for business.
Assume that you have made a deal to sell something that you price at US$130
for Eur100, based on an exchange rate of US$1.30 = Eur1.00. Before you
complete the deal, the rate changes to US$1.25 = Eur1.00. Consequently, the
Eur100 you agreed to take for your product are now worth only US$125. You have
just lost almost 4% of the purchase price (US$5 out of US$130) simply because
of exchange rate movements.
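
The arithmetic behind the "almost 4%" figure can be checked with a few lines. The helper name `usd_value` is hypothetical; the numbers are the ones from the example above.

```python
def usd_value(eur_amount, usd_per_eur):
    """Convert an amount in euros to US dollars at the given rate."""
    return eur_amount * usd_per_eur


agreed = usd_value(100, 1.30)  # value in US$ when the deal was priced
actual = usd_value(100, 1.25)  # value in US$ after the rate moved
loss_share = (agreed - actual) / agreed  # fraction of the agreed price lost

print(f"Loss: US${agreed - actual:.2f} ({loss_share:.2%} of the agreed price)")
```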
There are ways to protect against these risks, called hedging. Hedging costs
money too, however, so the "insurance" is not free. Thus, businesspeople are
constantly monitoring exchange rates, trying to predict their movements, and
deciding how much risk (and insurance) to take on in international deals.
National monetary authorities constantly monitor and publish exchange rate
information (see for example the site of the Monetary Authority of Singapore,
https://eservices.mas.gov.sg/Statistics/msb/ExchangeRates.aspx). Assume that
your company's headquarters are in Singapore, and you are planning to conduct a
business deal in the US. The file exchange_rates.xlsx, taken from the MAS
website, reports the relevant exchange rates (Singapore $ vs US$, end-of-month
values from Jan 2000 to Mar 2015).
1. Transform the data into growth rates, where the growth rate in month t is
ln(exchange rate in month t) - ln(exchange rate in month t-1).
2. Assume that growth rates are independent and normally distributed. Denote
the expected value by μ and the standard deviation by σ. Use the
sample average as an estimate of μ and the sample standard deviation as
an estimate of σ.
3. Using your estimates, assess:
   1. The probability that exchange rates will grow next month.
   2. The probability that exchange rates will decrease next month.
4. Suppose that the profitability of the business deal for both partners
depends on the exchange rate staying within 2% of its last value in either
direction. What is the probability that this happens?
5. All of this planning is based on the assumption that the growth rates are
independent and Gaussian. Is there any
evidence against these assumptions?
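
Steps 1 to 4 can be sketched as below, under the case's own assumption of independent, normally distributed growth rates. The function names and the short list of rates are hypothetical (the real series is in exchange_rates.xlsx), and the 2% band is interpreted here as the next rate lying between 98% and 102% of the last one, which is an assumption about how the question is meant.

```python
from math import log
from statistics import NormalDist, mean, stdev


def log_growth_rates(rates):
    """Growth rate in month t = ln(rate_t) - ln(rate_{t-1})."""
    return [log(curr) - log(prev) for prev, curr in zip(rates, rates[1:])]


def movement_probabilities(rates, band=0.02):
    """Return (P(rate grows), P(rate falls), P(rate stays within the band))."""
    growth = log_growth_rates(rates)
    # Sample mean and sample standard deviation estimate mu and sigma.
    model = NormalDist(mu=mean(growth), sigma=stdev(growth))
    p_down = model.cdf(0.0)
    p_up = 1.0 - p_down
    # Next rate within +/- band of the last one, expressed in log growth terms.
    p_band = model.cdf(log(1 + band)) - model.cdf(log(1 - band))
    return p_up, p_down, p_band


# Hypothetical end-of-month rates, for illustration only.
sample_rates = [1.70, 1.72, 1.69, 1.71, 1.68, 1.70]
print(movement_probabilities(sample_rates))
```

For question 5, plotting the growth rates against time and against a normal quantile plot is a natural first check of the independence and normality assumptions.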

Case 3 - Inference on Football Players' Salaries

Using the NFL salary dataset from Case 1, answer the following
questions:
1. Based only on the data of the first team, provide a point and a 95%
confidence interval estimate of the average salary of NFL players.
2. The average salary in the UK Premier League (Soccer) is reported to be
£1.16 million a year
(http://soccerlens.com/finance-in-english-football-wage-disparities-between-the-divisions/92692/).
Based only on the data of
the first team you picked, test the null hypothesis that the average salary
of US NFL football players is equal to the average salary of UK Premier
League soccer players (do not forget to convert £ to $!).
3. Repeat steps 1 and 2 using all teams. Explain the difference between the
results obtained with one team and those obtained with all teams. Which is
more reliable?
4. Do you think your sample can be assumed to be random and
independent? Is the population from which the sample has been drawn
normal?
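
Questions 1 and 2 can be sketched as follows. This is a simplified illustration that uses a large-sample normal approximation; with the small sample from a single team, a Student t critical value would normally replace the normal one. The function name, the salary figures, and the £ to $ rate are all hypothetical placeholders.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci_and_test(sample, mu0, level=0.95):
    """Point estimate, confidence interval and two-sided z-test of H0: mean = mu0."""
    n = len(sample)
    xbar = mean(sample)
    se = stdev(sample) / sqrt(n)  # standard error of the sample mean
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    z = (xbar - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return xbar, ci, z, p_value


# Hypothetical salaries in US$, for illustration only (not taken from NFL.xlsx).
salaries = [450_000, 600_000, 750_000, 1_200_000, 2_500_000, 6_000_000]
GBP_TO_USD = 1.5  # placeholder rate, an assumption; use the actual rate
mu0 = 1_160_000 * GBP_TO_USD  # Premier League average converted to US$
print(mean_ci_and_test(salaries, mu0))
```

Pooling all teams increases n and shrinks the standard error, which is the mechanism behind the comparison asked for in question 3.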

Case 4 - Using Regression Analysis to Model House Prices

Regression models are often used to estimate house prices. The models can
include many independent variables such as square footage, number of
bedrooms and bathrooms, acreage, whether the house has a fireplace,
basement or patio, the number of stories, the age of the home, etc. The
dependent variable is usually price. This exercise allows you to build your own
real estate model.
Visit the Realtor.com site (http://www.realtor.com/) to obtain data on houses for
sale, their descriptions and their prices. Data on price (dependent variable),
square feet, acres, number of bedrooms and number of bathrooms are usually
available for most houses. It is also possible to collect additional information
(such as the presence of a fireplace, basement, swimming pool …).
The file real_estate_Valparaiso2015.xlsx includes data on price (dependent
variable) and some relevant independent variables for a random sample of 50
houses in Valparaiso, zip code 46385, single-family homes, 0-5 years old.1

1. Illustrate the main characteristics of the real estate market in the area through
appropriate descriptive statistics.
2. Run a regression using all of the explanatory variables. Analyze the results in
terms of the significance of the t-tests (which variables should be kept in the
model and which are not significant predictors?), adjusted R² (how much of the
variability in price is explained by the model?) and residual standard deviation.
Drop the insignificant variables: does the model improve in terms of adjusted R²?
3. Try to interpret the sign and magnitude of each coefficient. Are they as
expected?
4. Analyze the residuals using appropriate graphs and statistics.
5. Assume you are interested in buying a house in the area, with given
characteristics (number of square feet, ...).2 Based on your model, work out
the expected price (point estimate and 95% confidence bound) for the house
you are interested in.
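
The regression in question 2 and the point estimate in question 5 can be sketched with ordinary least squares solved through the normal equations. This is a bare-bones illustration in pure Python: the function names and the data are hypothetical, t-tests and interval estimates are not covered, and in practice a statistics package or spreadsheet regression tool would report them directly.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the system is non-singular (no perfectly collinear regressors).
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return (coefficients, adjusted R^2, residual standard deviation)."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, k = len(y), len(x[0])  # k counts the intercept too
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2_adj = 1 - (ss_res / ss_tot) * (n - 1) / (n - k)
    resid_sd = (ss_res / (n - k)) ** 0.5
    return beta, r2_adj, resid_sd


def predict(beta, features):
    """Point estimate of the price for one house."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], features))


# Hypothetical data: (square feet in thousands, bedrooms) and price in US$ thousands.
houses = [(1.5, 3), (2.0, 4), (1.8, 3), (2.5, 5), (2.2, 4), (1.6, 2), (2.1, 3)]
prices = [228, 295, 262, 348, 305, 235, 281]
beta, r2_adj, resid_sd = fit_ols(houses, prices)
print(beta, r2_adj, resid_sd, predict(beta, (2.0, 3)))
```

Comparing adjusted R² and the residual standard deviation before and after dropping a variable is the comparison that question 2 asks for.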

1 TIPS IF YOU WANT TO DO YOUR OWN SEARCH: The listings are organized by ZIP code or by city.
You might start from a ZIP code (if needed, use http://www.geopostcodes.com/index.php to find
ZIP codes from city names). To obtain a more homogeneous population, you
might restrict your search to a specific segment, like “single family homes”, “Age: 0-10 years” (or
another segment you are interested in). If you start from a ZIP code where Realtor cannot find
houses, you might change it or use the option “Add nearby areas”, selecting some or all of the
nearby areas. You may also find too many houses: for example, using the ZIP code 46385 (Valparaiso,
IN) and some restrictions on type of house and age, one can find more than 400 homes in that
area. To limit the list, select houses as randomly as possible, for example taking every seventh
house and excluding those where you do not find the relevant information. It is absolutely wrong to
pick the 50 houses with the highest/lowest price. Price, square footage, number of bedrooms and
bathrooms, and acreage are included in most of the listings without viewing the details of the house;
other variables might also be interesting, but collecting them is more time consuming since you
have to view the details of the house. In some areas, the information on square feet and acres is
not systematically reported: in this case, you might consider changing the area.
2 Tip: to define the characteristics of the house of interest, refer to the distribution of the
independent variables in your sample: this will prevent you from defining characteristics that are
unusual for the area.

Case 5 - Forecasting energy variables

The web site http://www.eia.gov/totalenergy/data/monthly/index.cfm provides many
monthly time series on energy-related variables. The file Nuclear_energy.xlsx
includes a monthly time series on Nuclear Electricity (Net Generation - Million
Kilowatthours), from January 2000 to February 2015.
1. Describe the time series.
2. TREND FITTING: Run a regression of your selected time series on
time. Present your results and discuss whether the model looks appropriate.
Whatever your answer, use the model to forecast the next six months,
possibly with confidence bounds.
3. AUTOREGRESSIVE MODELS: Run a regression of your time series
on its own value one month earlier and on time. Present your results and
discuss whether the model is better than the previous one and whether it
looks appropriate. Whatever your answer, use the model to
forecast the next six months, possibly with confidence bounds.
4. SEASONALITY: If you think it is appropriate, introduce seasonal dummies
in your model and forecast with the resulting model.
5. Answer the previous questions using the natural logarithm of the time
series. Do you think log transforming is a good idea here?
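
The forecasting mechanics of questions 2 and 3 can be sketched as below. This is a simplified illustration: the function names and the series are hypothetical, the autoregressive sketch regresses the series only on its previous value (the time regressor and the seasonal dummies asked for in questions 3 and 4 are left out), and no confidence bounds are computed.

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def trend_forecast(series, horizon=6):
    """Fit a linear trend on t = 0..n-1 and extrapolate `horizon` months ahead."""
    n = len(series)
    intercept, slope = fit_line(list(range(n)), series)
    return [intercept + slope * (n + h) for h in range(horizon)]


def ar1_forecast(series, horizon=6):
    """Regress y_t on y_{t-1}, then iterate the fitted equation forward."""
    intercept, slope = fit_line(series[:-1], series[1:])
    forecasts, last = [], series[-1]
    for _ in range(horizon):
        last = intercept + slope * last  # each forecast feeds the next one
        forecasts.append(last)
    return forecasts


# Hypothetical monthly values, for illustration only (not from Nuclear_energy.xlsx).
generation = [64.0, 61.5, 62.8, 58.9, 63.2, 67.4, 69.9, 69.1, 64.3, 60.7, 62.0, 68.5]
print(trend_forecast(generation))
print(ar1_forecast(generation))
```

For question 5, the same functions can be applied to the logarithms of the series, with the forecasts exponentiated to return to the original units.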
