
PS 1 Resubmit Final

The document describes variables related to cell phone usage and consumer behavior. It includes questions about categorizing variables, the type of data a market research director might collect, creating charts and tables from financial data, and comparing mutual fund risk profiles. Key details include identifying which cell phone variables are categorical or numerical, discrete or continuous; determining large cap is the most frequent mutual fund category; and noting differences in central tendency, variability, and shape between mutual fund risk levels based on box plots and statistics.

Uploaded by

Ashley Plemmons

Problem Set 1 BUS 310

WF 10:30-11:45

1. For each of the following variables, determine whether the variable is categorical or
numerical. If the variable is numerical, determine whether the variable is discrete or
continuous.

a) Number of cell phones in the household - Discrete numerical because it is a count of items.
b) Monthly data usage (in MB) - Continuous numerical because it is a measurement of data usage.
c) Number of text messages exchanged per month - Discrete numerical because it is a count of
items.
d) Voice usage per month (in minutes) - Continuous numerical because it is a measurement of
usage in minutes.
e) Whether the cell phone is used for streaming video - Categorical because it defines a
category: the cell phone is either used for streaming or not.

2. The director of market research at a large department store chain wanted to conduct a
survey throughout a metropolitan area to determine the amount of time working women
spend shopping for clothing in a typical month.

a) Indicate the type of data the director might want to collect.


The director of market research could collect data about how much time working women
typically spend shopping and the type of places they like to shop. This can be used to see if
there will be enough foot traffic in that general area to open up another large department store.

3. From the file MUTUALFUNDS.xls create a Summary table, Pie chart, Bar chart for the
Category variable using PHStat – One-Way Tables and Charts. Copy and paste the table
and charts into your word document and answer the following question:
(a) Write a short comment on which Category value(s) occur more frequently. Answer
using data collected from table and charts.

Category      Total   Frequency   Percentage   Cumulative Percentage
Large Cap       450         450       51.84%                  51.84%
Mid Cap         174         244       28.11%                  79.95%
Small Cap       244         174       20.05%                 100.00%
Grand Total     868

a.) According to the data Large Cap is the most frequently occurring category.
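
The Percentage and Cumulative Percentage columns can be recomputed from the counts. The following is a minimal Python sketch, assuming the Frequency column (450, 244, 174) holds the counts behind the percentages; the function name `summary_table` is hypothetical and is not part of PHStat.

```python
def summary_table(counts):
    """Return (category, count, percent, cumulative percent) rows, most frequent first."""
    total = sum(counts.values())
    rows = []
    cumulative = 0
    # Sort categories from most to least frequent, as in a one-way summary table.
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count
        rows.append((
            category,
            count,
            round(100 * count / total, 2),
            round(100 * cumulative / total, 2),
        ))
    return rows


# Assumption: counts taken from the Frequency column of the table above.
counts = {"Large Cap": 450, "Mid Cap": 244, "Small Cap": 174}
table = summary_table(counts)
for row in table:
    print(row)
```

The first row is the most frequent category, which is how the answer to (a) can be read directly from the table.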

4. From the file Mutual Funds.xls


a) create a stem and leaf display of Return by using PHStat – Stem and Leaf Display.
What is the lowest return? What is the highest? What is the middle return?

Lowest Return: -9.00


Highest Return: 35.00
Middle Return: 13.10
Statistics
Sample Size 868
Mean 12.5142
Median 13.1000
Std. Deviation 6.2916
Minimum -9.0000
Maximum 35.0000

5. Pareto chart (1 pt) Make sure you interpret the resulting table.
The data below represents problems encountered by staff in setting up rooms. You need to
enter the data into an Excel Spreadsheet

We believe that we could work on the top three complaints from customers, which together account
for over 80% of the complaints. If we have a time constraint, we at least need to focus on the
top complaint, which makes up 63.16% of the total complaints. The order is as follows:
1. External Noise = 63.16%
2. Improper Management = 13.16%
3. Not Clean = 7.89%
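
The "over 80%" figure is the running total that a Pareto chart plots as its cumulative line. A short Python sketch, using only the three percentages listed above (the underlying counts are not shown, so none are assumed); the helper name `categories_to_reach` is hypothetical.

```python
from itertools import accumulate

# Percentages of total complaints, in Pareto (descending) order, from the list above.
top_complaints = [
    ("External Noise", 63.16),
    ("Improper Management", 13.16),
    ("Not Clean", 7.89),
]

# Running total of the percentages, i.e. the cumulative line of the Pareto chart.
cumulative = [round(c, 2) for c in accumulate(pct for _, pct in top_complaints)]


def categories_to_reach(items, threshold):
    """Return the leading categories needed to reach `threshold` percent, and their total."""
    running = 0.0
    chosen = []
    for name, pct in items:
        running += pct
        chosen.append(name)
        if running >= threshold:
            break
    return chosen, round(running, 2)


chosen, covered = categories_to_reach(top_complaints, 80)
print(cumulative)
print(chosen, covered)
```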
6. From the file MUTUALFUNDS.xls, create a frequency, percent, and cumulative % table for
the Return variable. (1-point) Start by typing in a new column of bin numbers 10, 20, …, 80.
Now create another column of Midpoints (you can compute these or add them by hand). If you
have the “special” version of PHStat you can try leaving the bins and midpoints boxes empty to
let PHStat compute the bins and midpoints. Then select PHStat – Histogram & Polygons and
check the histogram and Ogive boxes in Output Options (you will need these for the
problems below).

This table shows that the most frequent returns fall in the 20 bin.

This table shows the frequency distribution of the data with 54% of all data in the 10-19 range.
Another 32% is in the 0-9 range. This means about 86% of all mutual fund returns in 2006 are
within the 0-19 return range.

7. Copy the Histogram plot created in the previous problem here and use it to describe the
frequency histogram of Return. The fewest mutual funds have returns in which range? Which
range has the most mutual funds? What is the shape (symmetric or skewed; flat or peaked) of
the histogram? Are there any gaps in Return ranges? Explain what a gap would mean for the
Return data. Also comment on how the histogram relates to the stem-and-leaf diagram for
Return you made in question 4. 1 pt
-The fewest mutual funds have returns in which range?
The fewest mutual funds have returns in the ranges -10 to -1 and over 20; the smallest group
is returns over 30.
-Which range has the most mutual funds?
From 0 to 20 is where the most mutual funds are.
-What is the shape (symmetric or skewed; flat or peaked) of the histogram?
The shape is skewed to the left with a midpoint at 15.
-Are there any gaps in Return ranges?
There are no gaps in the data. A gap would mean that a bin is empty, that is, no funds had returns in that range.
-Explain what a gap would mean for the Return data. Also comment on how the histogram
relates to the stem-and-leaf diagram for Return you made in question 4.
The histogram is a visual representation of the stem and leaf plot based on categories.

8. Use the cumulative percent polygon for Return you created in the previous problem. The
category X axis labels should equal to the bin values instead of midpoints. (Estimate the 25th,
50th, and 75th percentile from the cumulative percent chart and estimate what percent of the
lengths are less than or equal to 180. Try to indicate how you estimated these by drawing on the
plot.)(1.5 pt)
The 25th percentile from the cumulative percent polygon is about 6.2.
The 50th percentile from the cumulative percent polygon is about 12.3.
The 75th percentile from the cumulative percent polygon is about 17.
About 99% of the returns are less than or equal to 21.
I got these values from the graph above: I located each percentile on the vertical axis and read
off where the line met the horizontal axis (6.2, 12.3, and 17).
9. From the file MUTUALFUNDS.xls, create a Crosstab table for the categorical variables,
Objective and Risk using PhStat – Two-Way Tables and Charts. Write up a short statement
discussing how the Value and Growth mutual funds differ in their Risk profiles. (1.5 pt)

Two-Way Table

Count of Objective, by Risk

Objective      Average     High      Low   Grand Total
Growth             140      302       22           464
Value              171       53      180           404
Grand Total        311      355      202           868

Row percentages

Objective      Average     High      Low   Grand Total
Growth          30.24%   65.23%    4.54%       100.00%
Value           42.33%   13.12%   44.55%       100.00%
Grand Total     35.87%   40.95%   23.18%       100.00%


Analysis:

After analyzing our data, the risk profiles of the Growth and Value funds differ most at the low
and high levels (green and red). At the average level (blue) there wasn’t much of a difference.
At the high level (red), 65% of Growth funds are high risk but only 13% of Value funds are. At
the low level (green) I found the opposite extreme: only about 5% of Growth funds are low risk,
compared with 45% of Value funds.
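
The row percentages behind this comparison can be sketched in Python from the count table. The function name `row_percentages` is hypothetical. Note that the counts as printed give Growth-row percentages that differ slightly from the percentage table (for example about 30.17% rather than 30.24% for Growth/Average), which suggests the two tables were built from slightly different counts; the Value row matches exactly.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for objective, risks in table.items():
        total = sum(risks.values())
        result[objective] = {
            risk: round(100 * n / total, 2) for risk, n in risks.items()
        }
    return result


# Counts copied from the two-way table of Objective by Risk above.
counts = {
    "Growth": {"Average": 140, "High": 302, "Low": 22},
    "Value": {"Average": 171, "High": 53, "Low": 180},
}

percentages = row_percentages(counts)
for objective, row in percentages.items():
    print(objective, row)
```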

Part 2
Question: You wish to compare Risk of mutual funds. For each of these groups, for the
variables Expense ratio and 5-Year Return: (you will be analyzing if the Risk level of the
Mutual fund is reflected in the Expenses ratios and the Returns)
When you create the Box Plots check the 5 number summary and then the quartiles will
be created for you.

a) Compute the mean, median, first quartile, and third quartile.


b) Compute the range, interquartile range, standard deviation, and coefficient of variation.
c) Construct boxplots
d) Write a report that includes 3 paragraphs for each variable to describe how the 3 categories
are similar or different with respect to central tendency, variability and shape. You should
include box plots and descriptive statistics as your appendix.

Differences:

Among the three sets of data there are many differences in central tendency. The boxplots show
this visually: the first plot (green) is skewed to the left, with less extreme outliers. The
second (red) boxplot has more extreme outliers and is more centered, both visually and according
to the data.

Similarities:

The first (green) and third (black) boxplots are similar. Both are mostly centered, skewed to the
left, and have a smaller range of data.
Five-Number Summary (Risk/Return)

                  Low    High   Average
Minimum          0.15     0.2      0.15
First Quartile   0.86    1.04      0.99
Median            1.1    1.27      1.18
Third Quartile   1.24    1.49      1.35
Maximum          2.42    3.36      2.59

a) Compute the mean, median, first quartile, and third quartile.

Mean: 1.19
Median: 1.18
First Quartile: 0.96
Third Quartile: 1.36

b) Compute the range, interquartile range, standard deviation, and coefficient of variation.

Range: 3.21
Interquartile Range: 0.40
Standard Deviation: 0.38
Coefficient of Variation: 32.30%
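
The measures in (b) follow from a few simple formulas: range = maximum - minimum, IQR = Q3 - Q1, and CV = (standard deviation / mean) × 100%. The sketch below is a hedged Python illustration, assuming the overall minimum (0.15) and maximum (3.36) come from the five-number summary and using the rounded mean, quartiles, and standard deviation reported above; the function name `spread_measures` is hypothetical. Because the inputs are rounded to two decimals, the coefficient of variation comes out near 31.9% rather than the reported 32.30%, which was presumably computed from unrounded values.

```python
def spread_measures(minimum, maximum, q1, q3, mean, std_dev):
    """Return range, interquartile range, and coefficient of variation (in percent)."""
    value_range = round(maximum - minimum, 2)
    iqr = round(q3 - q1, 2)
    cv_percent = round(100 * std_dev / mean, 2)
    return value_range, iqr, cv_percent


# Assumption: rounded values as reported in parts (a) and (b) above.
value_range, iqr, cv_percent = spread_measures(
    minimum=0.15, maximum=3.36, q1=0.96, q3=1.36, mean=1.19, std_dev=0.38
)
print(value_range, iqr, cv_percent)
```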
d) Write a report that includes 3 paragraphs for each variable to describe how the 3 categories
are similar or different with respect to central tendency, variability and shape. You should
include box plots and descriptive statistics as your appendix.

Risk and Expense Ratio Analysis

mean < median = left skewed; mean > median = right skewed

         With regard to central tendency, the median expense ratio is 1.10 for low risk and 1.22
for high risk, so the expense ratio for high risk is higher than for low risk. The mean for the
high ratio seems to be outside the norm when compared to the average and low ratios. In this
case, there is most likely an outlier pulling the mean away from the center of the data. Most of
the data from low and high seems to be in line with the average. However, the high mode of 1.04
is far from the high mean of 1.23, which gives us further evidence that an outlier is in play.
The low mean of 1.07 is below the low median of 1.10, which shows that the data is skewed to
the left. The high mean of 1.23 is above the high median of 1.22, which shows that the data is
skewed to the right. The average mean of 1.13 is just below the average median of 1.14, which
shows that the data is very slightly skewed left.

        

         With regard to variability, the high range of 2.77 and the low range of 2.27, compared
with the average range of 2.44, show that the data is distributed unevenly, which again points
to an outlier. The low range is only 0.17 away from the average range whereas the high range is
0.33 away, almost double. This shows that there is an outlier in the high set of data. Looking
at the IQR, both data sets seem to be normal relative to the average, but the high value is
closer to the average. This means that most of the data is within that range of 0.44, which is
another confirmation that an outlier is in play given the large difference seen in central
tendency. The standard deviation is less than 1, which suggests that the data is “normal” but
conflicts with the other findings.

         When analyzing the boxplots, the average, high, and low data sets are right skewed. In
addition, the low and high boxplots have a peaked shape whereas the average data set seems to be
flat. Kurtosis is 2.78 for low, 3.12 for high, and 1.62 for average. The data is not normal based
on the -1 to 1 scale, which is consistent with the peaked shape and the previous findings.

         k > 0: peaked; k < 0: flat; sk > 0: right skewed; sk < 0: left skewed

        

5 Year Return Analysis

With regard to central tendency, more specifically the medians, the low-risk median of 9.4 and
the high-risk median of 5.3 show how much more return one would gain with low risk. Based on the
kurtosis, we can already tell how abnormal the data is. Looking at the range, we see that the
high data (43.6) is far away from the average (21.1) and low (19.4). Because of the number of
outliers, the mean is not a good way of analyzing the data. The upper fence, Q3 + 1.5 IQR, for
the 5-year low-risk return is 18.7, which suggests an outlier is present. The low-risk IQR is
(12.1 - 7.7) = 4.4, the high-risk IQR is (8.2 - 2.4) = 5.8, and the average-risk IQR is
(12.1 - 5) = 7.1. Based on the IQR, the low-risk group shows the least variability.
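
The IQR and the Q3 + 1.5 IQR cutoff used here follow the usual 1.5 × IQR rule for flagging outliers. A minimal Python sketch, using the quartiles quoted in this paragraph; the function name `iqr_fences` is hypothetical, and the lower fence (Q1 - 1.5 IQR) is included by the same rule even though the text only quotes the upper one.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) using the k * IQR rule."""
    iqr = q3 - q1
    return round(iqr, 2), round(q1 - k * iqr, 2), round(q3 + k * iqr, 2)


# First and third quartiles of 5-year return by risk level, as quoted in the text above.
quartiles = {
    "Low": (7.7, 12.1),
    "High": (2.4, 8.2),
    "Average": (5.0, 12.1),
}

fences = {risk: iqr_fences(q1, q3) for risk, (q1, q3) in quartiles.items()}
for risk, (iqr, lower, upper) in fences.items():
    print(risk, iqr, lower, upper)
```

Any return above a group's upper fence or below its lower fence would be flagged as an outlier for that group.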

The kurtosis of the high data, 3.5, is greater than 0, which results in a peaked shape. The
kurtosis of the average data, -0.7, is less than 0, which results in a flat shape. The kurtosis
of the low data, 1.4, is greater than 0, which results in a peaked shape. Skewness for the high
data is 0.22 > 0, which is right skewed. Skewness for the low data is 0.98 > 0, which is right
skewed. Skewness for the average data is 0.28 > 0, which is right skewed.

With regard to the boxplots, the average and low are right skewed and peaked whereas the high
boxplot seems to be symmetric and flat from a visual perspective. When putting the x axis into
play, we see that all boxplots are greater than 0, which suggests all data to be right skewed.
Part 3:

Use the Stock Index 2010 data set in your PS1 for the following problems.

2.

a. Judge whether the SP500 variable has a normal shape (1pt)

The data is far from a normal distribution. Because the kurtosis of 16.9 is severely
greater than 0, we can conclude that the data is peaked. The skewness of -3.1224 is less than 0,
so the data is left skewed. When looking at central tendency relative to the min and max, we see
that the data is clustered towards the max. This also shows that the data contains outliers.
b. Whether there are outliers? If there are, list all outliers (1pt)

3 standard deviations = 3(92.2227) = 276.6681
Upper limit: 1129.75 + 276.6681 = 1406.4181

No data values exceed this number, so on the upper end of the data there are no outliers.

Lower limit: 1129.75 - 276.6681 = 853.0819

There is one outlier on the lower end of the data: 625.59 on September 13, 2010.
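
The same three-standard-deviation check can be sketched in Python. This assumes, as the calculation above suggests, that 1129.75 is the SP500 mean and 92.2227 is its standard deviation; the function names `three_sigma_limits` and `is_outlier` are hypothetical.

```python
def three_sigma_limits(mean, std_dev, k=3):
    """Return (lower, upper) limits at k standard deviations from the mean."""
    margin = k * std_dev
    return mean - margin, mean + margin


def is_outlier(value, lower, upper):
    """A value is flagged when it falls outside the [lower, upper] interval."""
    return value < lower or value > upper


# Assumption: mean and standard deviation as used in the calculation above.
lower, upper = three_sigma_limits(mean=1129.75, std_dev=92.2227)
print(round(lower, 4), round(upper, 4))

# The one value the text identifies as an outlier.
print(is_outlier(625.59, lower, upper))
```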

3.

a. Judge whether the NIKKEI variable has a normal shape (1pt)


There seems to be a slight positive skewness, based on the distance between quartile 3 and
quartile 2 being greater than the distance between quartile 2 and quartile 1. Overall, the data
generally seems to be normal. There are no signs of severe skewness or outliers.

b. Whether there are outliers? If there are, list all outliers (1pt) (3 standard deviations from
mean, or IQR + and – 1.5)
No outliers were found. Three standard deviations above the mean is about 685.27,
while three standard deviations below the mean is about 391.97. No data points lie outside
this range.
