F.4. Data Analytics Part 2
using the computer spreadsheet below. The actual results for Year 1 are shown in the
first column. Net sales and cost of goods sold are projected to increase at the rate of 20
percent each year. Selling and administrative expenses are projected to increase at an
annual rate of 10 percent. However, several mistakes have been made in the
spreadsheet below.
   A                        B         C         D         E
1                           Year 1    Year 2    Year 3    Year 4
2  Net sales                $400,000  $480,000  $480,000  $576,000
3  Cost of goods sold        310,000   372,000   372,000   446,400
4  Gross profit               90,000   108,000   108,000   129,600
5  Selling expenses           30,000    33,000    36,300    39,930
6  Administrative expenses    40,000    44,000    48,400    58,080
7  Income before taxes        20,000    31,000    23,300    31,590
8  Income taxes at 40%         8,000    12,400     9,320    12,636
9  Net income                 12,000    18,600    13,980    18,954
Choice “D” is correct. The question states that administrative expenses increase by 10
percent per year. In Year 2 and Year 3, administrative expenses increased by 10
percent over the prior year; Year 1’s expense was $40,000, and a 10 percent increase
over Year 1 for Year 2 equals $44,000 ($40,000 × 1.10). For Year 3, Year 2’s $44,000 is
multiplied by 1.10 to get $48,400.
Cell E6 does not have the correct amount for administrative expenses. A 10 percent
increase over Year 3’s administrative expense is $53,240 ($48,400 from Year 3 × 1.10).
Instead, Cell E6 shows $58,080, which represents a 20 percent
increase [($58,080 ÷ $48,400) − 1 = 0.20].
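The check described above can be sketched in a few lines of Python. This is only an illustration: the helper name `project` and the variable names are assumptions, while the amounts and the 10 percent rate come from the spreadsheet and the question.

```python
# Check row 6 (administrative expenses) against the stated 10 percent growth.
# Amounts are whole dollars, so integer arithmetic avoids rounding noise.

def project(base, growth_pct, years):
    """Return `years` values, compounding `growth_pct` percent annually."""
    values = [base]
    for _ in range(years - 1):
        values.append(values[-1] * (100 + growth_pct) // 100)
    return values

shown_admin = [40_000, 44_000, 48_400, 58_080]   # cells B6:E6 as printed
expected_admin = project(40_000, 10, 4)          # 10 percent per year

# Collect (year, shown, expected) wherever the spreadsheet disagrees.
mismatches = [
    (year, shown, expected)
    for year, (shown, expected) in enumerate(zip(shown_admin, expected_admin), start=1)
    if shown != expected
]

# Growth rate implied by the amount actually shown in E6.
implied_growth_e6 = shown_admin[3] / shown_admin[2] - 1

print(expected_admin)
print(mismatches)
print(round(implied_growth_e6, 2))
```

Only Year 4 is flagged, and the implied growth rate for that cell is 20 percent rather than 10 percent.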
The primary difference between the coefficient of correlation and the coefficient of
determination is that:
Choice "C" is correct. The coefficient of correlation measures the strength and direction
of the relationship between two variables. Since the company increases advertising
when sales are low and decreases advertising when sales are high, the movement is in
directly opposite directions and the coefficient would be close to -1.0.
Choice "A" is incorrect. A coefficient of correlation of 1.0 would imply that both variables
move in the same direction at approximately the same rate. An increase in advertising
when sales are increasing would be characteristic of a coefficient of correlation of 1.0.
Choice "D" is incorrect. A relationship exists between advertising and sales. According
to the facts of the question, the relationship is an inverse relationship. The coefficient of
correlation is expressed as a range between -1.0 and +1.0.
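As a rough illustration of an inverse relationship, the sketch below computes the coefficient of correlation for made-up figures in which advertising falls as sales rise. The data and the helper `pearson_r` are hypothetical and are not taken from the question.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical periods: advertising is cut as sales climb.
sales = [100, 120, 140, 160, 180]
advertising = [50, 44, 41, 33, 30]

r = pearson_r(sales, advertising)
print(round(r, 2))  # negative and close to -1.0
```

A result near -1.0 signals that the two variables move in opposite directions, which is the pattern the correct choice describes.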
Statsinc uses the simple linear regression formula Y = A + BX to estimate total cost (Y),
with X representing volume, A equaling fixed costs, and B equaling variable costs per
unit. If the correlation coefficient between X and Y is 0.90, which of the following is
correct?
Choice "C" is correct. Variable costs per unit (B) will be constant, meaning that overall
variable costs will vary in direct proportion to changes in volume (X).
Choice "A" is incorrect. Y (total cost) is the dependent variable, while X (volume) is the
independent variable. Volume will drive total cost.
Choice "B" is incorrect. The coefficient of determination (R-squared) equals the square
of the correlation coefficient. Since the correlation coefficient is 0.90, the coefficient
of determination is 0.81 (0.90 × 0.90).
Choice "D" is incorrect. Fixed costs overall (not fixed costs per unit) will remain constant
regardless of production.
In regression analysis, the coefficient of determination:
Choice "A" is correct. This is the definition of the coefficient of determination. It is the
square of the coefficient of correlation. The higher the coefficient of determination, the
greater the proportion of the total variation in y that is explained by the variation in x,
and the better the fit of the regression line.
Choice "D" is incorrect. It becomes larger as the fit of the regression line improves.
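The link between the two coefficients can be checked numerically. The sketch below fits a least-squares line of the form Y = A + BX to hypothetical volume and cost figures, then confirms that the coefficient of determination equals the square of the coefficient of correlation. The data and all function names are assumptions made for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def coefficient_of_determination(xs, ys, a, b):
    """Share of the total variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total

def pearson_r(xs, ys):
    """Coefficient of correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical observations: units produced and total cost in dollars.
volume = [100, 200, 300, 400, 500]
total_cost = [1_250, 1_400, 1_700, 1_750, 2_000]

a, b = fit_line(volume, total_cost)      # a: fixed cost, b: variable cost per unit
r = pearson_r(volume, total_cost)
r2 = coefficient_of_determination(volume, total_cost, a, b)

print(a, b)
print(round(r, 4), round(r2, 4), round(r * r, 4))
```

For a simple regression with one independent variable, the last two printed values match, which is why a correlation coefficient of 0.90 implies a coefficient of determination of 0.81.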
A regression analysis on the prices of local housing resulted in the following summary
output:
Which field(s) shows the overall likelihood that all regression coefficients in this model
are actually zero?
Choice "B" is correct. Regression analysis software provides output that details the
strength of the regression model. Key values from this output include the model's overall
significance (F statistic), the degree to which the model explains the variation (Adjusted
R Squared), and the significance of each independent variable's effect on the
dependent variable (p-values). If all of these values are within tolerance, then the
regression coefficients specify the relationship between the independent and dependent
variables.
A regression analysis on the prices of local housing resulted in the following summary
output:
Which field(s) shows the likelihood that the effect a particular independent variable has
on the dependent variable is not reliably different from random numbers?
Choice "D" is correct. Regression analysis software provides output that details the
strength of the regression model. Key values from this output include the model's overall
significance (F statistic), the degree to which the model explains the variation (Adjusted
R Squared), and the significance of each independent variable's effect on the
dependent variable (p-values). If all of these values are within tolerance, then the
regression coefficients specify the relationship between the independent and dependent
variables.
A regression analysis on the prices of local housing resulted in the following summary
output:
Which field(s) shows the calculated estimate for how much of the variation in the
dependent variable is explained by the independent variable(s)?
Choice "A" is correct. Regression analysis software of any brand provides output that
details the strength of the regression model. Key values from this output include the
model's overall significance (F statistic), the degree to which the model explains the
variation (Adjusted R Squared), and the significance of each independent variable's
effect on the dependent variable (p-values). If all of these values are within tolerance,
then the regression coefficients specify the relationship between the independent and
dependent variables.
A regression analysis on the prices of local housing resulted in the following summary
output:
Which field(s) precisely expresses the relationship between the independent variable(s)
and the dependent variable?
Choice "C" is correct. Regression analysis software of any brand provides output that
details the strength of the regression model. Key values from this output include the
model's overall significance (F statistic), the degree to which the model explains the
variation (Adjusted R Squared), and the significance of each independent variable's
effect on the dependent variable (p-values). If all of these values are within tolerance,
then the regression coefficients specify the relationship between the independent and
dependent variables.
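Two of the key values named above follow directly from R squared, the number of observations, and the number of independent variables. The sketch below uses the standard formulas with hypothetical inputs, since the summary output itself is not reproduced here. The function names are assumptions, and p-values are left out because they require a probability distribution that is beyond this sketch.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F statistic: explained variation per variable over
    unexplained variation per residual degree of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical model: 30 house sales, 2 independent variables, R squared 0.81.
r2, n, k = 0.81, 30, 2

adj_r2 = adjusted_r_squared(r2, n, k)
f_stat = f_statistic(r2, n, k)

print(round(adj_r2, 4))
print(round(f_stat, 2))
```

Adjusted R squared is never higher than R squared, because it penalizes the model for each additional independent variable.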
Ivey Company uses regression analysis in examining its costs. It has determined that
there is a correlation coefficient of 0.90 between two variables X and Y. Which of the
following statements is correct for a correlation coefficient of 0.90?
This question asks which statement is correct with respect to regression
analysis. The only given fact is that there is a 0.90 correlation coefficient between the
variables X and Y.
Statement I says that there is little relationship between X and Y. The correlation
coefficient is the strength of the relationship between the independent and dependent
variables X and Y. Because correlation coefficients range between −1.00 and 1.00, a
correlation coefficient of 0.90 would indicate a strong relationship. Statement I is
incorrect.
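Statement II says that variation in X explains 90% of the variation in Y. This statement
describes the coefficient of determination, not the correlation coefficient. With a
correlation coefficient of 0.90, the coefficient of determination is 0.81 (0.90 squared),
so variation in X explains 81% of the variation in Y, not 90%. Statement II is
incorrect.
Statement III says that, if X increases, Y will never decrease. "Never" would be correct
only if the correlation were perfect, with a correlation coefficient of 1.00. At 0.90, Y
will usually, but not always, increase when X increases. Statement III is
incorrect.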
Choice “D” is correct. Regression analysis is complex and does not always produce a
positive result. Models that are not statistically significant often have one or more of the
following warning signs: a small (near zero) correlation coefficient, a small (near zero)
coefficient of determination, or a large (in proportion to the dependent variable) standard
error.
In this example, all three of these warning signs are present. This regression equation is
unlikely to be a true representation of the relationship between these demographic
variables and sales, if such a relationship even exists. There is no support for the
regression.
Choice “A” is incorrect. While the regression equation would indeed suggest that this is
the way to maximize sales given the signs on the slopes of the β terms for the two
independent variables (positive for wealth, negative for age), the regression model itself
is of poor quality.
Choice “B” is incorrect. The regression equation would suggest the opposite strategy
given the signs on the slopes of the β terms for the two independent variables (positive
for wealth, negative for age).
Choice “C” is incorrect. The fact that this regression model is of such poor quality is not
a reflection on the product being studied. A poor regression model does not mean the
product is bad, merely that the regression model cannot provide reliable
recommendations for how best to market it under the studied conditions.
The OutdoorPeople Co. has identified several subgroups among the company's
customer base. These groups have particular combinations of age, wealth, geographic
location, etc. The company is about to release a new product and it wants to measure
how much of an effect the customer's wealth has on buying the product after viewing
the advertising message(s) for the product.
What kind of analysis will be most useful to answer OutdoorPeople's need for
information?
Choice "B" is correct. Regression analysis uses statistics software to discover and
quantify the relationship between a dependent variable and one or more independent
variables. The resulting coefficients can be used to predict values of the dependent
variable from any values the independent variable may have in the future.
OutdoorPeople wants to know if it can predict how likely a customer is to buy its product
based on the customer's wealth. After performing a successful regression analysis,
OutdoorPeople will have a regression equation that will contain this information.
Choice "A" is incorrect. Cluster analysis is used to identify subgroups within a larger
group based on shared characteristics. OutdoorPeople has already identified subgroups
but is asking a question across all its subgroups. Cluster analysis will not answer that
question.
Choice "D" is incorrect. Classification analysis is used to place newly encountered data
into subgroups already established by cluster analysis. OutdoorPeople already has
clusters, but they are not being used in this study, and no new customers are being
classified into existing clusters.
A time series analysis shows that sales have risen steadily over the last 10 years. This
is an example of:
Choice "A" is correct. A time series analysis that shows a steadily increasing or
decreasing pattern is an example of a trend line.
Choice "B" is incorrect. A time series analysis showing that sales have risen steadily
over the last 10 years is not an example of a cyclical pattern.
Choice "C" is incorrect. A time series analysis showing that sales have risen steadily
over the last 10 years is not an example of an irregular pattern.
Choice "D" is incorrect. A time series analysis showing that sales have risen steadily
over the last 10 years is not an example of a seasonal pattern.
Which of the following is true regarding exponential smoothing?
Choice "D" is correct. Exponential smoothing uses a weighted average of past time
series data in which only one weight is selected, that of the most recent set of data. The
calculation computes a value from the forecast and the actual value for the last period
before the period being forecast. It employs a smoothing constant between 0 and 1
that is derived through trial and error on an initial set of data.
Choice "A" is incorrect. Exponential smoothing does not multiply the most recent set of
data by a smoothing constant as well as the next or previous set of data.
Choice "B" is incorrect. Exponential smoothing does not use the last three sets of data,
giving equal weighting to each.
Choice "C" is incorrect. Exponential smoothing does not use a weighted average, with
the most recent data receiving the lowest weighting.
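Exponential smoothing as described above can be sketched as follows. The sales figures, the candidate smoothing constants, and the function names are hypothetical. Each new forecast is the smoothing constant times the latest actual value plus the remainder times the latest forecast, and the constant is picked by trial and error as the candidate with the lowest mean squared error.

```python
def smooth(actuals, alpha, initial_forecast):
    """Return forecasts for periods 1 through n + 1.

    forecasts[0] is the starting forecast; each later forecast is
    alpha * last actual + (1 - alpha) * last forecast.
    """
    forecasts = [initial_forecast]
    for actual in actuals:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

def mean_squared_error(actuals, forecasts):
    """Average squared miss; zip ignores the extra next-period forecast."""
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)

# Hypothetical sales history and candidate smoothing constants.
sales = [100, 110, 105, 120, 115]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

# Trial and error: keep the constant with the smallest error on the history.
best_alpha = min(
    candidates,
    key=lambda alpha: mean_squared_error(sales, smooth(sales, alpha, sales[0])),
)
next_forecast = smooth(sales, best_alpha, sales[0])[-1]

print(best_alpha, round(next_forecast, 2))
```

Seeding the first forecast with the first actual value is a common convenience, not a requirement of the method.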
A time series analysis shows that company revenues decline during recessions and
sales increase dramatically during expansions. This pattern is an example of:
Choice "D" is correct. A time series analysis that shows fluctuations with economic
cycles (revenues down during recessions) is an example of a cyclical pattern.
Choice "A" is incorrect. A time series analysis showing that company revenues decline
during recessions and sales increase dramatically during expansions is not an example
of an irregular pattern.
Choice "B" is incorrect. A time series analysis showing that company revenues decline
during recessions and sales increase dramatically during expansions is not an example
of a seasonal pattern.
Choice "C" is incorrect. A time series analysis showing that company revenues decline
during recessions and sales increase dramatically during expansions is not an example
of a trend line.
A time series analysis of a business's sales shows a decline in sales every summer, with a
peak during the winter. These results could be:
Choice "B" is correct. The seasonal component is a measure of data with time as the
independent variable within a single fiscal or calendar year. Unlike a straight trend line,
you see peaks and troughs over time with regular patterns.
Choice "A" is incorrect. A downward trend would not explain a time series analysis
showing a decline in sales every summer, with a peak during the winter.
Choice "C" is incorrect. Cyclical fluctuations would not explain a time series analysis
showing a decline in sales every summer, with a peak during the winter.
Choice "D" is incorrect. An upward trend would not explain a time series analysis
showing a decline in sales every summer, with a peak during the winter.
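One simple way to expose a seasonal component is a seasonal index: the average for each season divided by the overall average. The figures below are hypothetical and the variable names are assumptions. An index above 1 marks a peak season and an index below 1 marks a trough.

```python
# Hypothetical sales by season for two years (same units throughout).
sales_by_year = {
    "Year 1": {"winter": 150, "spring": 100, "summer": 60, "fall": 90},
    "Year 2": {"winter": 160, "spring": 110, "summer": 70, "fall": 100},
}
seasons = ["winter", "spring", "summer", "fall"]

# Average sales for each season across the years.
season_average = {
    season: sum(year[season] for year in sales_by_year.values()) / len(sales_by_year)
    for season in seasons
}

# Overall average across all seasons.
overall_average = sum(season_average.values()) / len(seasons)

# Seasonal index: how far each season sits above or below the average.
seasonal_index = {
    season: season_average[season] / overall_average for season in seasons
}

for season in seasons:
    print(season, round(seasonal_index[season], 2))
```

With these figures, winter has the highest index and summer the lowest, which is the recurring within-year pattern that distinguishes a seasonal component from a trend.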
A time series analysis shows that sales normally have been rising but last year there
was a big drop. Analysts believe that the drop was due to massive regional fires that
hurt business. This is an example of:
Choice "D" is correct. An irregular component is random variability in a time series that
deviates from values that can be observed as a trend or a pattern.
Choice "A" is incorrect. Massive regional fires that hurt business would most likely not
be considered a seasonal pattern.
Choice "B" is incorrect. Massive regional fires that hurt business would most likely not
be considered a cyclical pattern.
Choice "C" is incorrect. Massive regional fires that hurt business would most likely not
be considered a trend line.
Automite Company is an automobile replacement parts dealer in a large metropolitan
community. Automite is preparing its sales forecast for the coming year. Data regarding
both Automite's and industry sales of replacement parts as well as both the used and
new automobile sales in the community for the last 10 years have been accumulated.
If Automite wants to determine if there is a historical trend in the growth of its sales as
well as the growth of industry sales of replacement parts, the company would employ:
Choice "A" is incorrect. Automite would not employ simulation techniques to determine if
there is a historical trend in the growth of its sales as well as the growth of industry
sales of replacement parts.
Choice "B" is incorrect. Automite would not employ statistical sampling to determine if
there is a historical trend in the growth of its sales as well as the growth of industry
sales of replacement parts.
Choice "D" is incorrect. Automite would not employ queuing theory to determine if there
is a historical trend in the growth of its sales as well as the growth of industry sales of
replacement parts.
Snow Bird Co. has recently collected information on all customers who have
purchased skis, snowboards, or other merchandise from the company's ski shop. Snow
Bird would like to send postcards to these customers and, moving forward, send
postcards to customers a week following a purchase in hopes of increasing the number
of repeat customers. After collection of the initial data, Snow Bird realized that the
amount of data collected is too large for the current software and it must upgrade to be
able to analyze the data. Snow Bird also realized that some of the customers listed in
the data may have purchased only a small item (e.g., a snack) and it does not
want to waste money sending these customers postcards. To be able to create a list of
potential repeat customers, the data first must be scrubbed.
Which of the following four dimensions of big data is not discussed in the above
scenario?
Variety refers to the idea that the best "big data" comes from a variety of sources,
including customer relationship management systems, social media feedback,
point-of-sale records, and other sources. The scenario discusses only big data collected
from the Snow Bird ski shop, not other potential sources from which Snow Bird may also
collect data (ticket office, restaurant, etc.).
Choice "A" is incorrect. The volume of big data refers to data being too large for
traditional database software to store. The scenario discussed that Snow Bird realized
the data collected was too large for the current system.
Choice "B" is incorrect. The velocity of big data refers to the continuous flow of data,
whose real value comes from being able to analyze it in real time. Although Snow
Bird has not yet had the chance to analyze the data, the goal discussed in the scenario
is to analyze the data in real time so customers receive a postcard a week after a
purchase.
Choice "D" is incorrect. The veracity of big data refers to identifying and removing
biased or irrelevant data to minimize the chance of making decisions based on the wrong
data. The scenario discussed that Snow Bird needed to scrub the data of customers
who the company believed would not become repeat customers.
Jacks Capital Inc. is putting together financial results for its annual report. Gains were
reported for each month during the past year but those were completely offset by heavy
losses in the last two months. If Jacks wants to show the relative cumulative
incremental impact of each month's results, which of the following charts would best
illustrate that?
Choice "D" is correct. The cumulative impact of data points over time can be shown by
a waterfall chart. Each point contributes to the total of all data points, with each
incremental contribution shown at a given point in time.
A waterfall chart is the best answer because it will show both the cumulative and
incremental impact of each month's financial results for Jacks Capital. This will allow
investors to see that all months, except for two, were consistent.
Choice "A" is incorrect. Scatter plots are better suited to high-volume data sets, and
they can include overlapping time periods. They also do not show the cumulative effect of
all data points.
Choice "B" is incorrect. Flowcharts are for processes. They show a path from beginning
to end with different options along the way. They do not show cumulative value.
Choice "C" is incorrect. Pyramids are for communicating foundational relationships. The
data in this example does not have this sort of relationship and does not report
cumulative value.
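The numbers behind a waterfall chart are just running totals. The sketch below uses hypothetical monthly results (ten gains followed by two offsetting losses) and computes where each month's bar would start and end. The figures and variable names are assumptions, and drawing the chart itself is left to whatever charting tool is in use.

```python
from itertools import accumulate

# Hypothetical monthly results: ten gains, then two heavy losses.
monthly_results = [5] * 10 + [-25, -25]

# Cumulative position after each month.
running_totals = list(accumulate(monthly_results))

# Each bar spans from the prior cumulative total to the new one.
bars = [
    (end - change, end)
    for change, end in zip(monthly_results, running_totals)
]

for month, (start, end) in enumerate(bars, start=1):
    print(month, start, end)
```

Each bar's height is that month's incremental result, while its position shows the cumulative total, so the two final losses visibly bring the total back to zero.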
A local bank is looking for any patterns in its data for which customers pay back their
loans and which ones do not. The data the company has decided to use is the final
disposition of the loan (paid or defaulted), the customer's income, the amount of the
loan, and the proportion between those two values.
Which of the following data visualization techniques would be most suited to facilitate
the recognition of any patterns present?
Choice "A" is correct. A bubble chart is a scatter plot (a mapping of data points onto a
grid according to two or more qualities of the data, with one quality for each axis forming
the grid, usually two). The spatial distribution of the data points enables pattern recognition
such as correlation and the direction of any covariant relationship. Bubble charts are
particularly useful because they can display more than two types of data without
resorting to a third or higher dimensional graph through the use of symbols, color, and
the size of the data points.
For this example, if the bank mapped its customers' income to one axis, then the bank
could use either of the other measures for the other axis, leaving the third quality to
determine the size of the bubble. Either way, the bank would have an image showing
which loans left customers more financially stretched relative to other customers.
Coloring the dots differently to show defaults versus paid loans would help the bank
discover an association between loaning a customer a higher proportion of the
customer's income and the likelihood of default.
Choice "B" is incorrect. A pie chart is used to show what proportion of the whole
each subgroup represents. A pie chart could be made to show the relative proportions of
paid loans to defaulted loans, and a separate pie chart could show the proportions
among designated segments of income, but this visualization technique would have no
way to combine the two in a single image to discover patterns.