F.4. Data Analytics Part 2

Uploaded by

Kondreddi Saku
Copyright
© All Rights Reserved

A financial analyst has been asked to forecast net income over the next three years
using the computer spreadsheet below. The actual results for Year 1 are shown in the
first column. Net sales and cost of goods sold are projected to increase at the rate of 20
percent each year. Selling and administrative expenses are projected to increase at an
annual rate of 10 percent. However, several mistakes have been made in the
spreadsheet below.

    A                                 B           C           D           E
1                                Year 1      Year 2      Year 3      Year 4
2   Net sales                  $400,000    $480,000    $480,000    $576,000
3   Cost of goods sold          310,000     372,000     372,000     446,400
4   Gross profit                 90,000     108,000     108,000     129,600
5   Selling expenses             30,000      33,000      36,300      39,930
6   Administrative expenses      40,000      44,000      48,400      58,080
7   Income before taxes          20,000      31,000      23,300      31,590
8   Income taxes at 40%           8,000      12,400       9,320      12,636
9   Net income                   12,000      18,600      13,980      18,954

Which one of the cells listed below contains an incorrect formula?

A. D4
B. D8
C. E9
D. E6
Explanation

Choice “D” is correct. The question states that administrative expenses increase by 10
percent per year. In Year 2 and Year 3, administrative expenses increased by 10
percent over the prior year; Year 1’s expense was $40,000, and a 10 percent increase
over Year 1 for Year 2 equals $44,000 ($40,000 × 1.10). For Year 3, Year 2’s $44,000 is
multiplied by 1.10 to get $48,400.
Cell E6 does not have the correct amount for administrative expenses. A 10 percent
increase over Year 3’s administrative expense is $53,240 ($48,400 from Year 3 × 1.10).
Instead, Cell E6 shows $58,080, which represents a 20 percent
increase ($58,080 ÷ $48,400 = 1.20, and 1.20 – 1.00 (base amount) = 0.20).
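
The check described in this explanation can be reproduced with a short sketch. The amounts come from row 6 of the spreadsheet; the variable names are illustrative assumptions, not part of the original question.

```python
# Administrative expenses from row 6 of the spreadsheet (cells B6 through E6).
admin_expenses = {"B6": 40_000, "C6": 44_000, "D6": 48_400, "E6": 58_080}
STATED_GROWTH = 0.10  # 10 percent annual increase given in the question

cells = list(admin_expenses)
implied_growth = {}
for prior, current in zip(cells, cells[1:]):
    # Growth rate implied by each cell relative to the cell to its left.
    rate = admin_expenses[current] / admin_expenses[prior] - 1
    implied_growth[current] = round(rate, 4)

# Cells whose implied growth differs from the stated 10 percent.
incorrect_cells = [c for c, r in implied_growth.items() if r != STATED_GROWTH]

# Amount E6 should hold: Year 3's expense increased by 10 percent.
expected_e6 = round(admin_expenses["D6"] * (1 + STATED_GROWTH))

print(implied_growth)
print(incorrect_cells, expected_e6)
```

Only E6 is flagged, because its implied growth rate is 20 percent rather than 10 percent.
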
The primary difference between the coefficient of correlation and the coefficient of
determination is that:

A. The coefficient of determination may be either negative or positive depending upon the slope
of computed trends.
B. The coefficient of determination refines the coefficient of correlation.
C. The coefficient of correlation refines the coefficient of determination.
D. The coefficient of determination is always negative.
Explanation

Choice "B" is correct.

The coefficient of determination (R²) is a refinement of the coefficient of correlation (R).

The coefficient of correlation measures the strength of a linear relationship, positive or
negative, while the coefficient of determination measures the proportion of the variation in
the dependent variable explained by the independent variable.

Choice "A" is incorrect. The coefficient of determination (R²) is never negative; it ranges
from 0 to 1.

Choice "C" is incorrect. The coefficient of determination (R²) is a refinement of the
coefficient of correlation (R), not the reverse.

Choice "D" is incorrect. The coefficient of determination (R²) is never negative.


What would be the approximate value of the coefficient of correlation between
advertising and sales where a company advertises aggressively as an alternative to
temporary worker layoffs and cuts off advertising when incoming jobs are on backorder?

A. 1.0
B. 0
C. -1.0
D. -100.0
Explanation

Choice "C" is correct. The coefficient of correlation measures the strength and direction
of the relationship between two variables. Since the company increases advertising
when sales are low and decreases advertising when sales are high, the movement is in
directly opposite directions and the coefficient would be close to -1.0.

Choice "A" is incorrect. A coefficient of correlation of 1.0 would imply that both variables
move in the same direction at approximately the same rate. An increase in advertising
when sales are increasing would be characteristic of a coefficient of correlation of 1.0.

Choice "B" is incorrect. A coefficient of correlation of 0 would imply that there is no
relationship between advertising and sales. There is an inverse relationship between
advertising and sales.

Choice "D" is incorrect. A relationship exists between advertising and sales. According
to the facts of the question, the relationship is an inverse relationship. The coefficient of
correlation is expressed as a range between -1.0 and +1.0.
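
As a rough illustration of the inverse relationship, the sketch below computes the coefficient of correlation for a company that cuts advertising as sales rise. The sales and advertising figures are hypothetical and were chosen to move in exactly opposite directions; they do not come from the question.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: advertising falls by the same step each time sales rise.
sales = [100, 120, 140, 160, 180]
advertising = [50, 40, 30, 20, 10]

r = pearson_r(advertising, sales)
print(r)
```

Real data would rarely be this tidy, so an actual coefficient would be close to, rather than exactly, -1.0.
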
Statsinc uses the simple linear regression formula Y = A + BX to estimate total cost (Y),
with X representing volume, A equaling fixed costs, and B equaling variable costs per
unit. If the correlation coefficient between X and Y is 0.90, which of the following is
correct?

A. Total cost (Y) is the independent variable.
B. The coefficient of determination is equal to 0.95.
C. Variable costs will vary in direct proportion to volume changes.
D. Fixed costs per unit will remain constant regardless of production.
Explanation

Choice "C" is correct. Variable costs per unit (B) will be constant, meaning that overall
variable costs will vary in direct proportion to changes in volume (X).

Choice "A" is incorrect. Y (total cost) is the dependent variable, while X (volume) is the
independent variable. Volume will drive total cost.

Choice "B" is incorrect. The coefficient of determination (R-squared) will be equal to the
correlation coefficient, squared. Since the correlation coefficient is 0.90, the coefficient
of determination will be 0.81.

Choice "D" is incorrect. Fixed costs overall (not fixed costs per unit) will remain constant
regardless of production.
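
The cost behavior and the coefficient of determination in this explanation can be sketched as follows. The fixed cost of 10,000 and variable cost of 5 per unit are hypothetical values picked for illustration; only the 0.90 correlation coefficient comes from the question.

```python
def total_cost(volume, fixed_cost, variable_cost_per_unit):
    """Total cost under Y = A + BX."""
    return fixed_cost + variable_cost_per_unit * volume

FIXED_COST = 10_000          # A (hypothetical)
VARIABLE_COST_PER_UNIT = 5   # B (hypothetical)

summary = {}
for volume in (1_000, 2_000):
    summary[volume] = {
        "total_cost": total_cost(volume, FIXED_COST, VARIABLE_COST_PER_UNIT),
        # Total variable cost doubles when volume doubles.
        "variable_total": VARIABLE_COST_PER_UNIT * volume,
        # Fixed cost per unit falls as volume rises; only the total is constant.
        "fixed_per_unit": FIXED_COST / volume,
    }

r = 0.90                      # correlation coefficient from the question
r_squared = round(r ** 2, 2)  # coefficient of determination

print(summary)
print(r_squared)
```
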
In regression analysis, the coefficient of determination:

A. Is used to determine the proportion of the total variation in the dependent variable (y)
explained by the independent variable (x).
B. Ranges between negative one and positive one.
C. Is used to determine the expected value of the net income based on the regression line.
D. Becomes smaller as the fit of the regression line improves.
Explanation

Choice "A" is correct. This is the definition of the coefficient of determination. It is the
square of the coefficient of correlation. The higher the coefficient of determination, the
greater the proportion of the total variation in y that is explained by the variation in x.
The higher it is, the better is the fit of the regression line.

Choice "B" is incorrect. It ranges between 0 and 1. Remember, the coefficient of
determination is the square of the coefficient of correlation. Because it is a number
squared, it cannot be negative.

Choice "C" is incorrect. This is not a use of the coefficient of determination.

Choice "D" is incorrect. It becomes larger as the fit of the regression line improves.
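
To make the definition concrete, the sketch below fits a least-squares line to a small hypothetical data set and computes the coefficient of determination as the share of total variation in y that the line explains. The data points and function names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def coefficient_of_determination(xs, ys):
    """1 minus (unexplained variation / total variation) in y."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total

# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = coefficient_of_determination(x, y)
print(round(r2, 2))
```

For these points the line explains 60 percent of the variation in y; a tighter fit would push the value toward 1.
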
A regression analysis on the prices of local housing resulted in the following summary
output:

Which field(s) shows the overall likelihood that all regression coefficients in this model
are actually zero?

A. Adjusted R Squared
B. F statistic of ANOVA
C. Calculated regression coefficients for each independent variable
D. P-value for each independent variable
Explanation

Choice "B" is correct. Regression analysis software provides output that details the
strength of the regression model. Key values from this output include the model's overall
significance (F statistic), the degree to which the model explains the variation (Adjusted
R Squared), and the significance of each independent variable's effect on the
dependent variable (p-values). If all of these values are within tolerance, then the
regression coefficients specify the relationship between the independent and dependent
variables.
A regression analysis on the prices of local housing resulted in the following summary
output:

Which field(s) shows the likelihood that the effect a particular independent variable has
on the dependent variable is not reliably different from random numbers?

A. Adjusted R Squared
B. F statistic of ANOVA
C. Calculated regression coefficients for each independent variable
D. P-value for each independent variable
Explanation

Choice "D" is correct. Regression analysis software provides output that details the
strength of the regression model. Key values from this output include the model's overall
significance (F statistic), the degree to which the model explains the variation (Adjusted
R Squared), and the significance of each independent variable's effect on the
dependent variable (p-values). If all of these values are within tolerance, then the
regression coefficients specify the relationship between the independent and dependent
variables.
A regression analysis on the prices of local housing resulted in the following summary
output:

Which field(s) shows the calculated estimate for how much of the variation in the
dependent variable is explained by the independent variable(s)?

A. Adjusted R Squared
B. F statistic of ANOVA
C. Calculated regression coefficients for each independent variable
D. P-value for each independent variable
Explanation

Choice "A" is correct. Regression analysis software of any brand provides output that
details the strength of the regression model. Key values from this output include the
model's overall significance (F statistic), the degree to which the model explains the
variation (Adjusted R Squared), and the significance of each independent variable's
effect on the dependent variable (p-values). If all of these values are within tolerance,
then the regression coefficients specify the relationship between the independent and
dependent variables.
A regression analysis on the prices of local housing resulted in the following summary
output:

Which field(s) precisely expresses the relationship between the independent variable(s)
and the dependent variable?

A. Adjusted R Squared
B. F statistic of ANOVA
C. Calculated regression coefficients for each independent variable
D. P-value for each independent variable
Explanation

Choice "C" is correct. Regression analysis software of any brand provides output that
details the strength of the regression model. Key values from this output include the
model's overall significance (F statistic), the degree to which the model explains the
variation (Adjusted R Squared), and the significance of each independent variable's
effect on the dependent variable (p-values). If all of these values are within tolerance,
then the regression coefficients specify the relationship between the independent and
dependent variables.
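
The summary output for the housing regression is not reproduced here, so the sketch below uses hypothetical inputs (R Squared of 0.81, 30 observations, 2 independent variables) to show how two of the fields discussed above are commonly derived. The p-values are omitted because they require an F or t distribution, which statistics software supplies.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """R Squared penalized for the number of independent variables."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

def f_statistic(r_squared, n_obs, n_predictors):
    """Explained variation per predictor over unexplained variation per residual degree of freedom."""
    explained = r_squared / n_predictors
    unexplained = (1 - r_squared) / (n_obs - n_predictors - 1)
    return explained / unexplained

# Hypothetical model summary values.
R_SQUARED = 0.81
N_OBS = 30
N_PREDICTORS = 2

adj_r2 = adjusted_r_squared(R_SQUARED, N_OBS, N_PREDICTORS)
f_stat = f_statistic(R_SQUARED, N_OBS, N_PREDICTORS)
print(round(adj_r2, 4), round(f_stat, 2))
```

A large F statistic corresponds to a small likelihood that all regression coefficients are actually zero.
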
Ivey Company uses regression analysis in examining its costs. It has determined that
there is a correlation coefficient of 0.90 between two variables X and Y. Which of the
following statements is correct for a correlation coefficient of 0.90?

A. There is little relationship between X and Y.
B. Variation in X explains 90% of the variation in Y.
C. If X increases, Y will never decrease.
D. If X increases, Y will generally increase.
Explanation

Choice "D" is correct.

The question asks which statement is correct given only that the correlation
coefficient between the variables X and Y is 0.90.

Choice "A" is incorrect. The correlation coefficient measures the strength of the
relationship between the independent and dependent variables X and Y. Because
correlation coefficients range between −1.00 and 1.00, a correlation coefficient of
0.90 indicates a strong relationship.

Choice "B" is incorrect. This statement describes the coefficient of determination,
not the correlation coefficient. With a correlation coefficient of 0.90, the coefficient
of determination is 0.81, so variation in X explains 81% of the variation in Y.

Choice "C" is incorrect. "Never" would be correct only if the correlation were
perfect, with a correlation coefficient of 1.00.

With a strong positive correlation, Y will generally increase when X increases,
which is what choice "D" states.


The Happy Smiles Co. has a new advertisement and the company has collected data
from focus groups about how effectively the advertisement leads to purchases of the
company's products. The company has measured the demographic information of the
focus group and has prepared a regression analysis. The output of that regression
analysis is that the correlation coefficient is 0.22, the coefficient of determination is 0.10,
the standard error is 8,435.20, and the regression equation is:

Sales = 10,743 + 1.7 (Household income) – 8.2 (Average child age)

Which of these business decisions is most appropriate given the data?

A. Focus marketing messages on wealthy customers with young children.
B. Focus marketing messages on poorer customers with older children.
C. Do not produce or market this product.
D. The data supports none of these recommendations.
Explanation

Choice “D” is correct. Regression analysis is complex and does not always produce a
positive result. Models that are not statistically significant often have one or more of the
following warning signs: a small (near zero) correlation coefficient, a small (near zero)
coefficient of determination, or a large (in proportion to the dependent variable) standard
error.

In this example, all three of these warning signs are present. This regression equation is
unlikely to be a true representation of the relationship between these demographic
variables and sales, if such a relationship even exists. There is no support for the
regression.

Choice “A” is incorrect. While the regression equation would indeed suggest that this is
the way to maximize sales given the signs on the slopes of the β terms for the two
independent variables (positive for wealth, negative for age), the regression model itself
is of poor quality.

Choice “B” is incorrect. The regression equation would suggest the opposite strategy
given the signs on the slopes of the β terms for the two independent variables (positive
for wealth, negative for age).

Choice “C” is incorrect. The fact that this regression model is of such poor quality is not
a reflection on the product being studied. A poor regression model does not mean the
product is bad, merely that the regression model cannot provide reliable
recommendations for how best to market it under the studied conditions.
The OutdoorPeople Co. has identified several subgroups among the company's
customer base. These groups have particular combinations of age, wealth, geographic
location, etc. The company is about to release a new product and it wants to measure
how much of an effect the customer's wealth has on buying the product after viewing
the advertising message(s) for the product.

What kind of analysis will be most useful to answer OutdoorPeople's need for
information?

A. Cluster analysis
B. Regression analysis
C. Fourier analysis
D. Classification analysis
Explanation

Choice "B" is correct. Regression analysis uses statistics software to discover and
quantify the relationship between a dependent variable and one or more independent
variables. The resulting coefficients can be used to predict values of the dependent
variable from any values the independent variable may have in the future.

OutdoorPeople wants to know if it can predict how likely a customer is to buy its product
based on the customer's wealth. After performing a successful regression analysis,
OutdoorPeople will have a regression equation that will contain this information.

Choice "A" is incorrect. Cluster analysis is used to identify subgroups within a larger
group based on shared characteristics. OutdoorPeople has already identified subgroups
but is asking a question across all its subgroups. Cluster analysis will not answer that
question.

Choice "C" is incorrect. Fourier analysis is used to represent a repeating waveform as
a series of trigonometric functions so that repeating oscillating phenomena (such as
sound, light, heat, etc.) can be mathematically reproduced and compared. Fourier
analysis is unlikely to be of any help to OutdoorPeople.

Choice "D" is incorrect. Classification analysis is used to place newly encountered data
into subgroups already established by cluster analysis. OutdoorPeople already has
clusters, but they are not being used in this study, and no new customers are being
classified into existing clusters.
A time series analysis shows that sales have risen steadily over the last 10 years. This
is an example of:

A. A trend line.
B. A cyclical pattern.
C. An irregular pattern.
D. A seasonal pattern.
Explanation

Choice "A" is correct. A time series analysis that shows a steadily increasing or
decreasing pattern is an example of a trend line.

Choice "B" is incorrect. A time series analysis showing that sales have risen steadily
over the last 10 years is not an example of a cyclical pattern.

Choice "C" is incorrect. A time series analysis showing that sales have risen steadily
over the last 10 years is not an example of an irregular pattern.

Choice "D" is incorrect. A time series analysis showing that sales have risen steadily
over the last 10 years is not an example of a seasonal pattern.
Which of the following is true regarding exponential smoothing?

A. Exponential smoothing multiplies the most recent set of data by a smoothing constant as well
as the next or previous set of data.
B. Exponential smoothing uses the last three sets of data, giving equal weighting to each.
C. Exponential smoothing uses a weighted average, with the most recent data receiving the
lowest weighting.
D. Exponential smoothing uses a weighted average of past time series, selecting only one weight.
Explanation

Choice "D" is correct. Exponential smoothing uses a weighted average of past time
series, selecting only one weight, that of the most recent set of data. The calculation
computes a value comparing the forecast and actual value for the last time series
before the series being forecast. It employs a smoothing constant between zero and 1
that is derived through a series of trials and errors on an initial set of data.

Choice "A" is incorrect. Exponential smoothing does not multiply the most recent set of
data by a smoothing constant as well as the next or previous set of data.

Choice "B" is incorrect. Exponential smoothing does not use the last three sets of data,
giving equal weighting to each.

Choice "C" is incorrect. Exponential smoothing does not use a weighted average, with
the most recent data receiving the lowest weighting.
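
A minimal sketch of simple exponential smoothing follows, assuming the common form in which the next forecast equals the smoothing constant times the latest actual value plus one minus the constant times the latest forecast. The sales figures, the starting forecast, and the smoothing constant of 0.5 are hypothetical.

```python
def exponential_smoothing(actuals, alpha, initial_forecast):
    """Return forecasts, one per period plus one for the next period.

    alpha is the single smoothing constant, between 0 and 1.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    forecasts = [initial_forecast]
    for actual in actuals:
        # New forecast blends the latest actual with the latest forecast.
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

# Hypothetical sales for three periods.
sales = [100, 110, 120]
forecasts = exponential_smoothing(sales, alpha=0.5, initial_forecast=100)
print(forecasts)
```

In practice the constant would be tuned by trial and error on an initial set of data, as the explanation notes; a higher constant makes the forecast react faster to recent actual values.
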
A time series analysis shows that company revenues decline during recessions and
sales increase dramatically during expansions. This pattern is an example of:

A. An irregular pattern.
B. A seasonal pattern.
C. A trend line.
D. A cyclical pattern.
Explanation

Choice "D" is correct. A time series analysis that shows fluctuations with economic
cycles (revenues down during recessions) is an example of a cyclical pattern.

Choice "A" is incorrect. A time series analysis showing that company revenues decline
during recessions and sales increase dramatically during expansions is not an example
of an irregular pattern.

Choice "B" is incorrect. A time series analysis showing that company revenues decline
during recessions and sales increase dramatically during expansions is not an example
of a seasonal pattern.

Choice "C" is incorrect. A time series analysis showing that company revenues decline
during recessions and sales increase dramatically during expansions is not an example
of a trend line.
A time series analysis of a business's sales shows a decline in sales every summer, with a
peak during the winter. These results could be:

A. A downward trend.
B. Seasonal fluctuations.
C. Cyclical fluctuations.
D. An upward trend.
Explanation

Choice "B" is correct. The seasonal component is a measure of data with time as the
independent variable within a single fiscal or calendar year. Unlike a straight trend line,
you see peaks and troughs over time with regular patterns.

Choice "A" is incorrect. A downward trend would not explain a time series analysis
showing a decline in sales every summer, with a peak during the winter.

Choice "C" is incorrect. Cyclical fluctuations would not explain a time series analysis
showing a decline in sales every summer, with a peak during the winter.

Choice "D" is incorrect. An upward trend would not explain a time series analysis
showing a decline in sales every summer, with a peak during the winter.
A time series analysis shows that sales normally have been rising but last year there
was a big drop. Analysts believe that the drop was due to massive regional fires that
hurt business. This is an example of:

A. A seasonal pattern.
B. A cyclical pattern.
C. A trend line.
D. An irregular component.
Explanation

Choice "D" is correct. An irregular component is random variability in a time series that
deviates from values that can be observed as a trend or a pattern.

Choice "A" is incorrect. Massive regional fires that hurt business would most likely not
be considered a seasonal pattern.

Choice "B" is incorrect. Massive regional fires that hurt business would most likely not
be considered a cyclical pattern.

Choice "C" is incorrect. Massive regional fires that hurt business would most likely not
be considered a trend line.
Automite Company is an automobile replacement parts dealer in a large metropolitan
community. Automite is preparing its sales forecast for the coming year. Data regarding
both Automite's and industry sales of replacement parts as well as both the used and
new automobile sales in the community for the last 10 years have been accumulated.

If Automite wants to determine if there is a historical trend in the growth of its sales as
well as the growth of industry sales of replacement parts, the company would employ:

A. Simulation techniques.
B. Statistical sampling.
C. Time series analysis.
D. Queuing theory.
Explanation

Choice "C" is correct. Time series analysis examines a series of measurements of a variable
over time. Its purpose is to find patterns (trend, cyclical, seasonal, or random) in the
data and to use these patterns to forecast future values of the variable.

Choice "A" is incorrect. Automite would not employ simulation techniques to determine if
there is a historical trend in the growth of its sales as well as the growth of industry
sales of replacement parts.

Choice "B" is incorrect. Automite would not employ statistical sampling to determine if
there is a historical trend in the growth of its sales as well as the growth of industry
sales of replacement parts.

Choice "D" is incorrect. Automite would not employ queuing theory to determine if there
is a historical trend in the growth of its sales as well as the growth of industry sales of
replacement parts.
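
One simple way to look for a historical trend in 10 years of sales is to fit a straight line through the annual figures and inspect its slope. The sketch below assumes hypothetical sales data, since Automite's actual figures are not given, and it ignores cyclical, seasonal, and random components.

```python
def linear_trend(values):
    """Least-squares intercept and slope with periods numbered 1..n."""
    n = len(values)
    periods = range(1, n + 1)
    mean_t = sum(periods) / n
    mean_v = sum(values) / n
    stv = sum((t - mean_t) * (v - mean_v) for t, v in zip(periods, values))
    stt = sum((t - mean_t) ** 2 for t in periods)
    slope = stv / stt
    intercept = mean_v - slope * mean_t
    return intercept, slope

# Hypothetical annual sales for the last 10 years (in thousands).
annual_sales = [50, 53, 55, 58, 61, 63, 66, 70, 72, 75]

intercept, slope = linear_trend(annual_sales)
# Extend the trend line one period ahead as a naive forecast for Year 11.
forecast_year_11 = intercept + slope * 11
print(round(slope, 2), round(forecast_year_11, 1))
```

A positive slope indicates an upward trend; running the same function on industry sales would allow the two growth rates to be compared.
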
Snow Bird Co. has recently collected information on all customers who have
purchased skis, snowboards, or other merchandise from the company's ski shop. Snow
Bird would like to send postcards to these customers and, moving forward, send
postcards to customers a week following a purchase in hopes of increasing the number
of repeat customers. After collection of the initial data, Snow Bird realized that the
amount of data collected is too large for the current software and it must upgrade to be
able to analyze the data. Snow Bird also realized that some of the customers listed in
the data may have purchased only a small item (e.g., a snack), and it does not
want to waste money sending these customers postcards. To be able to create a list of
potential repeat customers, the data first must be scrubbed.

Which of the following four dimensions of big data is not discussed in the above
scenario?

A. Volume
B. Velocity
C. Variety
D. Veracity
Explanation

Choice "C" is correct.

The variety of data refers to the best "big data" coming from a variety of sources,
including customer relationship management systems, social media feedback, point-of-
sale records, and other sources. The scenario only discusses big data collected from
the Snow Bird ski shop, not other potential sources from which Snow Bird may also
collect data (ticket office, restaurant, etc.).

Choice "A" is incorrect. The volume of big data refers to data being too large for
traditional database software to store. The scenario discussed that Snow Bird realized
the data collected was too large for the current system.

Choice "B" is incorrect. The velocity of big data refers to the flow of data being
continuous and the real value is being able to analyze data in real time. Although Snow
Bird has not yet had the chance to analyze the data, the goal discussed in the scenario
is to analyze the data in real time so customers receive a postcard a week after a
purchase.

Choice "D" is incorrect. The veracity of big data refers to identifying and removing biased
or irrelevant data mined from big data to minimize the chance of making decisions based on
the wrong data. The scenario discussed that Snow Bird needed to scrub the data of customers
who the company believed would not become repeat customers.
Jacks Capital Inc. is putting together financial results for its annual report. Gains were
reported for each month during the past year but those were completely offset by heavy
losses in the last two months. If Jacks wants to show the relative cumulative
incremental impact of each month's results, which of the following charts would best
illustrate that?

A. Scatter plot
B. Flowchart
C. Pyramid
D. Waterfall chart
Explanation

Choice "D" is correct. The cumulative impact of data points over time can be shown by
a waterfall chart. Each point contributes to the total of all data points, with each
incremental contribution shown at a given point in time.

A waterfall chart is the best answer because it will show both the cumulative and
incremental impact of each month's financial results for Jacks Capital. This will allow
investors to see that all months, except for two, were consistent.

Choice "A" is incorrect. Scatter plots are more for data sets that have a high volume and
they can have overlapping time periods. They also do not show the cumulative effect of
all data points.

Choice "B" is incorrect. Flowcharts are for processes. They show a path from beginning
to end with different options along the way. They do not show cumulative value.

Choice "C" is incorrect. Pyramids are for communicating foundational relationships. The
data in this example does not have this sort of relationship and does not report
cumulative value.
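
The numbers behind a waterfall chart can be sketched without any plotting library: each bar starts at the running total before the month and ends at the running total after it. The monthly results below are hypothetical, arranged so that ten months of gains are completely offset by losses in the last two months.

```python
from itertools import accumulate

# Hypothetical monthly results in $ thousands: ten gains, then two heavy losses.
monthly_results = [10] * 10 + [-50, -50]

# Cumulative impact after each month.
running_total = list(accumulate(monthly_results))

# Each waterfall bar floats from the prior running total to the new one.
bars = [(end - change, end) for change, end in zip(monthly_results, running_total)]

for month, (start, end) in enumerate(bars, start=1):
    print(month, start, end)
```

The bar heights show each month's incremental contribution, while the bar positions show the cumulative result.
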
A local bank is looking for any patterns in its data for which customers pay back their
loans and which ones do not. The data the company has decided to use is the final
disposition of the loan (paid or defaulted), the customer's income, the amount of the
loan, and the proportion between those two values.

Which of the following data visualization techniques would be most suited to facilitate
the recognition of any patterns present?

A. Bubble chart
B. Pie chart
C. Line graph
D. Flowchart
Explanation

Choice "A" is correct. A bubble chart is a scatter plot: a mapping of data points onto a
grid according to two or more qualities of the data, with one quality for each axis forming
the grid (usually two axes). The spatial distribution of the data points enables pattern recognition
such as correlation and the direction of any covariant relationship. Bubble charts are
particularly useful because they can display more than two types of data without
resorting to a third or higher dimensional graph through the use of symbols, color, and
the size of the data points.

For this example, if the bank mapped its customers' income to one axis, then the bank
could use either of the other measures for the other axis, leaving the third quality to
determine the size of the bubble. Either way, the bank would have an image showing
which loans left customers more financially stretched relative to other customers.
Coloring the dots differently to show defaults versus paid loans would help the bank
discover an association between loaning a customer a higher proportion of the
customer's income and the likelihood of default.
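
The mapping described above can be sketched as a data-preparation step, leaving the actual drawing to whatever charting tool the bank uses. The four loans are hypothetical, and the choice of income for the x-axis, loan amount for the y-axis, and loan-to-income proportion for bubble size is one of the arrangements the explanation allows.

```python
# Hypothetical loans: (customer income, loan amount, defaulted?)
loans = [
    (40_000, 10_000, False),
    (50_000, 40_000, True),
    (90_000, 30_000, False),
    (30_000, 27_000, True),
]

points = []
for income, amount, defaulted in loans:
    points.append({
        "x": income,                          # horizontal axis
        "y": amount,                          # vertical axis
        "size": round(amount / income, 2),    # bubble size: loan-to-income proportion
        "color": "red" if defaulted else "green",  # final disposition of the loan
    })

# Largest bubbles first, to see whether stretched borrowers default more often.
largest_first = sorted(points, key=lambda p: p["size"], reverse=True)
for point in largest_first:
    print(point)
```

In this made-up sample the two largest bubbles are both defaults, which is the kind of pattern the chart is meant to surface.
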

Choice "B" is incorrect. A pie chart is used to show what proportion of the whole
each subgroup represents. A pie chart could be made to show the relative proportions of
paid loans to defaulted loans, and a separate pie chart could show the proportions
among designated segments of income, but this visualization technique would have no
way to combine the two in a single image to discover patterns.

Choice "C" is incorrect. A line graph is used to show a progression between
observations and the trend demonstrated. The bank could use a line graph to show the
changing proportions of default as income increased, but this visualization technique
would have no way to represent individual loans or the other two data types called for
by management.

Choice "D" is incorrect. A flowchart is a diagram used to represent each step of a
complex process, such as the operation or building of a computer program. Each
