Chapter 3

Uploaded by

abdualrahman.i.q
Analytics, Data Science, & Artificial Intelligence, 11e (Sharda)

Chapter 3 Nature of Data, Statistical Modeling, and Visualization

1) It is possible to perform analytics without data.


Answer: FALSE
Diff: 1 Page Ref: 121

2) When we refer to data, we are only referring to information that is well structured.
Answer: FALSE
Diff: 2 Page Ref: 121

3) Nominal data refers to measurements.


Answer: FALSE
Diff: 1 Page Ref: 125

4) Numeric data is an example of structured data.


Answer: TRUE
Diff: 2 Page Ref: 125

5) Data preprocessing is generally simple, straightforward, and quick.


Answer: FALSE
Diff: 2 Page Ref: 129

6) Normalizing data is a common step in the data consolidation process.


Answer: FALSE
Diff: 2 Page Ref: 130

7) The OLAP branch of descriptive analytics has also been called business intelligence.
Answer: TRUE
Diff: 2 Page Ref: 139

8) Skewness is a measure of symmetry in a distribution.


Answer: FALSE
Diff: 2 Page Ref: 145

9) A time series is a sequence of data points of the variable of interest, measured and represented
at successive points in time spaced at uniform time intervals.
Answer: TRUE
Diff: 2 Page Ref: 156

10) The linearity assumption in a regression model states that the errors of the response variable
are uncorrelated with each other.
Answer: FALSE
Diff: 2 Page Ref: 154
11) A data dashboard is any communication artifact prepared with the specific intention of
conveying information in a digestible form to whoever needs it whenever and wherever.
Answer: FALSE
Diff: 2 Page Ref: 163

12) There has been an increase in the use of computing power to produce unified reports.
Answer: TRUE
Diff: 2 Page Ref: 164

13) Information is the aggregation, summarization, and contextualization of data.


Answer: TRUE
Diff: 2 Page Ref: 166

14) Data visualization is closely related to the fields of DW management and MIS application
development.
Answer: FALSE
Diff: 2 Page Ref: 166

15) The Gantt chart (also called a network diagram) is developed primarily to simplify the
planning and scheduling of large and complex projects.
Answer: FALSE
Diff: 3 Page Ref: 173

16) Some charts or graphs are better at answering certain types of questions.
Answer: TRUE
Diff: 1 Page Ref: 171

17) There is a growing number of data visualization techniques being used to better portray
business results.
Answer: TRUE
Diff: 2 Page Ref: 176

18) According to Eckerson the most distinctive feature of a dashboard is its three layers of
information.
Answer: TRUE
Diff: 1 Page Ref: 185

19) Dashboards are not a new concept and their roots can be traced at least to the executive
information system of the 1980s.
Answer: TRUE
Diff: 2 Page Ref: 184

20) An ideal dashboard would not be transparent to the user.


Answer: FALSE
Diff: 2 Page Ref: 187
21) To satisfy this requirement, this data is recorded at or near the time of the event or
observation so that the time-delay-related misrepresentation is minimized.
A) data source reliability
B) data accessibility
C) data granularity
D) data currency
Answer: D
Diff: 2 Page Ref: 124

22) To satisfy this requirement, this data has variables that are defined at the lowest (or as low as
required) level of detail for the intended use.
A) data source reliability
B) data accessibility
C) data granularity
D) data currency
Answer: C
Diff: 2 Page Ref: 124

23) These contain codes assigned to objects or events as labels that also represent the rank order
among them.
A) numeric data
B) ordinal data
C) interval data
D) nominal data
Answer: B
Diff: 2 Page Ref: 126

24) These contain simple codes assigned to objects as labels, which are not measurements.
A) numeric data
B) ordinal data
C) interval data
D) nominal data
Answer: D
Diff: 2 Page Ref: 125

25) Eliminating duplicate data is typically a part of which data preprocessing step?
A) Data Consolidation
B) Data Cleaning
C) Data Transformation
D) Data Reduction
Answer: B
Diff: 2 Page Ref: 130
26) Reducing a data set's volume can be a portion of which data preprocessing step?
A) Data Consolidation
B) Data Cleaning
C) Data Transformation
D) Data Reduction
Answer: D
Diff: 2 Page Ref: 130

27) What measure is used to characterize the peak/tall/skinny nature of the distribution?
A) skewness
B) standard deviation
C) whisker plot
D) kurtosis
Answer: D
Diff: 3 Page Ref: 146
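
A minimal sketch of how these two shape measures can be computed, assuming the moment-based (population) formulas; statistical packages often apply sample corrections, so their numbers may differ slightly. The function names and the two small samples are made up for illustration.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Asymmetry of the distribution: 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Peakedness relative to a normal distribution, which scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]        # hypothetical sample, mirror-symmetric around 3
right_tailed = [1, 1, 1, 2, 10]    # hypothetical sample with a long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

A positive skewness indicates a longer right tail; a negative excess kurtosis indicates a flatter shape than the normal distribution.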

28) Which of the following is not a measure of central tendency?


A) mean
B) median
C) range
D) mode
Answer: C
Diff: 2 Page Ref: 142

29) This assumption in a regression analysis states that the explanatory variables are not
correlated.
A) linearity
B) multicollinearity
C) independence
D) normality
Answer: B
Diff: 2 Page Ref: 155

30) This assumption in a regression analysis states that the relationship between the response
variable and the explanatory variables is linear.
A) linearity
B) multicollinearity
C) independence
D) normality
Answer: A
Diff: 2 Page Ref: 154
31) This type of report presents an integrated view of success in an organization.
A) metric management report
B) dashboard report
C) balanced scorecard report
D) none of these
Answer: C
Diff: 2 Page Ref: 165

32) This type of report may include color-coded traffic lights for different performance levels.
A) metric management report
B) dashboard report
C) balanced scorecard report
D) none of these
Answer: B
Diff: 2 Page Ref: 165

33) The predecessors to data visualization date back to:
A) second century AD.
B) 1600s.
C) 1800s.
D) 1990s.
Answer: A
Diff: 3 Page Ref: 167

34) What made the digital distribution of both data and visualization more accessible to a
broader audience?
A) the printing press
B) the pony express
C) personal computers
D) the Internet
Answer: D
Diff: 3 Page Ref: 168

35) This figure is often used to explore the relationship between two or three variables (in 2D or
3D visuals).
A) line chart
B) bar chart
C) pie chart
D) scatter plot
Answer: D
Diff: 2 Page Ref: 172
36) This figure portrays project timelines, project tasks/activity durations, and overlap among the
tasks/activities.
A) PERT chart
B) Gantt chart
C) histogram
D) bubble chart
Answer: B
Diff: 2 Page Ref: 173

37) The combination of visualization and predictive analytics is referred to as:


A) visualization+.
B) predictive analytics+.
C) visual analytics.
D) advanced histograms.
Answer: C
Diff: 3 Page Ref: 178

38) Which of the following is not a best practice in dashboard design?


A) validate the dashboard design by a usability specialist
B) disregard guided analytics in lieu of a detailed manual
C) present information in three different levels
D) enrich the dashboard with business-user comments
Answer: B
Diff: 2 Page Ref: 187-8

39) This layer of dashboard information provides graphical, abstracted data on key performance metrics.
A) monitoring
B) analysis
C) management
D) export
Answer: A
Diff: 2 Page Ref: 185

40) This layer of dashboard information provides summarized dimensional data.


A) monitoring
B) analysis
C) management
D) export
Answer: B
Diff: 2 Page Ref: 185

41) Automated data collection systems are not only enabling businesses to collect more volumes
of data but also enhancing the data quality and ________.
Answer: integrity
Diff: 2 Page Ref: 121
42) Making data ________ for prediction means that data sets must be transformed into a flat-
file format and made ready for ingestion into those predictive algorithms.
Answer: analytics ready
Diff: 2 Page Ref: 122

43) The ________ data can be subdivided into nominal or ordinal data.
Answer: categorical
Diff: 2 Page Ref: 125

44) ________ data include measurement variables commonly found in the physical sciences and
engineering.
Answer: Ratio
Diff: 3 Page Ref: 126

45) In data reduction, reducing the number of variables is referred to as ________ reduction.
Answer: dimensional
Diff: 2 Page Ref: 131

46) Data are ________ between a certain minimum and maximum for all variables to mitigate
the potential bias.
Answer: normalized
Diff: 3 Page Ref: 130
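
A minimal sketch of min-max normalization, assuming the target range is 0 to 1 unless stated otherwise. The function name and the income and age values are hypothetical. The point is that variables measured on very different scales end up on a common one, so no variable dominates simply because its numbers are larger.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; map every value to the lower bound.
        return [new_min for _ in values]
    return [new_min + (x - lo) / (hi - lo) * (new_max - new_min) for x in values]

incomes = [20_000, 50_000, 80_000]   # hypothetical values on a large scale
ages = [20, 35, 50]                  # hypothetical values on a small scale

print(min_max_normalize(incomes))    # [0.0, 0.5, 1.0]
print(min_max_normalize(ages))       # [0.0, 0.5, 1.0]
```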

47) Measures of ________ are the mathematical methods used to estimate or describe the degree
of variation in a given variable of interest.
Answer: dispersion
Diff: 2 Page Ref: 142
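
A short sketch of three common dispersion measures (range, variance, standard deviation) using Python's standard `statistics` module. The sample is made up, and the population versions (`pvariance`, `pstdev`) are an assumption chosen to keep the arithmetic easy to follow; the sample versions divide by n - 1 instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical observations, mean = 5

value_range = max(values) - min(values)      # distance between the extremes
variance = statistics.pvariance(values)      # mean squared deviation from the mean
std_dev = statistics.pstdev(values)          # square root of the variance

print(value_range, variance, std_dev)
```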

48) ________ is a collection of mathematical techniques to characterize and interpret data.
Answer: Statistics
Diff: 1 Page Ref: 139

49) Logistic regression is a very popular, statistically sound, probability-based classification
algorithm that employs supervised ________.
Answer: learning
Diff: 2 Page Ref: 155

50) ________ makes no a priori assumption of whether one variable is dependent on the other(s)
and is not concerned with the relationship between variables.
Answer: Correlation
Diff: 2 Page Ref: 151

51) ________ are typically enterprise-wide agreed upon targets to be tracked against over a
period of time.
Answer: Key performance indicators or KPIs
Diff: 3 Page Ref: 165

52) Key to any successful report are ________, brevity, completeness, and correctness.
Answer: clarity
Diff: 2 Page Ref: 164

53) ________ has also single-handedly democratized both the interface conventions and the
technology for displaying interactive geography online.
Answer: Google Maps
Diff: 2 Page Ref: 168

54) Data visualization usually means ________ visualization.


Answer: information
Diff: 1 Page Ref: 166

55) The ________ chart is often an enhanced version of scatter plots.


Answer: bubble
Diff: 2 Page Ref: 172

56) ________ are used to show the frequency distribution of one variable or several variables.
Answer: Histograms
Diff: 2 Page Ref: 172

57) When presenting your data analysis, it is often helpful to view your analysis as a data-rich
________.
Answer: story
Diff: 2 Page Ref: 179

58) An ideal dashboard would provide ________ to underlying data sources or reports, providing
more detail about the underlying comparative and evaluative context.
Answer: drill-down or drill-through
Diff: 2 Page Ref: 187

59) Specialized display ________ allow easy visual comparison of information with a minimum
of set up when using a dashboard.
Answer: widgets
Diff: 2 Page Ref: 185

60) An ideal dashboard requires little, if any, customized ________ to implement, deploy, and
maintain.
Answer: coding
Diff: 2 Page Ref: 187
61) Select and discuss one of the best practices in dashboard design.
Answer: Student selections will vary, but they will discuss one of the following best practices:
• Benchmark Key Performance Indicators with Industry Standards
• Wrap the Dashboard Metrics with Contextual Metadata
• Validate the Dashboard Design by a Usability Specialist
• Prioritize and Rank Alerts/Exceptions Streamed to the Dashboard
• Enrich the Dashboard with Business-User Comments
• Present Information in Three Different Levels
• Pick the Right Visual Construct Using Dashboard Design Principles
• Provide for Guided Analytics
Diff: 2 Page Ref: 187-8

62) List and describe the three layers of an ideal dashboard.


Answer:
1. Monitoring: Graphical, abstracted data to monitor key performance metrics.
2. Analysis: Summarized dimensional data to analyze the root cause of problems.
3. Management: Detailed operational data that identify what actions to take to resolve a problem.
Diff: 2 Page Ref: 185

63) Describe the use of both Gantt and PERT charts.
Answer:
• A Gantt chart is a special case of horizontal bar charts used to portray project timelines, project
tasks/activity durations, and overlap among the tasks/activities. By showing start and end
dates/times of tasks/activities and the overlapping relationships, Gantt charts provide an
invaluable aid for management and control of projects. For instance, Gantt charts are often used
to show project timelines, task overlaps, relative task completions (a partial bar illustrating the
completion percentage inside a bar that shows the actual task duration), resources assigned to
each task, milestones, and deliverables.
• The PERT chart (also called a network diagram) is developed primarily to simplify the
planning and scheduling of large and complex projects. A PERT chart shows precedence
relationships among project activities/tasks. It is composed of nodes (represented as circles or
rectangles) and edges (represented with directed arrows). Based on the selected PERT chart
convention, either nodes or the edges can be used to represent the project activities/tasks
(activity-on-node versus activity-on-arrow representation schema).
Diff: 2 Page Ref: 173

64) Discuss how data can be classified.


Answer: At the highest level of abstraction, one can classify data as structured and unstructured
(or semistructured). Unstructured data/semistructured data are composed of any combination of
textual, imagery, voice, and Web content. Unstructured/semistructured data will be covered in
more detail in the text mining and Web mining chapter. Structured data are what data mining
algorithms use and can be classified as categorical or numeric. The categorical data can be
subdivided into nominal or ordinal data, whereas numeric data can be subdivided into interval
or ratio data.
Diff: 2 Page Ref: 125
65) List the four steps of data preprocessing and include examples of activities that may be
completed at each step.
Answer:
• Data Consolidation
o Collect data
o Select data
o Integrate data
• Data Cleaning
o Impute values
o Reduce noise
o Eliminate duplicates
• Data Transformation
o Normalize data
o Discretize data
o Create attributes
• Data Reduction
o Reduce dimension
o Reduce volume
o Balance data
Diff: 2 Page Ref: 130
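
The four steps above can be sketched on a toy data set. Everything here is an assumption made for illustration: the records, the field names, the choice of mean imputation, and the rule of dropping single-valued variables. Each step shows only one of its listed activities.

```python
# Consolidation: collect and integrate records from two hypothetical sources.
source_a = [{"id": 1, "age": 30, "region": "N"}, {"id": 2, "age": None, "region": "N"}]
source_b = [{"id": 2, "age": None, "region": "N"}, {"id": 3, "age": 50, "region": "N"}]
records = source_a + source_b

# Cleaning: eliminate duplicates (keyed on id) and impute missing ages with the mean.
unique = list({r["id"]: r for r in records}.values())
known_ages = [r["age"] for r in unique if r["age"] is not None]
mean_age = sum(known_ages) / len(known_ages)
cleaned = [{**r, "age": mean_age if r["age"] is None else r["age"]} for r in unique]

# Transformation: normalize age to the 0-1 range.
lo = min(r["age"] for r in cleaned)
hi = max(r["age"] for r in cleaned)
transformed = [{**r, "age": (r["age"] - lo) / (hi - lo)} for r in cleaned]

# Reduction: reduce dimension by dropping variables that take a single value.
constant = {k for k in transformed[0] if len({r[k] for r in transformed}) == 1}
reduced = [{k: v for k, v in r.items() if k not in constant} for r in transformed]

print(reduced)
```

In this toy case the `region` variable is dropped because every record has the same value, so it cannot help distinguish one record from another.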

66) List and briefly define the central tendency measures of descriptive statistics.
Answer:
• The arithmetic mean (or simply mean or average) is the sum of all the values/observations
divided by the number of observations in the data set.
• The median is the center (middle) value of a given data set once the observations are sorted in order of magnitude.
• The mode is the observation that occurs most frequently.
Diff: 2 Page Ref: 140-141
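
A brief sketch of the three measures using Python's standard `statistics` module. The sample is hypothetical and was chosen to include one large value, which pulls the mean upward while leaving the median and mode unchanged.

```python
import statistics

values = [2, 3, 3, 5, 12]    # hypothetical observations; 12 is an outlier

mean = statistics.mean(values)      # sum of observations / number of observations
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring observation

print(mean, median, mode)
```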

67) What is the difference between correlation and regression?


Answer: Correlation makes no a priori assumption of whether one variable is dependent on the
other(s) and is not concerned with the relationship between variables; instead it gives an estimate
on the degree of association between the variables. On the other hand, regression attempts to
describe the dependence of a response variable on one (or more) explanatory variables where it
implicitly assumes that there is a one-way causal effect from the explanatory variable(s) to the
response variable, regardless of whether the path of effect is direct or indirect. Also, although
correlation is interested in the low-level relationships between two variables, regression is
concerned with the relationships between all explanatory variables and the response variable.
Diff: 3 Page Ref: 151
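
One way to see the distinction numerically, as a sketch with made-up data and hypothetical helper names: the Pearson correlation coefficient is the same whichever variable is listed first, whereas a least-squares regression slope depends on which variable is treated as the response.

```python
import math

def _sums(x, y):
    """Return the cross-product and squared-deviation sums used by both measures."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy

def pearson(x, y):
    """Degree of linear association; symmetric in x and y."""
    sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def slope(explanatory, response):
    """Least-squares slope of the response regressed on the explanatory variable."""
    sxy, sxx, _ = _sums(explanatory, response)
    return sxy / sxx

x = [1, 2, 3, 4, 5]    # hypothetical data
y = [2, 4, 5, 4, 5]

print(pearson(x, y), pearson(y, x))   # identical values
print(slope(x, y), slope(y, x))       # different values
```
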
68) How can you determine if a regression model is good enough?
Answer: In the simplest sense, a well-fitting regression model results in predicted values close to
the observed data values. For the numerical assessment, three statistical measures are often used
in evaluating the fit of a regression model: R² (R-squared), the overall F-test, and the root mean
square error (RMSE). All three of these measures are based on the sums of squared errors
(how far the data are from the mean and how far the data are from the model's predicted values).
Different combinations of these two values provide different information about how the
regression model compares to the mean model.
Diff: 3 Page Ref: 153

69) List the five assumptions made in linear regressions and select one to discuss in depth.
Answer:
1. Linearity. This assumption states that the relationship between the response variable and the
explanatory variables is linear. That is, the expected value of the response variable is a straight-
line function of each explanatory variable while holding all other explanatory variables fixed.
Also, the slope of the line does not depend on the values of the other variables. It also implies
that the effects of different explanatory variables on the expected value of the response variable
are additive in nature.
2. Independence (of errors). This assumption states that the errors of the response variable are
uncorrelated with each other. This independence of the errors is weaker than actual statistical
independence, which is a stronger condition and is often not needed for linear regression
analysis.
3. Normality (of errors). This assumption states that the errors of the response variable are
normally distributed. That is, they are supposed to be totally random and should not represent
any nonrandom patterns.
4. Constant variance (of errors). This assumption, also called homoscedasticity, states that the
response variables have the same variance in their error regardless of the values of the
explanatory variables. In practice, this assumption is invalid if the response variable varies over a
wide enough range/scale.
5. Multicollinearity. This assumption states that the explanatory variables are not correlated (i.e.,
they do not replicate the same information but each provide a different perspective on the
information needed for the model). Multicollinearity can be triggered by having two or more
perfectly correlated explanatory variables presented to the model (e.g., if the same explanatory
variable is mistakenly included in the model twice, once with a slight transformation). A
correlation-based data assessment usually catches this error.
Diff: 2 Page Ref: 154-5
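
A correlation-based assessment of the kind mentioned under multicollinearity might look like the sketch below. The variable names, the data, and the 0.9 cutoff are all assumptions made for illustration; the appropriate threshold depends on the application.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(variables, threshold=0.9):
    """Return the pairs of explanatory variables whose |r| exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(variables, 2)
        if abs(pearson(variables[a], variables[b])) > threshold
    ]

explanatory = {                      # hypothetical explanatory variables
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],        # nearly a rescaled copy of x1
    "x3": [5, 3, 4, 1, 2],
}
flagged = correlated_pairs(explanatory)
print(flagged)    # [('x1', 'x2')]
```
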
70) Describe the three main types of business reports.
Answer: Metric Management Reports - In many organizations, business performance is
managed through outcome-oriented metrics. For external groups, these are service-level
agreements. For internal management, they are key performance indicators (KPIs). Typically,
there are enterprise-wide agreed upon targets to be tracked against over a period of time. They
can be used as part of other management strategies such as Six Sigma or total quality
management.
Dashboard-Type Reports - A popular idea in business reporting in recent years has been to
present a range of different performance indicators on one page like a dashboard in a car.
Typically, dashboard vendors would provide a set of predefined reports with static elements and
fixed structure but also allow for customization of the dashboard widgets, views, and set targets
for various metrics. It is common to have color-coded traffic lights defined for performance (red,
orange, green) to draw management's attention to particular areas. A more detailed description of
dashboards can be found in a later section of this chapter.
Balanced Scorecard Reports - This is a method developed by Kaplan and Norton that attempts to
present an integrated view of success in an organization. In addition to financial performance,
balanced scorecard-type reports also include customer, business process, and learning and
growth perspectives. More details on balanced scorecards are provided in a later section in this
chapter.
Diff: 2 Page Ref: 165
