Data Final
Question 1 (1 point)
Question 2 (1 point)
Question 3 (1 point)
Which is the measurement used in the k-NN algorithm to
determine the classification of any given data point in the test
data set?
Question 3 options:
Entropy
Probability
Variance
Distance
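As a sketch of the idea behind this question, the Python below classifies a test point by measuring its distance to every training record and taking a majority vote among the k closest. Euclidean distance is assumed as the metric (other distance functions are possible). The toy training set, its feature meanings, and the names `euclidean` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k):
    """Label `query` by majority vote among its k nearest training records.

    `train` is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (age, income in $1000s) -> class
train = [
    ((25, 40), "A"),
    ((30, 45), "A"),
    ((50, 90), "B"),
    ((55, 95), "B"),
]
print(knn_classify(train, (28, 42), k=3))  # A (two of the three nearest are "A")
```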
Question 4 (1 point)
True
False
Question 5 (1 point)
12.3
1.23
0.123
0.0123
Question 6 (1 point)
Evaluate the relationship between a test data set and training data set
Balance classification models where target variables have a significantly lower frequency than other classes
Establish baseline performance to determine whether results produced by a data model are within the confidence interval of expected results
Combine bias and variance to evaluate the accuracy of model estimation for a continuous target variable
Question 7 (1 point)
Question 8 (1 point)
Question 9 (1 point)
True
False
Question 10 (1 point)
Note: the answer should be -5, but -5 is not one of the options.
3
1
-1
-3
Question 11 (1 point)
In the description task of data mining, analysts do this:
Question 11 options:
Identify methods to describe observed trends and patterns within the data
Describe the data at hand before moving on to the data cleaning and data transformation stage
Perform classification tasks such as creating decision trees and neural networks, and creating a report based on the results
All of the above
Question 12 (1 point)
Skew = 3 * (Mean – Median) / Standard Deviation.
Question 12 options:
-3
3
-5
5
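The formula above is Pearson's second (median) skewness coefficient. A small Python function makes the sign convention concrete: a mean above the median gives positive (right) skew, and a mean below the median gives negative (left) skew. The summary statistics here are hypothetical, not the values from the exam question, and `pearson_skewness` is an illustrative name.

```python
def pearson_skewness(mean, median, sd):
    """Skew = 3 * (mean - median) / standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / sd

# Hypothetical summary statistics
print(pearson_skewness(mean=50, median=45, sd=10))  # 1.5 (right skew)
print(pearson_skewness(mean=45, median=50, sd=10))  # -1.5 (left skew)
```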
Question 13 (1 point)
Choose the answer that fits best. Jim is an IT analyst. For his
current task at work, he would like to assess how many more IT
staff his department should hire and train to keep up with IT
requests, because his company expects to add 100 more staff
members within the next 6 months. This is an example of:
Question 13 options:
Data Mining
Prediction
Estimation
Clustering
Question 14 (1 point)
True
False
Question 15 (1 point)
Hypothesis testing
Exploratory data analysis
Supervised data modeling
Data mining
Question 16 (1 point)
Question 17 (1 point)
Question 18 (1 point)
Not sure if the word “gradual” affects the answer
True
False
Question 19 (1 point)
Question 20 (1 point)
If the value of k in k-NN is too large, which of the following is
likely to happen?
Question 20 options:
Overfitting
Training data set becomes corrupted
The most common class will dominate the classification
None of the above
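A minimal sketch of what happens as k grows, assuming a hypothetical one-dimensional, imbalanced training set and absolute difference as the distance. With a small k, the query takes the label of its close neighbors. When k equals the size of the training set, every point votes, so every query receives the most common class regardless of where it sits.

```python
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k training points closest to `query`.

    Features are 1-D numbers here for brevity, so distance is |a - b|.
    """
    nearest = sorted(train, key=lambda rec: abs(rec[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical imbalanced training set: 3 "rare" points near 0, 7 "common" points from 10 to 16
train = [(0.0, "rare"), (0.5, "rare"), (1.0, "rare")]
train += [(10.0 + i, "common") for i in range(7)]

query = 0.2  # sits among the rare points
print(knn_classify(train, query, k=3))           # rare
print(knn_classify(train, query, k=len(train)))  # common: every point votes
```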
Question 21 (1 point)
Question 22 (1 point)
[66.45, 69.55]
[65.55, 70.45]
[59.13, 78.20]
[0, 1.55]
Question 24 (1 point)
Question 25 (1 point)
True
False
Question 26 (1 point)
Commercialization of products makes it easier for users to find data-driven solutions to problems
Continual technological advancements have made it faster to process more data
External pressure for companies to find advantages over their competitors
All of the above
Question 27 (1 point)
Association
Prediction
Classification
Compilation
Question 28 (1 point)
5
8
7
4
Question 29 (1 point)
Two customers have identical first and last names, but different birthdays
Two customers have identical 'Customer ID' fields
Data set with three nominal fields, and each field takes only four values. There are 63 records in total.
All of the above
Question 30 (1 point)
Identify the best tool for determining if a predictor is useful for
predicting a target variable.
Question 30 options:
Overlay Histogram
Directed Web Graph
Contingency Table with Row Percentages
C4.5 algorithm
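To illustrate one of the listed options, a contingency table with row percentages can be built in plain Python. The records, the "yes"/"no" predictor values, the "churn"/"stay" target values, and the name `row_percentages` are all hypothetical. The idea is that if the target's percentage breakdown differs noticeably from one predictor value to another, the predictor may carry useful information about the target.

```python
from collections import Counter

def row_percentages(pairs):
    """Cross-tabulate (predictor_value, target_value) pairs.

    Returns {predictor_value: {target_value: percent of that row}},
    so each row sums to 100.
    """
    counts = Counter(pairs)
    rows = sorted({p for p, _ in pairs})
    cols = sorted({t for _, t in pairs})
    table = {}
    for r in rows:
        row_total = sum(counts[(r, c)] for c in cols)
        table[r] = {c: 100 * counts[(r, c)] / row_total for c in cols}
    return table

# Hypothetical records: (predictor value, target value)
records = ([("yes", "churn")] * 4 + [("yes", "stay")] * 6
           + [("no", "churn")] * 1 + [("no", "stay")] * 9)

for value, pcts in row_percentages(records).items():
    print(value, pcts)
# no {'churn': 10.0, 'stay': 90.0}
# yes {'churn': 40.0, 'stay': 60.0}
```

In this made-up data, the churn rate is 40% for "yes" versus 10% for "no", a gap that would suggest the predictor is worth keeping.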
Question 31 (1 point)
True
False
Question 32 (1 point)
The following is a list of apple prices compiled around the city (in $/lb):
2 1 3 5 5 11 8 10 4 19 9
Using the list of prices above, calculate the mean price.
Question 32 options:
5
8
7
4
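The arithmetic can be checked directly: the eleven prices sum to 77, so the mean is 77 / 11 = 7. A short Python check, with illustrative variable names:

```python
import statistics

# Apple prices from the question, in $/lb
prices = [2, 1, 3, 5, 5, 11, 8, 10, 4, 19, 9]

total = sum(prices)         # 77
mean = total / len(prices)  # 77 / 11
print(mean)                 # 7.0

# Cross-check with the standard library
assert statistics.mean(prices) == mean
```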
Question 33 (1 point)
Normalize data
Identify and confirm findings from the initial univariate exploration
Uncover new findings that the initial univariate exploration may have missed
All of the above
Question 34 (1 point)
True
False
Question 35 (1 point)
Represent the distance between clusters and is set after initial data set partitioning
Represent the center point between clusters and is updated during each pass
Represent the center point of a given cluster and is updated during each pass
Represent the outer bounds of any given cluster and is updated during each pass
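These options describe cluster centers as used in algorithms such as k-means. For reference, here is a minimal k-means sketch in which each centroid is the mean of its cluster's points and is recomputed on every pass. It assumes 2-D points, squared Euclidean distance, and hypothetical data and starting centroids. The name `kmeans` is illustrative.

```python
def kmeans(points, centroids, max_passes=10):
    """Plain k-means on 2-D points (list of (x, y) tuples).

    Each pass assigns every point to its nearest centroid (squared Euclidean
    distance), then moves each centroid to the mean of its assigned points.
    Stops early once the centroids no longer change.
    """
    centroids = list(centroids)
    for _ in range(max_passes):
        clusters = [[] for _ in centroids]
        for x, y in points:
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[dists.index(min(dists))].append((x, y))
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                n = len(members)
                updated.append((sum(px for px, _ in members) / n,
                                sum(py for _, py in members) / n))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Hypothetical data: two well-separated groups and two starting centroids
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
print(kmeans(points, [(0, 0), (10, 10)]))  # centroids near (1.33, 1.33) and (8.33, 8.33)
```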
Question 36 (1 point)
When working with potential outliers in multiple variables, a
good tool to help identify the outliers is a:
Question 36 options:
Histogram
Frequency distribution chart
Scatterplot, 2D
Least squares regression
Question 37 (1 point)
Question 38 (1 point)
Clustering uses a combination of classification, estimation, and
prediction in order to segment the entire data set into
subgroups.
Question 38 options:
True
False
Question 39 (1 point)
5
8
7
4
Question 40 (1 point)
Statistical inference is a tool for:
Question 40 options: