Data Final

Uploaded by

Goutam Thukral

Quiz

Question 1 (1 point)

Which of the following is not an example of data cleaning:

Question 1 options:

Removing entries where there is a negative value in the 'age' column

Rounding up the 'Customer ID' column to the nearest hundred
Deleting the entire 'gender' column because this particular study is not interested in the gender of the customer
Filtering out and removing all customers over the age of 50 because they are not a target demographic for this specific study

Question 2 (1 point)

Given this list of prices:

2 1 3 5 5 11 8 10 4 19 9
Using the list of prices above, find the min-max normalized price for the apples sold at $11.
Question 2 options:

8/18
9/18
10/18
11/18
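
The calculation behind this question can be sketched as follows, assuming the usual min-max definition (x - min) / (max - min). The function name `min_max_normalize` is illustrative and does not come from the quiz; `Fraction` is used only to keep the result exact, and it reduces fractions to lowest terms.

```python
from fractions import Fraction

prices = [2, 1, 3, 5, 5, 11, 8, 10, 4, 19, 9]

def min_max_normalize(x, values):
    """Scale x to the range [0, 1] using the smallest and largest value."""
    lo, hi = min(values), max(values)
    return Fraction(x - lo, hi - lo)

normalized = min_max_normalize(11, prices)
# Fraction prints in lowest terms; compare it against an option such as
# Fraction(10, 18) to check for equality.
print(normalized, float(normalized))
```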

Question 3 (1 point)
Which is the measurement used in the k-NN algorithm to determine the classification of any given data point in the test data set?
Question 3 options:

Entropy
Probability
Variance
Distance

Question 4 (1 point)

There is no target variable identified in both supervised and unsupervised data mining because the target variable is identified based on the results.
Question 4 options:

True
False

Question 5 (1 point)

Given this list of prices:

2 1 3 5 5 11 8 10 4 19 9
Using the list of prices above, find the decimal-scaled price for apples selling at $12.30/lb.
Question 5 options:

12.3
1.23
0.123
0.0123
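
A minimal sketch of decimal scaling, assuming the common textbook definition: divide by 10^d, where d is the number of digits in the integer part of the largest absolute value in the data set. The helper name `decimal_scale` is hypothetical, and `Decimal` is used to avoid floating-point rounding noise.

```python
from decimal import Decimal

prices = [2, 1, 3, 5, 5, 11, 8, 10, 4, 19, 9]

def decimal_scale(x, values):
    """Divide x by 10**d, where d is the digit count of the largest |value|."""
    largest = max(abs(v) for v in values)
    d = len(str(int(largest)))  # 19 has two digits, so d = 2
    return Decimal(str(x)) / (10 ** d)

scaled = decimal_scale("12.30", prices)
print(scaled)
```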

Question 6 (1 point)

Mean-squared Error is used to:

Question 6 options:

Evaluate the relationship between a test data set and training data set
Balance classification models where target variables have a significantly lower frequency than other classes
Establish baseline performance to determine whether results produced by a data model are within the confidence interval of expected results
Combine bias and variance to evaluate the accuracy of model estimation for a continuous target variable

Question 7 (1 point)

Which of the following best defines clustering?

Question 7 options:

To group similar records together without the usage of a target variable

Classification of records into groups based on similarity
A hierarchical method of aggregating data into combinations of clusters
An effective method of grouping data records, used as an alternative to classification by neural network

Question 8 (1 point)

Select the properties of left-skewed data.

Question 8 options:
Positive skewness, and median is smaller than the mean
Positive skewness, and median is greater than the mean
Negative skewness, and median is smaller than the mean
Negative skewness, and median is greater than the mean

Question 9 (1 point)

Unsupervised data mining always requires human input.

Question 9 options:

True
False

Question 10 (1 point)
Note: the computed answer should be -5, which is not among the options.

Given this list of prices:

2 1 3 5 5 11 8 10 4 19 9
Using the list of prices above, find the Z-score for the apples sold at $2, given a standard deviation of 1.0.
Question 10 options:

3
1
-1
-3
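
The Z-score arithmetic can be checked with a short sketch, assuming the standard definition z = (x - mean) / standard deviation. The standard deviation of 1.0 is the value stated in the question, not one computed from the list, and `z_score` is an illustrative name.

```python
from statistics import mean

prices = [2, 1, 3, 5, 5, 11, 8, 10, 4, 19, 9]

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

mu = mean(prices)           # the eleven prices sum to 77, so the mean is 7
z = z_score(2, mu, 1.0)     # sigma = 1.0 is given by the question
print(mu, z)
```

With these inputs the result is -5.0, which does not appear among the listed options.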

Question 11 (1 point)
In the description task of data mining, analysts do this:
Question 11 options:

Identify methods to describe observed trends and patterns within the data
Describe the data at hand before moving on to the data cleaning and data transformation stage
Perform classification tasks such as creating decision trees and neural networks, and creating a report based on the results
All of the above

Question 12 (1 point)
Skew = 3 * (Mean – Median) / Standard Deviation.

Given the following data, calculate the skewness of the distribution.

Mean = 50
Median = 45
Standard Deviation = 3

Question 12 options:

-3
3
-5
5
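
As a quick check of the arithmetic, the formula stated in the question can be evaluated directly. The function name `skewness` is illustrative only.

```python
def skewness(mean, median, std_dev):
    """Skew = 3 * (Mean - Median) / Standard Deviation, as given in the question."""
    return 3 * (mean - median) / std_dev

skew = skewness(mean=50, median=45, std_dev=3)
print(skew)
```

A positive result indicates a mean above the median (right skew), and a negative result indicates a mean below the median (left skew).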

Question 13 (1 point)

Choose the answer that fits best. Jim is an IT analyst. For his current task at work, he would like to assess how many more IT staff his department should hire and train to keep up with IT requests, because his company expects to add 100 staff members within the next 6 months. This is an example of:
Question 13 options:

Data Mining
Prediction
Estimation
Clustering

Question 14 (1 point)

In decision trees, the root node is located at the top of the tree, and branches extend downward to decision nodes that eventually terminate in leaf nodes.
Question 14 options:

True
False

Question 15 (1 point)

When someone wishes to examine variables, investigate distributions of categorical variables, read histograms of numeric variables, or look into relationships within a set of variables, it is a good situation to use:
Question 15 options:

Hypothesis testing
Exploratory data analysis
Supervised data modeling
Data mining

Question 16 (1 point)

The best way to reduce the margin of error is:

Question 16 options:

Increasing the sample size

Decreasing the sample size
Increasing the confidence level
Decreasing the confidence level

Question 17 (1 point)

Overfitting is observed when:

Question 17 options:
Complexity is greater than the optimal level of model complexity, and error rate on the test data set is increasing
Complexity is greater than the optimal level of model complexity, and error rate on the test data set is decreasing
Complexity is greater than the optimal level of model complexity, and error rate on the training data set is increasing
None of the above

Question 18 (1 point)
Note: it is unclear whether the word "gradual" affects the answer.

You can consider two variables 'a' and 'b' to be linearly correlated if an increase in 'b' is associated with a gradual increase in 'a'.
Question 18 options:

True
False

Question 19 (1 point)

The level of optimal model complexity can be found at:

Question 19 options:

The point of minimum error rate of the test data set

The point of minimum error rate of the training data set
The point of least distance between the error rate of the test data set and the training data set
The point of maximum distance between the error rate of the test data set and the training data set

Question 20 (1 point)
If the value of k is too large in k-NN, which of the following is likely to happen?
Question 20 options:

Overfitting
Training data set becomes corrupted
The most common class will dominate the classification
None of the above
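
The effect of k can be explored with a small sketch. The training data below is entirely hypothetical (seven points of class "A" and three of class "B" on a single feature), and `knn_predict` is an illustrative helper using absolute difference as the distance measure, not a library function.

```python
from collections import Counter

# Hypothetical 1-D training set: (feature value, class label).
training = [(1.0, "A"), (1.2, "A"), (1.5, "A"), (2.0, "A"), (2.2, "A"),
            (2.5, "A"), (3.0, "A"), (8.0, "B"), (8.5, "B"), (9.0, "B")]

def knn_predict(query, data, k):
    """Take a majority vote among the k training points closest to the query."""
    nearest = sorted(data, key=lambda row: abs(row[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

small_k = knn_predict(8.4, training, k=3)              # only close neighbours vote
large_k = knn_predict(8.4, training, k=len(training))  # every training point votes
print(small_k, large_k)
```

In this toy setup the query sits among the "B" points, yet once k covers the whole training set the vote is decided by class frequency rather than by proximity.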

Question 21 (1 point)

Margin of error can be decreased by:

Question 21 options:

Increasing sample size and increasing confidence level

Increasing sample size and decreasing confidence level
Decreasing sample size and increasing confidence level
Decreasing sample size and decreasing confidence level
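
The relationship between sample size, confidence level, and margin of error can be sketched under the common normal-based formula E = z * sigma / sqrt(n). The population standard deviation of 10 and the sample sizes below are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """Normal-based margin of error: z * sigma / sqrt(n)."""
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / sqrt(n)

baseline = margin_of_error(sigma=10, n=100, confidence=0.95)
larger_sample = margin_of_error(sigma=10, n=400, confidence=0.95)
lower_confidence = margin_of_error(sigma=10, n=100, confidence=0.90)
print(baseline, larger_sample, lower_confidence)
```

Under this formula, quadrupling the sample size halves the margin of error, and lowering the confidence level shrinks the critical value z.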

Question 22 (1 point)

We should use hypothesis testing instead of exploratory data analysis when:
Question 22 options:

We want to examine relationships among attributes within a data set

There is a large prediction error after testing against the initial hypothesized relationship between two variables, which necessitates the forming of a new hypothesis
Many potential outliers are identified
The a priori hypothesis has been identified

Question 23 (1 point)

If the mean is 68 and the margin of error is 1.55, what is your confidence interval?
Question 23 options:

[66.45, 69.55]
[65.55, 70.45]
[59.13, 78.20]
[0, 1.55]
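
A confidence interval of this kind is the point estimate plus and minus the margin of error. The sketch below assumes that definition; `Decimal` is used so the endpoints print without floating-point noise, and `confidence_interval` is an illustrative name.

```python
from decimal import Decimal

def confidence_interval(point_estimate, margin):
    """Return (lower, upper) as point estimate -/+ margin of error."""
    return point_estimate - margin, point_estimate + margin

lower, upper = confidence_interval(Decimal("68"), Decimal("1.55"))
print(f"[{lower}, {upper}]")
```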

Question 24 (1 point)

Confidence interval estimate can be defined as:

Question 24 options:

An example of prediction, classification, clustering, and association methods

An interval of numbers produced by a point estimate combined with the associated confidence level
The margin of error in a regression line
The percentage of results which provide normalized data points

Question 25 (1 point)

During data preprocessing, it is best practice to omit records with missing values in order to ensure that data is clean and easy to work with.
Question 25 options:

True
False

Question 26 (1 point)

Reasons that data mining has increased in usage across multiple industries include:
Question 26 options:

Commercialization of products makes it easier for users to find data-driven solutions to problems
Continual technological advancements have made it faster to process more data
External pressure for companies to find advantages over their competitors
All of the above

Question 27 (1 point)

Which of the following is not a common task of data mining?

Question 27 options:

Association
Prediction
Classification
Compilation

Question 28 (1 point)

Given this list of prices:

2 1 3 5 5 11 8 10 4 19 9
Using the list of prices above, calculate the mode price.
Question 28 options:

5
8
7
4
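
The mode is the value that occurs most often. A minimal way to check it, using only the standard library, is to count occurrences:

```python
from collections import Counter

prices = [2, 1, 3, 5, 5, 11, 8, 10, 4, 19, 9]

# most_common(1) returns the single (value, count) pair with the highest count.
mode_price, count = Counter(prices).most_common(1)[0]
print(mode_price, count)
```

In this list only one value appears more than once, so the mode is unambiguous.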

Question 29 (1 point)

Which of the following is most likely to be a duplicate record in the database and should be removed after further investigation?
Question 29 options:

Two customers have identical first and last names, but different birthdays
Two customers have identical 'Customer ID' fields
A data set has three nominal fields, each taking only four values, and 63 records in total
All of the above

Question 30 (1 point)
Identify the best tool for determining if a predictor is useful for predicting a target variable.
Question 30 options:

Overlay Histogram
Directed Web Graph
Contingency Table with Row Percentages
C4.5 algorithm

Question 31 (1 point)

You can consider two variables 'a' and 'b' to be linearly correlated if an increase in 'b' is associated with a decrease in 'a'.
Question 31 options:

True
False

Question 32 (1 point)

This is a list of prices of apples compiled around the city (in $/lb):
2 1 3 5 5 11 8 10 4 19 9
Using the list of prices above, calculate the mean price.
Question 32 options:

5
8
7
4
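
The mean is the sum of the values divided by how many there are. A short check of the arithmetic:

```python
prices = [2, 1, 3, 5, 5, 11, 8, 10, 4, 19, 9]

total = sum(prices)               # sum of all eleven prices
mean_price = total / len(prices)  # arithmetic mean
print(total, len(prices), mean_price)
```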

Question 33 (1 point)

Multivariate graphs can be used to:

Question 33 options:

Normalize data
Identify and confirm findings from the initial univariate exploration
Uncover new findings that the initial univariate exploration may have missed
All the above

Question 34 (1 point)

Clustering algorithms cannot use recursive methods of splitting.

Question 34 options:

True
False

Question 35 (1 point)

In k-means clustering, the centroid is used to:

Question 35 options:

Represent the distance between clusters and is set after initial data set partitioning
Represent the center point between clusters and is updated during each pass
Represent the center point of a given cluster and is updated during each pass
Represent the outer bounds of any given cluster and is updated during each pass
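
How centroids behave across passes can be illustrated with a one-dimensional sketch. The points and starting centroids are hypothetical, and `kmeans_pass` is an illustrative helper that performs one assignment step followed by one update step, assuming absolute difference as the distance.

```python
def kmeans_pass(points, centroids):
    """Assign each point to its nearest centroid, then recompute each centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Each new centroid is the mean of its cluster; an empty cluster keeps its old one.
    return [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # hypothetical data
start = [0.0, 7.0]                            # hypothetical initial centroids
updated = kmeans_pass(points, start)
settled = kmeans_pass(points, updated)
print(updated, settled)
```

In this example the centroids move on the first pass and stay put on the second, which is the usual stopping signal for the algorithm.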

Question 36 (1 point)
When working with potential outliers in multiple variables, a good tool to help identify the outliers is a:
Question 36 options:

Histogram
Frequency distribution chart
Scatterplot, 2D
Least squares regression

Question 37 (1 point)

Which of the following is a requirement that must be met in order to use a decision tree?
Question 37 options:
Target attribute classes must not be discrete, as decision tree logic cannot be applied to continuous target variables
Training data set must provide the algorithm with target variable values
Both the training and testing dataset must be varied and rich, as this is an unsupervised learning algorithm
All training data must be normalized

Question 38 (1 point)
Clustering uses a combination of classification, estimation, and prediction in order to segment the entire data set into subgroups.
Question 38 options:

True
False

Question 39 (1 point)

Given this list of prices:

2 1 3 5 5 11 8 10 4 19 9
Using the list of prices above, calculate the median price.
Question 39 options:

5
8
7
4
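
The median is the middle value once the list is sorted. With eleven prices there is a single middle element, so no averaging of two values is needed. A short check:

```python
prices = [2, 1, 3, 5, 5, 11, 8, 10, 4, 19, 9]

ordered = sorted(prices)
middle = len(ordered) // 2        # index 5 of 11 sorted values
median_price = ordered[middle]    # valid as-is because the count is odd
print(ordered, median_price)
```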

Question 40 (1 point)
Statistical inference is a tool for:
Question 40 options:

Prediction and classification

Classification and clustering
Classification, clustering, and association
Estimation and prediction
