0% found this document useful (0 votes)

34 views6 pages

Reading 4 Big Data Projects - Answers

The document consists of a series of questions and explanations related to big data and machine learning concepts, including data exploration, model evaluation metrics, and characteristics of big data. It covers topics such as feature selection, precision, recall, and the importance of data curation. Each question is accompanied by an explanation that clarifies the correct answer and the reasoning behind it.

Uploaded by

r379764

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

34 views6 pages

Reading 4 Big Data Projects - Answers

Uploaded by

r379764

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

Question #1 of 13 Question ID: 1472236

In big data projects, data exploration is least likely to encompass:

A) feature selection.
B) feature engineering.
C) feature design.

Explanation

Data exploration encompasses exploratory data analysis, feature selection, and feature
engineering.

(Module 4.2, LOS 4.d)

Question #2 of 13 Question ID: 1472234

Big data is most likely to suffer from low:

A) veracity.
B) velocity.
C) variety.

Explanation

Big data is defined as data with high volume, velocity, and variety. Big data often suffers from
low veracity, because it can contain a high percentage of meaningless data.

(Module 4.1, LOS 4.a)

Question #3 of 13 Question ID: 1472237

Under which of these conditions is a machine learning model said to be underfit?

A) The model identifies spurious relationships.

B) The model treats true parameters as noise.
C) The input data are not labelled.

Explanation

Underfitting describes a machine learning model that is not complex enough to describe the
data it is meant to analyze. An underfit model treats true parameters as noise and fails to
identify the actual patterns and relationships. A model that is overfit (too complex) will tend
to identify spurious relationships in the data. Labelling of input data is related to the use of
supervised or unsupervised machine learning techniques.

(Module 4.3, LOS 4.f)

Question #4 of 13 Question ID: 1681485

Which of the following uses of data is most accurately described as curation?

A data technician accesses an offsite archive to retrieve data that has been stored
A)
there.
An investor creates a word cloud from financial analysts’ recent research reports
B)
about a company.
C) An analyst gathering data for sentiment analysis determines what sources to use.

Explanation

Data collection (curation) is determining the sources of data to be used (e.g., web scouring,
specific social media sites). Word clouds are a visualization technique. Moving data from a
storage medium to where they are needed is referred to as transfer.

(Module 4.1, LOS 4.a)

Question #5 of 13 Question ID: 1472238

When evaluating the fit of a machine learning algorithm, it is most accurate to state that:

A) precision is the percentage of correctly predicted classes out of total predictions.

accuracy is the ratio of correctly predicted positive classes to all predicted positive
B)
classes.
C) recall is the ratio of correctly predicted positive classes to all actual positive classes.

Explanation

Recall (also called sensitivity) is the ratio of correctly predicted positive classes to all actual
positive classes. Precision is the ratio of correctly predicted positive classes to all predicted
positive classes. Accuracy is the percentage of correctly predicted classes out of total
predictions.

(Module 4.3, LOS 4.c)

Question #6 of 13 Question ID: 1685260

In big data analysis, the three primary tasks involved in data exploration are most accurately
described as:

A) data collection, data curation, and data preparation.

B) exploratory data analysis, feature selection, and feature engineering.
C) data wrangling, data curation, and model training

Explanation

Data exploration involves three central tasks: exploratory data analysis, feature selection,
and feature engineering. Exploratory data analysis uses visualizations to observe and
summarize data. Feature selection is where only pertinent features from the dataset are
selected for machine learning model training. Feature engineering is the process of creating
new features by changing or transforming existing features.

(Module 4.2, LOS 4.d)

Question #7 of 13 Question ID: 1472232

An executive describes her company's "low latency, multiple terabyte" requirements for
managing Big Data. To which characteristics of Big Data is the executive referring?

A) Volume and velocity.

B) Velocity and variety.
C) Volume and variety.

Explanation
Big Data may be characterized by its volume (the amount of data available), velocity (the
speed at which data are communicated), and variety (degrees of structure in which data
exist). "Terabyte" is a measure of volume. "Latency" refers to velocity.

(Module 4.1, LOS 4.a)

Question #8 of 13 Question ID: 1685261

In big data analysis, the most appropriate method of gaining a high-level picture of the
composition of textual content is through the use of a:

A) scatterplot.
B) histogram.
C) word cloud.

Explanation

Word clouds are an effective way to gain a high-level picture of the composition of textual
content. Histograms, box plots, and scatterplots are common techniques for exploring
structured data.

(Module 4.2, LOS 4.d)

Question #9 of 13 Question ID: 1472235

The process of splitting a given text into separate words is best characterized as:

A) tokenization.
B) stemming.
C) bag-of-words.

Explanation

Text is considered to be a collection of tokens, where a token is equivalent to a word.

Tokenization is the process of splitting a given text into separate tokens. Bag-of-words (BOW)
is a collection of a distinct set of tokens from all the texts in a sample dataset. Stemming is
the process of converting inflected word forms into a base word.

(Module 4.1, LOS 4.g)

Question #10 - 13 of 13 Question ID: 1472240

Based on Exhibit 1, Karlsson's model's precision is closest to:

A) 91%.
B) 71%.
C) 81%.

Explanation

Precision, the ratio of correctly predicted positive classes (true positives) to all predicted
positive classes, is calculated as:

Precision (P) = TP /(TP + FP) = 307 / (307 + 31) = 0.9083 (91%)

In the context of this default classification, high precision would help us avoid the situation
where a bond is incorrectly predicted to default when it actually is not going to default.

(Module 4.3, LOS 4.c)

Question #11 - 13 of 13 Question ID: 1472241

Karlsson is especially concerned about the possibility that her model may indicate that a bond
will not default, but then the bond actually defaults. Karlsson decides to use the model's recall
to evaluate this possibility. Based on the data in Exhibit 1, the model's recall is closest to:

A) 83%.
B) 73%.
C) 93%.

Explanation

Recall that TP / (TP + FN) = 307 / (307 + 23) = 0.9303 = 93%.

Recall is useful when the cost of a false negative is high, such as when we predict that a bond
will not default but it actually will. In cases like this, high recall indicates that false negatives
will be minimized.

(Module 4.3, LOS 4.c)

Question #12 - 13 of 13 Question ID: 1472242

Karlsson would like to gain a sense of her model's overall performance. In her research,
Karlsson learns about the F1 score, which she hopes will provide a useful measure. Based on
Exhibit 1, Karlsson's model's F1 score is closest to:

A) 72%.
B) 82%.
C) 92%.

Explanation

The model's F1 score, which is the harmonic mean of precision and recall, is calculated as:

F1 score = (2 × P × R) / (P + R) = (2 × 0.9083 × 0.9303) / (0.9083 + 0.9303) = 0.9192

(92%)

Like accuracy, F1 is a measure of overall performance measures that gives equal weight to FP
and FN.

(Module 4.3, LOS 4.c)

Question #13 - 13 of 13 Question ID: 1472243

Karlsson also learns of the model measure of accuracy. Based on Exhibit 1, Karlsson's model's
accuracy metric is closest to:

A) 79%.
B) 89%.
C) 69%.

Explanation

The model's accuracy is the percentage of correctly predicted classes out of total predictions.
Model accuracy is calculated as:

Accuracy = (TP + TN) / (TP + FP + TN + FN) = (TP + TN) / N

= (307 + 113) / (307 + 31 + 113 + 23) = (307 + 113) / (474)
= 0.8861 = 89%

(Module 4.3, LOS 4.c)

Final Exam Update Huawei
0% (1)
Final Exam Update Huawei
13 pages
Reading 4 Big Data Projects - Answers
No ratings yet
Reading 4 Big Data Projects - Answers
6 pages
ML Interview Questions PDF
100% (5)
ML Interview Questions PDF
20 pages
IML-IITKGP - Assignment 1 Solution
No ratings yet
IML-IITKGP - Assignment 1 Solution
7 pages
Answer 2023-24
No ratings yet
Answer 2023-24
19 pages
Reading 4 Big Data Projects
No ratings yet
Reading 4 Big Data Projects
5 pages
Huawei Final Written Exam
50% (2)
Huawei Final Written Exam
18 pages
Huawei Final Written Exam 2.2 Attempts
No ratings yet
Huawei Final Written Exam 2.2 Attempts
19 pages
BigDataSolution of Paper Oct 2022
No ratings yet
BigDataSolution of Paper Oct 2022
11 pages
Storyboarding and Presentation Writing
100% (4)
Storyboarding and Presentation Writing
33 pages
MLfinal 1
No ratings yet
MLfinal 1
7 pages
Data Science Quiz Answers
No ratings yet
Data Science Quiz Answers
5 pages
AIP-210 CertNexus Certified Artificial Intelligence Practitioner Practice Questions
No ratings yet
AIP-210 CertNexus Certified Artificial Intelligence Practitioner Practice Questions
8 pages
Ai - MQP 15 2024-25
No ratings yet
Ai - MQP 15 2024-25
6 pages
Reading 4 Big Data Projects
No ratings yet
Reading 4 Big Data Projects
4 pages
Reading 4 Big Data Projects 1
No ratings yet
Reading 4 Big Data Projects 1
4 pages
NASHEEEEYYYYYY
No ratings yet
NASHEEEEYYYYYY
30 pages
2023 ML Assignment
No ratings yet
2023 ML Assignment
57 pages
Ds
No ratings yet
Ds
22 pages
ISE 529 Mock Test Answers
No ratings yet
ISE 529 Mock Test Answers
6 pages
Reading 4 Big Data Projects
No ratings yet
Reading 4 Big Data Projects
2 pages
R08 Big Data Projects - Answers
No ratings yet
R08 Big Data Projects - Answers
3 pages
Reading 1 Multiple Regression - Answers
100% (1)
Reading 1 Multiple Regression - Answers
75 pages
Khoi KHDL - de On
No ratings yet
Khoi KHDL - de On
6 pages
Machine Learning - Info 4122 - 2023
No ratings yet
Machine Learning - Info 4122 - 2023
4 pages
ML mcq2
No ratings yet
ML mcq2
10 pages
IML-IITKGP - Assignment 5 Solution
No ratings yet
IML-IITKGP - Assignment 5 Solution
7 pages
Machine Learning Bits
100% (2)
Machine Learning Bits
28 pages
Exam 1
No ratings yet
Exam 1
3 pages
Sem3 Asmt Answers
No ratings yet
Sem3 Asmt Answers
20 pages
Deep Learning
No ratings yet
Deep Learning
9 pages
Data Science Final Mock Test
No ratings yet
Data Science Final Mock Test
47 pages
Ds
No ratings yet
Ds
8 pages
Mid 2
No ratings yet
Mid 2
10 pages
Data Final
No ratings yet
Data Final
17 pages
Mcqs 1
No ratings yet
Mcqs 1
34 pages
Data Analytic MCQ
No ratings yet
Data Analytic MCQ
5 pages
AIL Quiz Loc
No ratings yet
AIL Quiz Loc
33 pages
Test DS
No ratings yet
Test DS
7 pages
ML Suggestion 2
No ratings yet
ML Suggestion 2
11 pages
Reading 4 Big Data Projects
No ratings yet
Reading 4 Big Data Projects
4 pages
212 Final-Solution
No ratings yet
212 Final-Solution
23 pages
Final 2019
No ratings yet
Final 2019
15 pages
Questions-For-Data-Mining-2020 Eng Marwan
No ratings yet
Questions-For-Data-Mining-2020 Eng Marwan
19 pages
Practice MCQ AI
No ratings yet
Practice MCQ AI
4 pages
ML Endsem 2022
No ratings yet
ML Endsem 2022
7 pages
DS Question Paper
No ratings yet
DS Question Paper
2 pages
Reading 3 Machine Learning - Answers
No ratings yet
Reading 3 Machine Learning - Answers
11 pages
Data Mining f20 Practice Final Solutions
No ratings yet
Data Mining f20 Practice Final Solutions
8 pages
r22 IV CSD Year Syllabus
No ratings yet
r22 IV CSD Year Syllabus
21 pages
ECS7020P Sample Paper Solutions
No ratings yet
ECS7020P Sample Paper Solutions
6 pages
MCQs Dumps 2
No ratings yet
MCQs Dumps 2
15 pages
Computational Machine Learning Mock Test
No ratings yet
Computational Machine Learning Mock Test
6 pages
DS&BDA Techneo Unit 1&2 MCQs
No ratings yet
DS&BDA Techneo Unit 1&2 MCQs
16 pages
Hatdog 1.2
No ratings yet
Hatdog 1.2
18 pages
Data Mining Sample Midterm Questions (Last Modified 2/17/19)
No ratings yet
Data Mining Sample Midterm Questions (Last Modified 2/17/19)
4 pages
Reading 2 Time-Series Analysis - Answers
No ratings yet
Reading 2 Time-Series Analysis - Answers
52 pages
d3 PDF
No ratings yet
d3 PDF
7 pages
Introduction To Prompt Engineering
No ratings yet
Introduction To Prompt Engineering
17 pages
Kernel PCA
No ratings yet
Kernel PCA
13 pages
Blockchain Based Data Integrity Verification For Large-Scale IoT Data
No ratings yet
Blockchain Based Data Integrity Verification For Large-Scale IoT Data
9 pages
Big Data Analytics (BDAG 19-5) : Quiz: GMP - 2019 Term V
No ratings yet
Big Data Analytics (BDAG 19-5) : Quiz: GMP - 2019 Term V
2 pages
Bank Review - BERT
No ratings yet
Bank Review - BERT
23 pages
In5490 Classification
No ratings yet
In5490 Classification
85 pages
Reading 3 Machine Learning
No ratings yet
Reading 3 Machine Learning
9 pages
Database FA21-BCS-221
No ratings yet
Database FA21-BCS-221
4 pages
CC Qualis Periodicos 2017 Ranking
No ratings yet
CC Qualis Periodicos 2017 Ranking
41 pages
Wellman NetworkAnalysisBasic 1983
No ratings yet
Wellman NetworkAnalysisBasic 1983
47 pages
KM Secc
No ratings yet
KM Secc
16 pages
AI Answers
No ratings yet
AI Answers
8 pages
AI Coursework
No ratings yet
AI Coursework
43 pages
MIS - Chapter 5
No ratings yet
MIS - Chapter 5
44 pages
Cohort 9 Day 7
No ratings yet
Cohort 9 Day 7
12 pages
Emp Database
No ratings yet
Emp Database
8 pages
Hospital
No ratings yet
Hospital
14 pages
Wiley Edge Alumni Brochure UK
No ratings yet
Wiley Edge Alumni Brochure UK
23 pages
Application of Deep Neural Network in Intelligent System With Production Dashboard
No ratings yet
Application of Deep Neural Network in Intelligent System With Production Dashboard
12 pages
Lightweight Cryptography Algorithms For Resource-Constrained IoT Devices A Review Comparison and Research Opportunities
No ratings yet
Lightweight Cryptography Algorithms For Resource-Constrained IoT Devices A Review Comparison and Research Opportunities
17 pages
Ankit CCTV Resume
No ratings yet
Ankit CCTV Resume
1 page
Metadata-Drainage Classes
No ratings yet
Metadata-Drainage Classes
3 pages
File000000 1307158898
No ratings yet
File000000 1307158898
4 pages
PGDC Iiit Delhi
No ratings yet
PGDC Iiit Delhi
16 pages
Vineetha's Cover Letter
No ratings yet
Vineetha's Cover Letter
1 page
Introduction DBMS
No ratings yet
Introduction DBMS
5 pages
B.Tech III - I R18 I Mid Term Exam Timetable Dec-2023
No ratings yet
B.Tech III - I R18 I Mid Term Exam Timetable Dec-2023
9 pages
XML and Web Database
No ratings yet
XML and Web Database
10 pages
Learn OSM
No ratings yet
Learn OSM
3 pages
Army Public School Kanpur Part B Chapter-2 Project Cycle
No ratings yet
Army Public School Kanpur Part B Chapter-2 Project Cycle
5 pages

Reading 4 Big Data Projects - Answers

Uploaded by

Reading 4 Big Data Projects - Answers

Uploaded by

Question #1 of 13 Question ID: 1472236

In big data projects, data exploration is least likely to encompass:

(Module 4.2, LOS 4.d)

Question #2 of 13 Question ID: 1472234

Big data is most likely to suffer from low:

(Module 4.1, LOS 4.a)

Question #3 of 13 Question ID: 1472237

Under which of these conditions is a machine learning model said to be underfit?

A) The model identifies spurious relationships.

(Module 4.3, LOS 4.f)

Question #4 of 13 Question ID: 1681485

Which of the following uses of data is most accurately described as curation?

(Module 4.1, LOS 4.a)

Question #5 of 13 Question ID: 1472238

A) precision is the percentage of correctly predicted classes out of total predictions.

(Module 4.3, LOS 4.c)

Question #6 of 13 Question ID: 1685260

A) data collection, data curation, and data preparation.

(Module 4.2, LOS 4.d)

Question #7 of 13 Question ID: 1472232

A) Volume and velocity.

(Module 4.1, LOS 4.a)

Question #8 of 13 Question ID: 1685261

(Module 4.2, LOS 4.d)

Question #9 of 13 Question ID: 1472235

Text is considered to be a collection of tokens, where a token is equivalent to a word.

(Module 4.1, LOS 4.g)

Based on Exhibit 1, Karlsson's model's precision is closest to:

Precision (P) = TP /(TP + FP) = 307 / (307 + 31) = 0.9083 (91%)

(Module 4.3, LOS 4.c)

Question #11 - 13 of 13 Question ID: 1472241

Recall that TP / (TP + FN) = 307 / (307 + 23) = 0.9303 = 93%.

(Module 4.3, LOS 4.c)

F1 score = (2 × P × R) / (P + R) = (2 × 0.9083 × 0.9303) / (0.9083 + 0.9303) = 0.9192

(Module 4.3, LOS 4.c)

Question #13 - 13 of 13 Question ID: 1472243

Accuracy = (TP + TN) / (TP + FP + TN + FN) = (TP + TN) / N

(Module 4.3, LOS 4.c)

You might also like