
Big Data Computing - Unit 10 - Week 7

The document outlines the Week 7 assignment for the NPTEL Big Data Computing course, detailing questions related to decision trees, bootstrapping, and machine learning techniques. It includes various topics such as the purpose of decision trees in regression, the function of bootstrapping in random forests, and the advantages of regression trees in big data environments. The assignment is due on October 9, 2024, and allows for multiple submissions before the deadline.

Week 7: Assignment 7

Due date: 2024-10-09, 23:59 IST
Your last recorded submission was on 2024-09-28, 23:43 IST.

1) What is the primary purpose of using a decision tree in regression tasks within big data environments? (1 point)

To classify data into distinct categories
To predict continuous values based on input features
To reduce the dimensionality of the dataset
To perform clustering of similar data points

2) Which statement accurately explains the function of bootstrapping within the random forest algorithm? (1 point)

Bootstrapping creates additional features to augment the dataset for improved random forest performance.
Bootstrapping is not used in the random forest algorithm; it is only employed in decision tree construction.
Bootstrapping produces replicas of the dataset by random sampling with replacement, which is essential for the random forest algorithm.
Bootstrapping generates replicas of the dataset without replacement, ensuring diversity in the random forest.
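The sampling-with-replacement idea behind the correct option can be sketched in a few lines of Python (function and parameter names here are illustrative, not from the course material):

```python
import random

def bootstrap_replicas(dataset, n_replicas, seed=0):
    """Draw replicas of `dataset` by random sampling WITH replacement."""
    rng = random.Random(seed)
    # Each replica has the same size as the original, but individual
    # rows may appear several times or not at all; this resampling
    # diversity is what the random forest algorithm relies on.
    return [rng.choices(dataset, k=len(dataset)) for _ in range(n_replicas)]
```

Each tree in a random forest is then trained on its own replica, so no two trees see exactly the same data.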
3) In a big data scenario using MapReduce, how is the decision tree model typically built? (1 point)

By using a single-node system to fit the model
By distributing the data and computations across multiple nodes for parallel processing
By manually sorting data before applying decision tree algorithms
By using in-memory processing on a single machine
4) In Apache Spark, what is the primary purpose of using cross-validation in machine learning pipelines? (1 point)

To reduce the number of features used in the model
To evaluate the model's performance by partitioning the data into training and validation sets multiple times
To speed up the data preprocessing phase
To increase the size of the training dataset by generating synthetic samples
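In Spark this repeated partitioning is handled internally by `CrossValidator` in `pyspark.ml.tuning`; the underlying k-fold idea can be sketched in plain Python (a simplified sequential sketch, not Spark's actual implementation):

```python
def kfold_splits(n_rows, k):
    """Yield (train_indices, validation_indices) pairs for k-fold CV."""
    indices = list(range(n_rows))
    fold_size = n_rows // k
    for i in range(k):
        start = i * fold_size
        # the last fold absorbs any leftover rows
        end = start + fold_size if i < k - 1 else n_rows
        validation = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, validation
```

Every row serves in a validation set exactly once, and the k validation scores are averaged to compare model or parameter choices.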
5) How does gradient boosting in machine learning conceptually resemble gradient descent in optimization theory? (1 point)

Both techniques use large step sizes to quickly converge to a minimum
Both methods involve iteratively adjusting model parameters based on the gradient to minimize a loss function
Both methods rely on random sampling to update the model
Both techniques use a fixed learning rate to ensure convergence without overfitting
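The analogy in the correct option can be made concrete with a toy squared-loss booster: each round fits a two-leaf stump to the current residuals (which are the negative gradient of squared loss) and takes a small step toward it, just as gradient descent steps along the negative gradient. This is a simplified sketch with illustrative names, not the course's code:

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf stump: a threshold plus a mean value per side."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:  # threshold at the maximum: no actual split
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def gradient_boost(xs, ys, rounds=20, lr=0.3):
    """Additively combine stumps, each fit to the current residuals."""
    preds = [0.0] * len(ys)
    for _ in range(rounds):
        # for squared loss, the negative gradient IS the residual
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # small step along the fitted direction, as in gradient descent
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
    return preds
```

The learning rate `lr` plays the same role as the step size in gradient descent: smaller steps converge more slowly but more stably.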
6) Which statement accurately describes one of the benefits of decision trees? (1 point)

Decision trees always outperform other models in predictive accuracy, regardless of the complexity of the dataset.
Decision trees can automatically handle feature interactions by combining different features within a single tree, but a single tree's predictive power is often limited.
Decision trees cannot handle large datasets and are not computationally scalable.
Decision trees require a fixed set of features and cannot adapt to new feature interactions during training.

7) What has driven the development of specialized graph computation engines capable of inferring complex recursive properties of graph-structured data? (1 point)

Increasing demand for social media analytics
Advances in machine learning algorithms
Growing scale and importance of graph data
Expansion of blockchain technology

8) Which of these statements accurately describes bagging in the context of understanding the random forest algorithm? (1 point)

Bagging is primarily used to average predictions of decision trees in the random forest algorithm.
Bagging is a technique exclusively designed for reducing the bias in predictions made by decision trees.
Bagging, short for Bootstrap Aggregation, is a general method for averaging predictions of various algorithms, not limited to decision trees, and it works by reducing the variance of predictions.
Bagging is a method specifically tailored for improving the interpretability of decision trees in the random forest algorithm.
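A minimal sketch of bagging with an arbitrary base learner, here a one-feature least-squares line rather than a tree, to underline that the method is not tree-specific (all names are illustrative):

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:  # degenerate replica: fall back to a constant model
        return lambda x: my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    b = my - a * mx
    return lambda x: a * x + b

def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Average the predictions of models fit on bootstrap replicas."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample WITH replacement
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x_new))
    return sum(preds) / len(preds)  # averaging reduces variance
```

Swapping `fit_line` for a decision-tree learner recovers the random forest's use of bagging.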
9) What is a key advantage of using regression trees in a big data environment when combined with MapReduce? (1 point)

They require less computational power compared to other algorithms
They can handle both classification and regression tasks effectively
They automatically handle large-scale datasets by leveraging distributed processing
They eliminate the need for data preprocessing

10) When implementing a regression decision tree using MapReduce, which technique helps in managing the data that needs to be split across different nodes? (1 point)

Feature scaling
Data shuffling
Data partitioning
Model pruning
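Data partitioning, the correct option here, can be sketched as a simple hash partitioner of the kind MapReduce uses to shard records across nodes (a toy sketch; names are illustrative):

```python
def partition(records, n_nodes):
    """Hash-partition (key, value) records into one shard per node."""
    shards = [[] for _ in range(n_nodes)]
    for key, value in records:
        # every record with the same key lands on the same node, so
        # per-key statistics can be computed locally before merging
        shards[hash(key) % n_nodes].append((key, value))
    return shards
```

Each node can then evaluate candidate tree splits on its own shard, and the partial statistics are merged in the reduce phase.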
You may submit any number of times before the due date. The final submission will be considered for
grading.
Submit Answers
