Big Data Computing - Unit 10 - Week-7
Week-7
3) […] model typically built?

By using a single-node system to fit the model
By distributing the data and computations across multiple nodes for parallel processing
By manually sorting data before applying decision tree algorithms
By using in-memory processing on a single machine
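The distributed option above (the approach Spark MLlib takes) can be sketched in plain Python: lists stand in for worker partitions, each partition computes small per-side label counts for a candidate split locally, and only those summaries are merged to score the split. This is an illustrative toy under that assumption, not Spark's actual implementation; the names `partial_counts` and `split_impurity` are invented for the sketch.

```python
from collections import Counter

def partial_counts(partition, feature_idx, threshold):
    """Per-partition work: count class labels on each side of a candidate
    split. Only these small summaries would travel over the network,
    never the raw rows."""
    left, right = Counter(), Counter()
    for features, label in partition:
        (left if features[feature_idx] <= threshold else right)[label] += 1
    return left, right

def gini(counts):
    """Gini impurity of a label-count summary (0.0 for a pure node)."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def split_impurity(partitions, feature_idx, threshold):
    """Driver-side merge: combine per-partition counts, then score the
    split as the size-weighted Gini impurity of the two children."""
    left, right = Counter(), Counter()
    for part in partitions:  # in Spark this would be a map + reduce
        l, r = partial_counts(part, feature_idx, threshold)
        left += l
        right += r
    n = sum(left.values()) + sum(right.values())
    return (sum(left.values()) * gini(left)
            + sum(right.values()) * gini(right)) / n

# Two "partitions" of (features, label) rows; feature 0 separates the classes.
parts = [[((1.0,), "a"), ((2.0,), "a")], [((8.0,), "b"), ((9.0,), "b")]]
print(split_impurity(parts, feature_idx=0, threshold=5.0))  # perfect split -> 0.0
```

The key point is that each worker only ships a handful of counts to the driver, which is what makes split search feasible on data too large for one machine.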
4) In Apache Spark, what is the primary purpose of using cross-validation in machine learning pipelines? 1 point

To reduce the number of features used in the model
To evaluate the model's performance by partitioning the data into training and validation sets multiple times
To speed up the data preprocessing phase
To increase the size of the training dataset by generating synthetic samples
5) How does gradient boosting in machine learning conceptually resemble gradient descent in optimization theory? 1 point

Both techniques use large step sizes to quickly converge to a minimum
Both methods involve iteratively adjusting model parameters based on the gradient to minimize a loss function
Both methods rely on random sampling to update the model
Both techniques use a fixed learning rate to ensure convergence without overfitting
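The analogy in the second option can be made concrete: for squared loss, the negative gradient of the loss with respect to the current predictions is just the residuals, so each boosting round fits a base learner to those residuals and takes a small step, stepping the prediction function toward lower loss exactly as gradient descent steps parameters. A minimal sketch using a constant base learner (an assumption made only to keep the code tiny; real boosting fits small trees):

```python
def boost(ys, rounds=50, lr=0.1):
    """Gradient boosting for squared loss L = (y - F)^2 / 2.
    Each round: residuals = -dL/dF, fit the weakest possible base
    learner (a constant), and take a small step of size lr."""
    pred = [0.0] * len(ys)  # F_0 = 0
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]  # negative gradient
        step = sum(residuals) / len(residuals)         # "fit" a constant
        pred = [p + lr * step for p in pred]           # F_{t+1} = F_t + lr * h
    return pred

print(boost([3.0, 5.0, 7.0]))  # each prediction approaches the mean, 5.0
```

With a constant learner the ensemble converges to the label mean, which makes the descent behaviour easy to see; swapping in a depth-limited tree as the base learner gives real gradient boosting.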
6) Which statement accurately describes one of the benefits of decision trees? 1 point

Decision trees always outperform other models in predictive accuracy, regardless of the complexity of the dataset.
Decision trees can automatically handle feature interactions by combining different features within a single tree, but a single tree's predictive power is often limited.
Decision trees cannot handle large datasets and are not computationally scalable.
Decision trees require a fixed set of features and cannot adapt to new feature interactions during training.
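The feature-interaction benefit in the second statement can be shown with a depth-2 tree on XOR-labelled data: neither feature alone separates the classes, but a path through the tree combines both features. The tree below is hard-coded for illustration rather than learned:

```python
def tree_predict(x1, x2):
    """A depth-2 decision tree capturing the XOR interaction between two
    features: split on x1 first, then on x2 within each branch."""
    if x1 <= 0.5:
        return 1 if x2 > 0.5 else 0
    else:
        return 0 if x2 > 0.5 else 1

# XOR labels: no single-feature split gets these right, the tree does.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(all(tree_predict(x1, x2) == y for (x1, x2), y in data))  # True
```

This also illustrates the caveat in the same option: a single small tree handles interactions but has limited expressive power, which is why ensembles (random forests, gradient-boosted trees) are usually preferred in practice.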