Big Data Computing - Unit 9 - Week-6

The document outlines the details of an assignment for the NPTEL Big Data Computing course, including submission deadlines and various questions related to machine learning concepts. It covers topics such as Random Forest, K-means clustering, similarity measures, and the purpose of validation sets in machine learning. Students can submit their answers multiple times before the due date for grading.

Week 6: Assignment 6

Your last recorded submission was on 2024-09-25, 11:02 IST. Due date: 2024-10-02, 23:59 IST.

1) Point out the wrong statement. (1 point)

Replication Factor can be configured at a cluster level (default is set to 3) and also at a file level.
Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode.
User data is distributed across multiple DataNodes in the cluster and is managed by the NameNode.
A DataNode is aware of the files to which the blocks stored on it belong.

2) What is the primary technique used by Random Forest to reduce overfitting? (1 point)

Boosting
Bagging
Pruning
Neural networks

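For readers unfamiliar with bagging (bootstrap aggregating), one of the options above, the idea can be sketched as follows: each model is trained on a resample drawn with replacement, and the models' predictions are combined by majority vote. This is a minimal illustration only; the function names `bootstrap_sample` and `majority_vote` are hypothetical and the training step itself is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement (one bootstrap resample)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most common label among the per-model predictions."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)      # fixed seed so the sketch is repeatable
rows = list(range(10))      # stand-in for ten training records
samples = [bootstrap_sample(rows, rng) for _ in range(3)]  # one resample per model
```

Each resample has the same size as the original data but typically repeats some records and omits others, which is what makes the individual models differ.
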
3) Which statements accurately describe Random Forest and Gradient Boosting ensemble methods? (1 point)

S1: Both methods can be used for classification tasks
S2: Random Forest is used for regression whereas Gradient Boosting is used for classification tasks
S3: Random Forest is used for classification whereas Gradient Boosting is used for regression tasks
S4: Both methods can be used for regression

S1 and S2
S2 and S4
S3 and S4
S1 and S4

4) In the context of K-means clustering with MapReduce, what role does the Map phase play in handling very large datasets? (1 point)

It reduces the size of the dataset by removing duplicates
It distributes the computation of distances between data points and centroids across multiple nodes
It initializes multiple sets of centroids to improve clustering accuracy
It performs principal component analysis (PCA) on the data

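As background for this question, one iteration of K-means in the MapReduce style can be sketched in plain Python. This is a single-process illustration under stated assumptions: the data splits are simulated as lists, no Hadoop API is used, and the names `kmeans_map` and `kmeans_reduce` are hypothetical.

```python
from collections import defaultdict


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_map(points, centroids):
    """Map step for one data split: emit (nearest centroid index, point)."""
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: squared_distance(p, centroids[i]))
        yield idx, p


def kmeans_reduce(pairs, centroids):
    """Reduce step: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # centroids with no points stay unchanged
    for idx, pts in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return new_centroids


# Two simulated splits, as if stored on two different nodes.
splits = [[(0.0, 0.0), (1.0, 0.0)], [(9.0, 9.0), (10.0, 9.0)]]
centroids = [(0.0, 1.0), (8.0, 8.0)]
pairs = [pair for split in splits for pair in kmeans_map(split, centroids)]
new_centroids = kmeans_reduce(pairs, centroids)
```

Because `kmeans_map` only needs its own split plus the current centroids, each split could be processed independently of the others.
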
5) What is a common method to improve the performance of the K-means algorithm when dealing with large-scale datasets in a MapReduce environment? (1 point)

Using hierarchical clustering before K-means
Reducing the number of clusters
Employing mini-batch K-means
Increasing the number of centroids

6) Which similarity measure is often used to determine the similarity between two text documents by considering the angle between their vector representations in a high-dimensional space? (1 point)

Manhattan Distance
Cosine Similarity
Jaccard Similarity
Hamming Distance
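
To make "the angle between their vector representations" concrete, the sketch below turns two documents into term-count vectors and computes the cosine of the angle between them. The whitespace tokenization and the function name `cosine_similarity` are simplifying assumptions for illustration.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-count vectors of two documents."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


score = cosine_similarity("big data computing", "big data")
```

Documents with no terms in common score 0, and documents with identical term proportions score 1 regardless of their length.
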

7) Which distance measure calculates the distance along strictly horizontal and vertical paths, consisting of segments along the axes? (1 point)

Minkowski distance
Cosine similarity
Manhattan distance
Euclidean distance
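
For comparison, the three distance measures named in the options can be written out as below. This is a small sketch with hypothetical function names; the parameter `r` is the order of the Minkowski distance.

```python
import math


def manhattan_distance(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean_distance(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def minkowski_distance(p, q, r):
    """Minkowski distance of order r (r=1 is Manhattan, r=2 is Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)


p, q = (0, 0), (3, 4)
d_manhattan = manhattan_distance(p, q)
d_euclidean = euclidean_distance(p, q)
```

For the points (0, 0) and (3, 4), the first measure adds the two axis-aligned legs (3 + 4), while the second measures the diagonal.
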

8) What is the purpose of a validation set in machine learning? (1 point)

To train the model on unseen data
To evaluate the model’s performance on the training data
To tune hyperparameters and prevent overfitting
To test the final model’s performance

9) In K-fold cross-validation, what is the purpose of splitting the dataset into K folds? (1 point)

To ensure that every data point is used for training only once
To train the model on all the data points
To test the model on the same data multiple times
To evaluate the model’s performance on different subsets of data

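The mechanics of splitting a dataset into K folds can be sketched as follows. This is a minimal, unshuffled version for illustration; the generator name `k_fold_indices` is hypothetical, and library implementations usually offer shuffling and stratification as well.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


folds = list(k_fold_indices(10, 3))
```

Each index appears in exactly one validation fold and in the training set of every other fold.
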
10) Which of the following steps is NOT typically part of the machine learning process? (1 point)

Data Collection
Model Training
Model Deployment
Data Encryption

You may submit any number of times before the due date. The final submission will be considered for
grading.