Big Data Computing - - Unit 9 - Week-6
1) Point out the wrong statement. (1 point)
Replication Factor can be configured at a cluster level (the default is 3) and also at a file level.
The Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode.
User data is distributed across multiple DataNodes in the cluster and is managed by the NameNode.
The DataNode is aware of the files to which the blocks stored on it belong.
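In HDFS the NameNode holds the namespace (which blocks make up each file, and each file's replication factor), while DataNodes store block replicas and periodically send block reports listing the blocks they hold. The toy model below is a sketch of that split of responsibilities only: the classes `NameNode` and `DataNode` and all their methods are hypothetical illustrations, not the Hadoop API, and real block reports carry more than bare block IDs.

```python
from collections import defaultdict


class NameNode:
    """Toy model of NameNode metadata (hypothetical; not the Hadoop API)."""

    def __init__(self, default_replication=3):
        self.default_replication = default_replication  # cluster-level default
        self.file_to_blocks = {}  # namespace: file path -> ordered block IDs
        self.file_replication = {}  # file path -> replication factor
        self.block_locations = defaultdict(set)  # block ID -> DataNodes, from block reports

    def create_file(self, path, block_ids, replication=None):
        """Record a file's blocks; a per-file replication factor overrides the default."""
        self.file_to_blocks[path] = list(block_ids)
        self.file_replication[path] = (
            self.default_replication if replication is None else replication
        )

    def receive_block_report(self, datanode_id, block_ids):
        """Update block -> DataNode locations from one DataNode's block report."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)

    def locate(self, path):
        """Resolve a file to the DataNodes holding each of its blocks."""
        return {b: sorted(self.block_locations[b]) for b in self.file_to_blocks[path]}


class DataNode:
    """Stores block replicas and knows only block IDs, not file names."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = set()

    def block_report(self):
        """List every block this DataNode currently stores."""
        return sorted(self.blocks)


nn = NameNode()
nn.create_file("/logs/day1.txt", ["blk_1", "blk_2"], replication=2)  # file-level override
nn.create_file("/logs/day2.txt", ["blk_3"])  # falls back to the cluster default of 3
dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks |= {"blk_1", "blk_2"}
dn2.blocks |= {"blk_1", "blk_2", "blk_3"}
for dn in (dn1, dn2):
    nn.receive_block_report(dn.node_id, dn.block_report())
print(nn.locate("/logs/day1.txt"))  # {'blk_1': ['dn1', 'dn2'], 'blk_2': ['dn1', 'dn2']}
```

Only the NameNode can map a file path to its blocks; a `DataNode` object has no field that records file names, which mirrors why a DataNode cannot tell which file a stored block belongs to.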
2) What is the primary technique used by Random Forest to reduce overfitting? (1 point)
Boosting
Bagging
Pruning
Neural networks
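Random Forest trains many decision trees, each on a bootstrap sample (drawn with replacement) of the training data, and combines them by majority vote; this is bagging. It also samples a random subset of features at each split. The sketch below shows only the bootstrap-and-vote part on made-up 1-D data, assuming a hypothetical decision-stump learner (`fit_stump`) in place of full trees; feature subsampling is omitted because there is only one feature.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) points with replacement (a bootstrap sample)."""
    return [rng.choice(data) for _ in range(len(data))]


def fit_stump(sample):
    """Hypothetical weak learner: a 1-D threshold classifier (decision stump).
    Picks the threshold and polarity with the fewest training errors."""
    best = None
    for threshold, _ in sample:
        for polarity in (0, 1):
            errors = sum(
                (polarity if x >= threshold else 1 - polarity) != y for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else 1 - polarity


def bagging_fit(data, n_models=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Aggregate the models by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


data = [(x, 0) for x in range(5)] + [(x, 1) for x in range(5, 10)]
models = bagging_fit(data, n_models=25, seed=42)
print(bagging_predict(models, 1), bagging_predict(models, 9))  # 0 1
```

Each stump sees a slightly different sample, so individual stumps vary; averaging their votes reduces that variance, which is how bagging curbs overfitting.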
3) Which statements accurately describe the Random Forest and Gradient Boosting ensemble methods? (1 point)
S1: Both methods can be used for classification tasks.
S2: Random Forest is used for regression, whereas Gradient Boosting is used for classification tasks.
S3: Random Forest is used for classification, whereas Gradient Boosting is used for regression tasks.
S4: Both methods can be used for regression.
S1 and S2
S2 and S4
S3 and S4
S1 and S4
4) In the context of K-means clustering with MapReduce, what role does the Map phase play in handling very large datasets? (1 point)
It reduces the size of the dataset by removing duplicates
It distributes the computation of distances between data points and centroids across multiple nodes
It initializes multiple sets of centroids to improve clustering accuracy
It performs principal component analysis (PCA) on the data
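In a MapReduce formulation of K-means, each mapper scans only its own input split and computes the distance from every point to every current centroid, emitting the nearest centroid's ID as the key; reducers then average the points grouped under each key. The sketch below simulates one such iteration in a single process: `kmeans_map`, `kmeans_reduce` and `kmeans_iteration` are hypothetical names, the `splits` list stands in for per-node input splits, and a dictionary stands in for the shuffle step.

```python
import math
from collections import defaultdict


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(split, centroids):
    """Map phase: each node scans only its own split and emits (centroid_id, point)."""
    for point in split:
        yield nearest_centroid(point, centroids), point


def kmeans_reduce(points):
    """Reduce phase: the new centroid is the mean of the points assigned to it."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(splits, centroids):
    """One MapReduce round; a real job would run the map calls on different nodes."""
    grouped = defaultdict(list)  # stands in for the shuffle/sort step
    for split in splits:
        for centroid_id, point in kmeans_map(split, centroids):
            grouped[centroid_id].append(point)
    new_centroids = list(centroids)  # a centroid with no points keeps its old position
    for centroid_id, points in grouped.items():
        new_centroids[centroid_id] = kmeans_reduce(points)
    return new_centroids


splits = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]  # two hypothetical input splits
print(kmeans_iteration(splits, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

The driver would repeat `kmeans_iteration` until the centroids stop moving. Practical jobs often add a combiner that emits per-split partial sums and counts instead of raw points, which cuts shuffle traffic.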
5) What is a common method to improve the performance of the K-means algorithm when dealing with large-scale datasets in a MapReduce environment? (1 point)
Using hierarchical clustering before K-means
Reducing the number of clusters
Employing mini-batch K-means
Increasing the number of centroids
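Mini-batch K-means avoids a full pass over the data per iteration: each step samples a small random batch, assigns the batch to the current centroids, and moves each centroid toward its assigned points with a per-centroid learning rate of 1/(points absorbed so far). The function below is a minimal single-machine sketch of that update; the name `mini_batch_kmeans`, its parameters and the sample points are assumptions for illustration, not a library API.

```python
import math
import random


def mini_batch_kmeans(points, init_centroids, batch_size=4, iterations=20, seed=0):
    """Mini-batch K-means sketch: each step looks at a small random sample only."""
    rng = random.Random(seed)
    centroids = [list(c) for c in init_centroids]
    counts = [0] * len(centroids)  # points absorbed per centroid so far
    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch against the current centroids first...
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in batch
        ]
        # ...then move each assigned centroid toward its points with rate 1/count.
        for p, k in zip(batch, labels):
            counts[k] += 1
            eta = 1.0 / counts[k]
            centroids[k] = [(1 - eta) * c + eta * x for c, x in zip(centroids[k], p)]
    return [tuple(c) for c in centroids]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(mini_batch_kmeans(points, [(0, 0), (10, 10)]))
```

Because the learning rate shrinks as a centroid absorbs more points, each centroid behaves like a running mean of the points assigned to it, while each step costs only `batch_size` distance computations per centroid.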
6) Which similarity measure is often used to determine the similarity between two text documents by considering the angle between their vector representations in a high-dimensional space? (1 point)
Manhattan Distance
Cosine Similarity
Jaccard Similarity
Hamming Distance
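Cosine similarity is the dot product of two vectors divided by the product of their lengths, i.e. the cosine of the angle between them. The sketch below applies it to raw term-frequency vectors; the function name `cosine_similarity`, the lower-case whitespace tokenization and the example sentences are assumptions (real text pipelines usually add stop-word removal and TF-IDF weighting).

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-frequency vectors of two documents."""
    tf_a = Counter(doc_a.lower().split())  # naive whitespace tokenization (assumption)
    tf_b = Counter(doc_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat similarity with an empty document as 0
    return dot / (norm_a * norm_b)


print(cosine_similarity("big data computing", "big data analytics"))  # about 0.667 (= 2/3)
print(cosine_similarity("data data", "data"))  # 1.0: same direction, different length
```

The second call shows why the angle is useful for text: a document that repeats the same terms more often points in the same direction, so its similarity stays 1.0 regardless of length.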
7) Which distance measure calculates the distance along strictly horizontal and vertical paths, consisting of segments along the axes? (1 point)
Minkowski distance
Cosine similarity
Manhattan distance
Euclidean distance
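Manhattan distance sums the absolute differences along each axis, so it measures a path made only of axis-parallel segments; Euclidean distance is the straight line, and Minkowski distance generalizes both through its exponent p. The helper names `manhattan` and `minkowski` below are illustrative; `math.dist` is the standard-library Euclidean distance.

```python
import math


def manhattan(a, b):
    """Sum of absolute coordinate differences: travel along the axes only."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """General L_p distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0, 0), (3, 4)
print(manhattan(a, b))  # 7   (3 steps along x + 4 steps along y)
print(math.dist(a, b))  # 5.0 (straight-line Euclidean distance)
print(minkowski(a, b, 1))  # 7.0
print(minkowski(a, b, 2))  # 5.0
```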
To ensure that every data point is used for training only once
To train the model on all the data points
To test the model on the same data multiple times
To evaluate the model’s performance on different subsets of data
10) Which of the following steps is NOT typically part of the machine learning process? (1 point)
Data Collection
Model Training
Model Deployment
Data Encryption
You may submit any number of times before the due date. The final submission will be considered for grading.