Big Data Computing - Unit 9 - Week-6

The document outlines the details of an assignment for the NPTEL Big Data Computing course, including submission deadlines and various questions related to machine learning concepts. It covers topics such as Random Forest, K-means clustering, similarity measures, and the purpose of validation sets in machine learning. Students can submit their answers multiple times before the due date for grading.

Uploaded by 21102042.atharva.dalvi

Week 6: Assignment 6

Your last recorded submission was on 2024-09-25, 11:02 IST.
Due date: 2024-10-02, 23:59 IST.

1) Point out the wrong statement. (1 point)

Replication Factor can be configured at a cluster level (default is set to 3) and also at a file level
Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
User data is distributed across multiple DataNodes in the cluster and is managed by the NameNode
DataNode is aware of the files to which the blocks stored on it belong

2) What is the primary technique used by Random Forest to reduce overfitting? (1 point)

Boosting
Bagging
Pruning
Neural networks
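
For readers unfamiliar with one of the terms listed above, bagging (bootstrap aggregating) trains each model on a sample drawn with replacement from the training data and combines the models by majority vote. The following is a minimal sketch only: the one-feature threshold "stump" stands in for a full decision tree, and the data, the helper names, and the assumption that class 1 has the larger feature values are all hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) rows from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Hypothetical base learner: threshold halfway between the two class means.
    Assumes class 1 tends to have larger feature values than class 0."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # Degenerate bootstrap sample: only one class was drawn.
        label = 0 if xs0 else 1
        return lambda x: label
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: int(x > threshold)

def bagged_predict(models, x):
    """Majority vote over all base learners."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
# Hypothetical (feature, label) rows.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagged_predict(models, 1.2), bagged_predict(models, 6.8))
```

Because each learner sees a different resample, their individual errors tend to differ, and the vote averages them out.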

3) Which statements accurately describe Random Forest and Gradient Boosting ensemble methods? (1 point)

S1: Both methods can be used for classification tasks
S2: Random Forest is used for regression whereas Gradient Boosting is used for classification tasks
S3: Random Forest is used for classification whereas Gradient Boosting is used for regression tasks
S4: Both methods can be used for regression

S1 and S2
S2 and S4
S3 and S4
S1 and S4

4) In the context of K-means clustering with MapReduce, what role does the Map phase play in handling very large datasets? (1 point)

It reduces the size of the dataset by removing duplicates
It distributes the computation of distances between data points and centroids across multiple nodes
It initializes multiple sets of centroids to improve clustering accuracy
It performs principal component analysis (PCA) on the data
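
As background for this question, one K-means iteration can be written in map/reduce style. The sketch below runs in a single process and only imitates the two phases; the function names and data are hypothetical, and a real Hadoop or Spark job would partition `points` across workers.

```python
import math
from collections import defaultdict

def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def map_phase(points, centroids):
    """Map: emit (centroid_index, point) for every input point.
    Each point is handled independently, so splits of the input
    could be processed on different nodes."""
    for p in points:
        yield nearest_centroid(p, centroids), p

def reduce_phase(pairs, centroids):
    """Reduce: average the points that were assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # a centroid with no points stays where it is
    for idx, pts in groups.items():
        new_centroids[idx] = tuple(sum(coord) / len(pts) for coord in zip(*pts))
    return new_centroids

# Hypothetical 2-D data and starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
centroids = reduce_phase(map_phase(points, centroids), centroids)
print(centroids)
```

Repeating the map and reduce steps until the centroids stop moving gives the usual K-means loop.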

5) What is a common method to improve the performance of the K-means algorithm when dealing with large-scale datasets in a MapReduce environment? (1 point)

Using hierarchical clustering before K-means
Reducing the number of clusters
Employing mini-batch K-means
Increasing the number of centroids
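
For reference, mini-batch K-means, one of the options above, updates centroids from small random batches instead of the full dataset. The following is a rough single-machine sketch using the per-centroid running-mean update; the function name, parameters, and data are hypothetical, and it is not tied to any particular MapReduce framework.

```python
import math
import random

def mini_batch_kmeans(points, centroids, batch_size, n_iters, seed=0):
    """Update centroids from random mini-batches with a running-mean step."""
    rng = random.Random(seed)
    centroids = [tuple(c) for c in centroids]
    counts = [0] * len(centroids)  # points seen so far by each centroid
    for _ in range(n_iters):
        batch = rng.sample(points, batch_size)
        # Assign the whole batch using the centroids as they stand now.
        assigned = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in batch
        ]
        for p, idx in zip(batch, assigned):
            counts[idx] += 1
            eta = 1.0 / counts[idx]  # per-centroid learning rate
            centroids[idx] = tuple(
                (1 - eta) * c + eta * x for c, x in zip(centroids[idx], p)
            )
    return centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
result = mini_batch_kmeans(points, [(0.0, 0.0), (10.0, 10.0)], batch_size=4, n_iters=10)
print(result)
```

Each iteration touches only `batch_size` points, which is why the approach is attractive when a full pass over the data is expensive.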

6) Which similarity measure is often used to determine the similarity between two text documents by considering the angle between their vector representations in a high-dimensional space? (1 point)

Manhattan Distance
Cosine Similarity
Jaccard Similarity
Hamming Distance
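
To make the "angle between vectors" idea concrete, here is a small sketch that computes the cosine of the angle between two term-count vectors. The three vectors are hypothetical stand-ins for documents.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

doc1 = [2, 1, 0]  # hypothetical term counts
doc2 = [4, 2, 0]  # same direction as doc1, twice the length
doc3 = [0, 0, 3]  # shares no terms with doc1

print(cosine_similarity(doc1, doc2))  # close to 1: same direction
print(cosine_similarity(doc1, doc3))  # 0: orthogonal vectors
```

Note that `doc1` and `doc2` score as near-identical even though their lengths differ, since only direction matters.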

7) Which distance measure calculates the distance along strictly horizontal and vertical paths, consisting of segments along the axes? (1 point)

Minkowski distance
Cosine similarity
Manhattan distance
Euclidean distance
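
The distance measures named in these options can be compared on a single pair of points. This is an illustrative sketch with hypothetical points; Minkowski distance with order p reduces to Manhattan distance at p = 1 and to Euclidean distance at p = 2.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)  # hypothetical points
print(manhattan(a, b))     # 3 + 4 = 7
print(euclidean(a, b))     # sqrt(9 + 16) = 5.0
print(minkowski(a, b, 1))  # matches the Manhattan value
print(minkowski(a, b, 2))  # matches the Euclidean value
```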

8) What is the purpose of a validation set in machine learning? (1 point)

To train the model on unseen data
To evaluate the model’s performance on the training data
To tune hyperparameters and prevent overfitting
To test the final model’s performance
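
To show where a validation set usually sits in a workflow, the sketch below splits data three ways, compares candidate values of k for a toy 1-D k-nearest-neighbour classifier on the validation rows, and touches the test rows only once at the end. The data, split fractions, and candidate values are all hypothetical.

```python
import random
from collections import Counter

def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and return (train, validation, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_frac)
    n_test = int(len(rows) * test_frac)
    return rows[n_val + n_test:], rows[:n_val], rows[n_val:n_val + n_test]

def knn_predict(train, x, k):
    """Majority label among the k training rows nearest to x."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def accuracy(train, rows, k):
    """Fraction of rows whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)

# Hypothetical data: the label is 1 when the feature is at least 15.
data = [(x, int(x >= 15)) for x in range(30)]
train, val, test = three_way_split(data)

candidates = [1, 3, 5]  # hypothetical hyperparameter values for k
best_k = max(candidates, key=lambda k: accuracy(train, val, k))
test_accuracy = accuracy(train, test, best_k)  # reported once, after tuning
print(best_k, test_accuracy)
```

Keeping the test rows out of the tuning loop is what lets the final figure serve as an estimate on unseen data.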

9) In K-fold cross-validation, what is the purpose of splitting the dataset into K folds? (1 point)

To ensure that every data point is used for training only once
To train the model on all the data points
To test the model on the same data multiple times
To evaluate the model’s performance on different subsets of data
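
The mechanics of K-fold splitting can be sketched with index lists alone. In this hypothetical helper, each round holds out one contiguous fold and uses the remaining indices for training; real workflows usually shuffle the data first.

```python
def k_fold_indices(n, k):
    """Yield (training_indices, held_out_indices) for each of k folds over n items."""
    # Spread any remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        held_out = list(range(start, stop))
        training = [i for i in range(n) if i < start or i >= stop]
        yield training, held_out
        start = stop

for training, held_out in k_fold_indices(6, 3):
    print(training, held_out)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Every index is held out exactly once and appears in the training list in the other K - 1 rounds.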

10) Which of the following steps is NOT typically part of the machine learning process? (1 point)

Data Collection
Model Training
Model Deployment
Data Encryption

You may submit any number of times before the due date. The final submission will be considered for grading.
