
MCS-226 Data Science and Big Data

The document outlines the assignment details for the course MCS-226: Data Science & Big Data, including submission deadlines, marking scheme, and the requirement to answer 10 questions worth 8 marks each, along with a viva voce component. The questions cover various topics such as Exploratory Data Analysis, hypothesis testing, data preprocessing, big data concepts, and machine learning techniques. Students are encouraged to use illustrations and follow presentation guidelines from the Programme Guide.

Uploaded by

mailabhiabhi7
Copyright
© All Rights Reserved

Course Code : MCS-226

Course Title : Data Science & Big Data


Assignment Number : MCAOL(III)/218/Assign/2024
Maximum Marks : 100
Weightage : 30%
Last Dates for Submission : 30th April, 2024 (for January session)
31st October, 2024 (for July session)

This assignment has 10 questions of 8 marks each; answer all questions. The remaining 20 marks
are for the viva voce. You may use illustrations and diagrams to enhance your explanations. Please
go through the guidelines regarding assignments given in the Programme Guide for the format of
presentation.

Q1: What is Exploratory Data Analysis (EDA) and why is it important in the data science workflow? What
are the key components of the data science process?

Q2: Discuss the implications of hypothesis testing results in decision-making. Provide examples of real-
world situations where statistical hypothesis testing is commonly used.

Q3: What is data preprocessing, and why is it a crucial step in the data science workflow? Why is it
important to identify and handle outliers in a dataset during data preprocessing?

Q4: Discuss the significance of the three Vs (Volume, Velocity, Variety) in the context of big data. Provide
examples of each of the three Vs in real-world scenarios. How does MapReduce facilitate parallel
processing of large datasets? Explain the functionality of the Map function in the MapReduce
paradigm with the help of an example.
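
As an illustration of the kind of example Q4 asks for, the following is a minimal single-process sketch of word counting in the MapReduce style. It is not Hadoop code: the names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are hypothetical, and the shuffle step that a real framework performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    # In a cluster each line (or input split) could be mapped on a different
    # worker; in this sketch the map calls simply run one after another.
    mapped = [pair for line in lines for pair in map_fn(line)]
    return dict(reduce_fn(key, values) for key, values in shuffle(mapped).items())
```

Because `map_fn` looks at one line at a time and shares no state between calls, the map calls are independent of each other, which is the property that allows them to run in parallel.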

Q5: Explain the purpose of Apache Hive in the Hadoop ecosystem. How does Spark address limitations of
the traditional MapReduce model?

Q6: Define NoSQL databases and explain the primary motivations behind their development. Provide
examples of scenarios where each type of NoSQL database is suitable.

Q7: How does collaborative filtering contribute to enhancing user experience and engagement in
recommendation systems? Provide examples of industries or platforms where collaborative filtering is
widely used.

Q8: What is a Data Stream Bloom Filter? Explain its primary purpose in data stream processing. Also,
introduce the Flajolet-Martin Algorithm and its role in estimating the cardinality of a data stream.
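
The two techniques named in Q8 can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: the helper `_hash`, the class `BloomFilter`, and the default sizes are hypothetical choices, and `flajolet_martin` uses only the basic single-hash estimate of 2 raised to the largest number of trailing zero bits seen, without the averaging over many hash functions that practical versions add.

```python
import hashlib


def _hash(item, seed):
    """Deterministic 64-bit hash of a string; the seed selects one of several hash functions."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class BloomFilter:
    """Fixed-size bit array answering "possibly seen" or "definitely not seen"."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def add(self, item):
        # Set one bit per hash function.
        for seed in range(self.num_hashes):
            self.bits[_hash(item, seed) % self.size] = True

    def might_contain(self, item):
        # False means the item was never added; True may be a false positive.
        return all(
            self.bits[_hash(item, seed) % self.size]
            for seed in range(self.num_hashes)
        )


def trailing_zeros(n):
    """Number of trailing zero bits in n (64 for n == 0, matching the hash width)."""
    if n == 0:
        return 64
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count


def flajolet_martin(stream, seed=0):
    """Rough estimate of the number of distinct items in the stream."""
    max_zeros = 0
    for item in stream:
        max_zeros = max(max_zeros, trailing_zeros(_hash(item, seed)))
    return 2 ** max_zeros
```

Both structures use memory that does not grow with the length of the stream, and repeated items do not change the Flajolet-Martin estimate because an item always hashes to the same value.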

Q9: Describe the role of link analysis in the PageRank algorithm. How are links between web pages
interpreted in the context of PageRank?
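
To illustrate how links are treated in Q9, here is a minimal sketch of PageRank computed by repeated redistribution of rank along links. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the rule of spreading a dangling page's rank evenly over all pages are assumptions made for this sketch rather than details taken from the assignment.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the random-jump term).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no out-links: spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

For example, with `links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C receives rank from both A and B, so it ends up ranked above B, which receives only half of A's share.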

Q10: Explain the concept of decision trees in classification. Provide an example of building and visualizing
a decision tree using R. How can K-means clustering be applied to a dataset in R?
