Practical List MCA-304 (Data Science and Big Data)
Third Semester
MCA-304
Introduction to Data Science and Big Data
Practical List
1. Use a large dataset to perform exploratory data analysis (EDA). Analyze the relationships
between variables such as age, gender, and survival rate. What patterns do you observe?
2. Using a Big Data tool (like Hadoop or Spark), evaluate the risks associated with handling large
datasets. How would you mitigate the risks of data privacy and security in a large-scale data
environment?
3. Load a dataset into R and calculate the mean, median, mode, variance, and standard deviation.
Analyze how these measures describe the distribution of the data.
4. Using R, create a scatter plot with a regression line to visualize the relationship between two
variables (e.g., height and weight). How does the regression line help predict outcomes?
5. Apply a linear regression model to a dataset and evaluate the model's performance using
metrics such as R-squared and RMSE (Root Mean Square Error). What does this tell you about
the model's accuracy?
6. Build a classification model in R using logistic regression to predict whether a customer will
buy a product based on demographic factors. How would you evaluate the model's
effectiveness?
7. Using Hadoop and HDFS, perform a basic data storage and retrieval task on a large dataset
(e.g., log files). Analyze the performance differences between HDFS and traditional RDBMS
for large-scale data.
8. Implement a MapReduce algorithm in Hadoop to calculate the average transaction value from a
dataset of customer transactions. How would you implement a distributed algorithm using
MapReduce?
9. Use stream processing tools to analyze a continuous stream of data, such as real-time web
traffic. How would you filter the stream to extract useful information?
10. Implement a decaying window algorithm in stream analytics to track the moving
average of a dataset over time. How would you visualize and interpret this trend in real time?
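
Note for Practical 5: the two metrics can be computed directly from their definitions, which may help when checking the numbers reported by a modelling library. The sketch below is only an illustration in plain Python, assuming a single predictor and ordinary least squares; the function names and the small height/weight sample are made up for the example and are not part of the course material.

```python
import math


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, preds):
    """R-squared = 1 - SS_res / SS_tot (share of variance explained)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def rmse(ys, preds):
    """RMSE = square root of the mean squared residual, in the units of y."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


# Hypothetical sample: heights in cm, weights in kg (invented values).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 67, 74, 83]

slope, intercept = fit_line(heights, weights)
predictions = [slope * h + intercept for h in heights]
print("R-squared:", r_squared(weights, predictions))
print("RMSE:", rmse(weights, predictions))
```

R-squared close to 1 means the line explains most of the variation in the response, while RMSE gives the typical size of a prediction error in the response's own units. Both are more informative when computed on data held out from fitting.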
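
Note for Practical 8: an average cannot be obtained by averaging per-node averages, so a common approach is to have each mapper emit a (sum, count) pair and merge those pairs in the reducer. The sketch below simulates that idea in a single Python process. It does not use the Hadoop API, and the "customer_id,amount" record format and all function names are assumptions made for illustration.

```python
from functools import reduce


def map_record(line):
    """Map step: turn one 'customer_id,amount' line into a (sum, count) pair."""
    _customer_id, amount = line.strip().split(",")
    return (float(amount), 1)


def combine(a, b):
    """Merge two (sum, count) pairs. This operation is associative, so it can
    act as both a combiner on each node and the final reducer."""
    return (a[0] + b[0], a[1] + b[1])


def average_transaction(partitions):
    """Simulate MapReduce over several input splits (lists of lines)."""
    partials = []
    for partition in partitions:
        mapped = [map_record(line) for line in partition]
        # Combiner: pre-aggregate locally before anything is "shuffled".
        partials.append(reduce(combine, mapped, (0.0, 0)))
    # Reducer: merge the partial results from every split.
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count if count else None


# Hypothetical input split across two nodes (invented values).
splits = [
    ["c1,10.0", "c2,20.0"],
    ["c1,60.0"],
]
print("Average transaction value:", average_transaction(splits))
```

Carrying the count alongside the sum is what keeps the result correct when splits have different sizes. In this sample the per-split averages are 15 and 60, and averaging those two numbers would not give the true mean of the three transactions.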
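
Note for Practical 10: one way to read "decaying window" is the exponentially decaying window, where every past value is multiplied by (1 - c) each time a new element arrives. The sketch below follows that interpretation and divides the decayed sum by the decayed weight to get a moving average. The class name, the default decay constant c = 0.1 and the sample readings are assumptions for illustration, not a prescribed solution.

```python
class DecayingAverage:
    """Moving average over an exponentially decaying window."""

    def __init__(self, c=0.1):
        if not 0 < c < 1:
            raise ValueError("decay constant c must be between 0 and 1")
        self.c = c
        self.weighted_sum = 0.0  # decayed sum of all values seen so far
        self.weight = 0.0        # decayed count of all values seen so far

    def update(self, x):
        """Age the existing state by (1 - c), add the new value, and
        return the current decayed average."""
        self.weighted_sum = (1 - self.c) * self.weighted_sum + x
        self.weight = (1 - self.c) * self.weight + 1.0
        return self.weighted_sum / self.weight


# Hypothetical stream of readings (invented values).
tracker = DecayingAverage(c=0.5)
for reading in [10, 20, 20, 20, 5]:
    print(reading, "->", round(tracker.update(reading), 3))
```

A larger c forgets old values faster and reacts more quickly to changes, while a smaller c gives a smoother trend. Plotting the returned averages against arrival time alongside the raw readings is one simple way to visualize the trend as the stream progresses.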