
Practical List MCA-304 (Data Science and Big Data)

The document outlines a practical list for the Third Semester MCA course at LNCT University, focusing on Data Science and Big Data. It includes tasks such as exploratory data analysis, risk evaluation in big data environments, statistical calculations in R, model building, and performance evaluation using various data processing tools. The exercises aim to enhance students' understanding of data handling, analysis, and visualization techniques in large datasets.

Uploaded by

anandiit8

LNCT UNIVERSITY, BHOPAL

Third Semester
MCA-304
Introduction to Data Science and Big Data
Practical List

1. Use a large dataset to perform exploratory data analysis (EDA). Analyze the relationships
between variables such as age, gender, and survival rate. What patterns do you observe?
2. Using a Big Data tool (like Hadoop or Spark), evaluate the risks associated with handling large
datasets. How would you mitigate the risks of data privacy and security in a large-scale data
environment?

3. Load a dataset into R and calculate the mean, median, mode, variance, and standard deviation.
Analyze how these measures describe the distribution of the data.
4. Using R, create a scatter plot with a regression line to visualize the relationship between two
variables (e.g., height and weight). How does the regression line help predict outcomes?

5. Apply a linear regression model to a dataset and evaluate the model's performance using
metrics such as R-squared and RMSE (Root Mean Square Error). What does this tell you about
the model's accuracy?
6. Build a classification model in R using logistic regression to predict whether a customer will
buy a product based on demographic factors. How would you evaluate the model's
effectiveness?

7. Using Hadoop and HDFS, perform a basic data storage and retrieval task on a large dataset
(e.g., log files). Analyze the performance differences between HDFS and traditional RDBMS
for large-scale data.
8. Implement a MapReduce algorithm in Hadoop to calculate the average transaction value from a
dataset of customer transactions. How would you implement a distributed algorithm using
MapReduce?

9. Use stream processing tools to analyze a continuous stream of data, such as real-time web
traffic. How would you filter the stream to extract useful information?
10. Implement a decaying window algorithm in stream analytics to track the moving
average of a dataset over time. How would you visualize and interpret this trend in real-time?
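
For exercise 10, the update rule behind a decaying window can be sketched as follows. This is a minimal illustration in Python rather than in a stream-processing tool, and it assumes an exponentially decaying window with a small decay constant c; the function name and the default value of c are hypothetical choices, not part of the syllabus. Each new value is added after the running totals are scaled by (1 - c), so older values fade gradually instead of dropping out at a fixed window edge.

```python
def decaying_average(stream, c=0.1):
    """Return the exponentially decayed moving average after each element.

    c is the decay constant (0 < c <= 1). Smaller values give older
    elements more influence; c = 1 keeps only the latest element.
    """
    if not 0 < c <= 1:
        raise ValueError("c must be in the interval (0, 1]")

    weighted_sum = 0.0   # decayed sum of the values seen so far
    weight_total = 0.0   # decayed count, used to normalise the sum
    averages = []
    for x in stream:
        # Decay the history, then add the newest observation.
        weighted_sum = (1 - c) * weighted_sum + x
        weight_total = (1 - c) * weight_total + 1.0
        averages.append(weighted_sum / weight_total)
    return averages
```

Plotting the returned list against time gives the trend line that the exercise asks you to interpret; a larger c makes the line react faster to recent changes but also makes it noisier.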

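For exercise 8, the map and reduce steps for an average can be sketched in plain Python before writing the actual Hadoop job. This is only an in-process simulation under stated assumptions: the record layout `customer_id,amount`, the single key "all", and the function names are hypothetical, and a real job would be written in Java or run through Hadoop Streaming.

```python
from collections import defaultdict


def map_transaction(line):
    """Map step: emit (key, (amount, 1)) for one 'customer_id,amount' record."""
    _customer_id, amount = line.strip().split(",")
    yield "all", (float(amount), 1)


def reduce_average(key, values):
    """Reduce step: add up the partial (sum, count) pairs, then divide once."""
    total = 0.0
    count = 0
    for amount_sum, n in values:
        total += amount_sum
        count += n
    return key, total / count


def run_job(lines):
    """Simulate the shuffle phase by grouping mapper output by key."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_transaction(line):
            grouped[key].append(value)
    return dict(reduce_average(k, v) for k, v in grouped.items())
```

The mapper emits (sum, count) pairs rather than averages so that partial results can be combined on each node before the shuffle; averaging partial averages would give a wrong answer whenever the partitions differ in size.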