CourseCurriculum (8) - 1
Course Code: CSIT363
Course Level: UG
Course Title: Big Data and Data Analytics
Credit Units:
L  T  P/S  SW  AS/DS  FW  No. of PSDA  Total Credit Units
3  0  2    2   0      0   0            5
Course Description :
Course Objectives :
SN. Objectives
1 Address the emerging need for real-time data analytics.
2 Keep pace with the growing availability of heterogeneous data in both structured and unstructured formats.
3 Support knowledge discovery that will be useful for the next generation of Machine Learning and Artificial Intelligence.
4 Introduce the core concepts of Big Data and data mining, including their techniques, implementation, challenges, and benefits.
Pre-Requisites : General
Theory /VAC / Architecture Assessment (L,T & Self Work): 80.00 Max : 100
Attendance+CE+EE : 5+35+60
Lab/ Practical/ Studio/Arch. Studio/ Field Work Assessment : 20.00 Max : 100
Attendance+CE+EE : 5+35+60
SN. Lab / Practical Details
1 Implement the following data structures in Java: i) Linked Lists ii) Stacks iii) Queues iv) Set v) Map.
2 Set up and install Hadoop in its three operating modes: a) Standalone b) Pseudo-distributed c) Fully distributed.
3 Implement the following file management tasks in Hadoop: • Adding files and directories • Retrieving files • Deleting files. Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into HDFS using one of the above command-line utilities.
4 Write a MapReduce program that mines weather data. Weather sensors collecting data every hour at many locations across the globe gather a large volume of log data, which is a good candidate for analysis with MapReduce, since it is semi-structured and record-oriented.
5 Implement matrix multiplication with Hadoop MapReduce.
6 Install and run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your data.
7 Install and run Hive, then use Hive to create, alter, and drop databases, tables, views, functions, and indexes.
8 Data preprocessing using Weka: explore, observe, and understand the purpose of each button under the Preprocess panel after loading the ARFF file you prepared in this lab.
9 Try to interpret what you observe using a different ARFF file, weather.arff, provided with the WEKA tool (open-source software).
10 Demonstrate and analyze the results of the following data mining techniques using Weka on the data sets provided with WEKA:
11 Classification (e.g., BayesNet, KNN, C4.5 Decision Tree, Neural Networks, SVM),
12 Regression (e.g., Linear Regression, Isotonic Regression, SVM for Regression),
13 Clustering (e.g., Simple K-means, Expectation Maximization (EM)),
14 Association rules (e.g., Apriori Algorithm, Predictive Accuracy, Confirmation Guided),
15 Feature Selection (e.g., Cfs Subset Evaluation, Information Gain, Chi-squared Statistic), and
16 Visualization (e.g., View different two-dimensional plots of the data).
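As a starting point for lab exercise 4, the map, shuffle, and reduce phases can be sketched in plain Java before moving to the Hadoop API. This is only an illustrative sketch: it uses no Hadoop classes, the record layout (stationId,yyyy-MM-ddTHH,temperatureC) is a hypothetical format assumed for the example, and "maximum temperature per year" is one possible analysis, since the exercise does not specify which statistic to mine. The class and method names are likewise made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map and reduce phases; no Hadoop classes are used.
class WeatherMaxTemp {

    // Hypothetical record layout (an assumption): stationId,yyyy-MM-ddTHH,temperatureC
    // Map phase: emit a (year, temperature) pair, or null for a malformed record.
    static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3 || fields[1].length() < 4) {
            return null;
        }
        try {
            double temp = Double.parseDouble(fields[2].trim());
            return Map.entry(fields[1].substring(0, 4), temp);
        } catch (NumberFormatException e) {
            return null; // skip records with a missing or garbled reading
        }
    }

    // Shuffle phase: group emitted values by key, as the framework would.
    static Map<String, List<Double>> shuffle(List<String> lines) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Double> pair = map(line);
            if (pair != null) {
                grouped.computeIfAbsent(pair.getKey(), year -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce phase: collapse all temperatures for one year to their maximum.
    static double reduce(List<Double> temps) {
        double max = Double.NEGATIVE_INFINITY;
        for (double t : temps) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Driver: run map, shuffle, and reduce over an in-memory list of records.
    static Map<String, Double> maxTempByYear(List<String> lines) {
        Map<String, Double> result = new TreeMap<>();
        shuffle(lines).forEach((year, temps) -> result.put(year, reduce(temps)));
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "ST001,1949-07-01T12,11.1",
            "ST002,1949-07-02T12,7.8",
            "ST001,1950-01-01T00,-1.1",
            "ST003,1950-06-15T13,2.2",
            "ST003,1950-06-15T14,missing");
        System.out.println(maxTempByYear(log));
    }
}
```

In a real Hadoop job the shuffle step is performed by the framework, and the map and reduce logic would move into Mapper and Reducer subclasses; the in-memory version above is only meant to make the data flow visible on a handful of records.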
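For lab exercise 5, one common formulation is the one-pass approach: every element of A and B is sent to each output cell (i,k) that needs it, and the reducer for that cell pairs values sharing the inner index j, multiplies them, and sums. The sketch below simulates that flow in plain Java with no Hadoop dependency. The class name, the Tagged helper, and the "i,k" string key encoding are assumptions made for illustration, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of one-pass MapReduce matrix multiplication C = A x B.
class MatrixMultiplyMapReduce {

    // A tagged value sent to a reducer: source matrix, inner index j, cell value.
    static final class Tagged {
        final char matrix;
        final int j;
        final double value;

        Tagged(char matrix, int j, double value) {
            this.matrix = matrix;
            this.j = j;
            this.value = value;
        }
    }

    // Map phase plus shuffle: A[i][j] goes to every key (i,k); B[j][k] goes to every key (i,k).
    // Keys are encoded as "i,k" strings, similar to a text key in a real job.
    static Map<String, List<Tagged>> mapAndShuffle(double[][] a, double[][] b) {
        int rowsA = a.length, inner = b.length, colsB = b[0].length;
        Map<String, List<Tagged>> grouped = new HashMap<>();
        for (int i = 0; i < rowsA; i++) {
            for (int j = 0; j < inner; j++) {
                for (int k = 0; k < colsB; k++) {
                    grouped.computeIfAbsent(i + "," + k, key -> new ArrayList<>())
                           .add(new Tagged('A', j, a[i][j]));
                }
            }
        }
        for (int j = 0; j < inner; j++) {
            for (int k = 0; k < colsB; k++) {
                for (int i = 0; i < rowsA; i++) {
                    grouped.computeIfAbsent(i + "," + k, key -> new ArrayList<>())
                           .add(new Tagged('B', j, b[j][k]));
                }
            }
        }
        return grouped;
    }

    // Reduce phase for one key (i,k): pair A and B values that share j, multiply, and sum.
    static double reduce(List<Tagged> values, int inner) {
        double[] fromA = new double[inner];
        double[] fromB = new double[inner];
        for (Tagged t : values) {
            if (t.matrix == 'A') {
                fromA[t.j] = t.value;
            } else {
                fromB[t.j] = t.value;
            }
        }
        double sum = 0.0;
        for (int j = 0; j < inner; j++) {
            sum += fromA[j] * fromB[j];
        }
        return sum;
    }

    // Driver: run the simulated job and assemble the result matrix.
    static double[][] multiply(double[][] a, double[][] b) {
        if (a.length == 0 || b.length == 0 || a[0].length != b.length) {
            throw new IllegalArgumentException("inner dimensions must match");
        }
        double[][] c = new double[a.length][b[0].length];
        mapAndShuffle(a, b).forEach((key, values) -> {
            String[] ik = key.split(",");
            c[Integer.parseInt(ik[0])][Integer.parseInt(ik[1])] = reduce(values, b.length);
        });
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```

This formulation replicates each input element many times, which is simple to reason about but heavy on network traffic for large matrices; a two-pass variant that first joins on j is a common alternative to explore in the lab.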
No. of PSDA : 3
SN. PSDA Point
1 Practice and develop skills on Microsoft Azure.
2 Practice and develop data analytics skills on Weka Tool.
3 Practice and develop skills on AWS framework.