Data Analytics (KIT 601)
Course Outcome ( CO) Bloom’s Knowledge Level (KL)
At the end of course , the student will be able to
CO 1 Discuss various concepts of data analytics pipeline K1, K2
CO 2 Apply classification and regression techniques K3
CO 3 Explain and apply mining techniques on streaming data K2, K3
CO 4 Compare different clustering and frequent pattern mining algorithms K4
CO 5 Describe the concept of R programming and implement analytics on Big data using R. K2,K3
DETAILED SYLLABUS 3-0-0
Unit Topic Proposed
Lecture
Introduction to Data Analytics: Sources and nature of data, classification of data
(structured, semi-structured, unstructured), characteristics of data, introduction to Big Data
platform, need of data analytics, evolution of analytic scalability, analytic process and
I tools, analysis vs reporting, modern data analytic tools, applications of data analytics. 08
Data Analytics Lifecycle: Need, key roles for successful analytic projects, various phases
of data analytics lifecycle – discovery, data preparation, model planning, model building,
communicating results, operationalization.
Data Analysis: Regression modeling, multivariate analysis, Bayesian modeling, inference
and Bayesian networks, support vector and kernel methods, analysis of time series: linear
II systems analysis & nonlinear dynamics, rule induction, neural networks: learning and 08
generalisation, competitive learning, principal component analysis and neural networks,
fuzzy logic: extracting fuzzy models from data, fuzzy decision trees, stochastic search
methods.
Mining Data Streams: Introduction to streams concepts, stream data model and
architecture, stream computing, sampling data in a stream, filtering streams, counting
III distinct elements in a stream, estimating moments, counting oneness in a window, 08
decaying window, Real-time Analytics Platform ( RTAP) applications, Case studies – real
time sentiment analysis, stock market predictions.
Frequent Itemsets and Clustering: Mining frequent itemsets, market based modelling,
Apriori algorithm, handling large data sets in main memory, limited pass algorithm,
IV counting frequent itemsets in a stream, clustering techniques: hierarchical, K-means, 08
clustering high dimensional data, CLIQUE and ProCLUS, frequent pattern based clustering
methods, clustering in non-euclidean space, clustering for streams and parallelism.
Frame Works and Visualization: MapReduce, Hadoop, Pig, Hive, HBase, MapR,
Sharding, NoSQL Databases, S3, Hadoop Distributed File Systems, Visualization: visual
V data analysis techniques, interaction techniques, systems and applications.
Introduction to R - R graphical user interfaces, data import and export, attribute and data 08
types, descriptive statistics, exploratory data analysis, visualization before analysis,
analytics for unstructured data.
Text books and References:
1. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer
2. Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press.
3. John Garrett,Data Analytics for IT Networks : Developing Innovative Use Cases, Pearson Education
Curriculum & Evaluation Scheme IT & CSI (V & VI semester) 23