Data Analytics Analysis
I. Introduction to Data Analytics: Sources and nature of data, classification of data (structured,
semi-structured, unstructured), characteristics of data, introduction to Big Data platform, need of data
analytics, evolution of analytic scalability, analytic process and tools, analysis vs reporting, modern data
analytic tools, applications of data analytics. Data Analytics Lifecycle: Need, key roles for successful analytic
projects, various phases of data analytics lifecycle – discovery, data preparation, model planning, model
building, communicating results, operationalization.
II. Data Analysis: Regression modeling, multivariate analysis, Bayesian modeling, inference and Bayesian
networks, support vector and kernel methods, analysis of time series: linear systems analysis & nonlinear
dynamics, rule induction, neural networks: learning and generalisation, competitive learning, principal
component analysis and neural networks, fuzzy logic: extracting fuzzy models from data, fuzzy decision trees,
stochastic search methods
III. Mining Data Streams: Introduction to streams concepts, stream data model and architecture, stream
computing, sampling data in a stream, filtering streams, counting distinct elements in a stream, estimating
moments, counting ones in a window, decaying window, Real-time Analytics Platform (RTAP)
applications, Case studies – real-time sentiment analysis, stock market predictions.
IV. Frequent Itemsets and Clustering: Mining frequent itemsets, market basket modelling, Apriori algorithm,
handling large data sets in main memory, limited pass algorithm, counting frequent itemsets in a stream,
clustering techniques: hierarchical, K-means, clustering high dimensional data, CLIQUE and ProCLUS,
frequent pattern based clustering methods, clustering in non-euclidean space, clustering for streams and
parallelism.
V. Frameworks and Visualization: MapReduce, Hadoop, Pig, Hive, HBase, MapR, Sharding, NoSQL
Databases, S3, Hadoop Distributed File System, Visualization: visual data analysis techniques, interaction
techniques, systems and applications. Introduction to R - R graphical user interfaces, data import and export,
attribute and data types, descriptive statistics, exp
UNIT 1
2 marks
2020-21
2021-22
Discuss the need of data analytics.
2022-23
What are the main characteristics of Big Data?
2020-21
2021-22
Explain the process model and computation model for a Big Data platform.
2022-23
Compare and contrast analysis and reporting in data analytics with a suitable example.
What are the various stages in the big data analytics life cycle? Illustrate with a figure,
explaining each of them.
UNIT 2
2 marks
2020-21
2021-22
Define neural network.
2022-23
10 marks
2020-21
2021-22
Explain the use and advantages of decision trees.
Compare various types of support vector and kernel methods of data analysis.
Given data = {2, 3, 4, 5, 6, 7; 1, 5, 3, 6, 7, 8}, compute the principal component using the PCA algorithm.
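The PCA question above can be worked through numerically. The sketch below assumes the semicolon separates two variables (x and y) with six observations each, and that the sample covariance (divisor n - 1) is intended; neither is stated in the question. It uses the closed-form eigen-decomposition of a symmetric 2x2 matrix rather than a library routine.

```python
import math

# Assumption: first row is variable x, second row is variable y.
x = [2, 3, 4, 5, 6, 7]
y = [1, 5, 3, 6, 7, 8]
n = len(x)

# Step 1: centre each variable on its mean.
mean_x = sum(x) / n
mean_y = sum(y) / n
dx = [v - mean_x for v in x]
dy = [v - mean_y for v in y]

# Step 2: sample covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
var_x = sum(a * a for a in dx) / (n - 1)
var_y = sum(b * b for b in dy) / (n - 1)
cov_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

# Step 3: eigenvalues of the symmetric 2x2 matrix from its trace and determinant.
trace = var_x + var_y
det = var_x * var_y - cov_xy ** 2
disc = math.sqrt(trace ** 2 - 4 * det)
lam1 = (trace + disc) / 2  # largest eigenvalue
lam2 = (trace - disc) / 2

# Step 4: unit eigenvector for lam1; this is the first principal component.
vx, vy = cov_xy, lam1 - var_x
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

# Step 5: project the centred data onto the first principal component.
scores = [a * pc1[0] + b * pc1[1] for a, b in zip(dx, dy)]

print("covariance:", [[var_x, cov_xy], [cov_xy, var_y]])
print("eigenvalues:", lam1, lam2)
print("first principal component:", pc1)
```

Under these assumptions the covariance matrix has entries 3.5, 4.4 and 6.8, and the variance of the projected scores equals the largest eigenvalue.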
2022-23
What is the difference between regression modelling and Bayesian modeling? Explain in
brief.
What are the parameters used to characterize any fuzzy membership function?
What is prediction error? With the help of a suitable example, explain prediction error in
classification and regression.
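As one possible illustration for the prediction-error question, the sketch below measures error as the misclassification rate for classification and as the mean squared error for regression. These two measures are common choices, not the only ones, and the labels and values are made-up illustrative data.

```python
def classification_error(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: one of four hypothetical e-mails is mislabelled.
labels_true = ["spam", "ham", "spam", "ham"]
labels_pred = ["spam", "spam", "spam", "ham"]
print(classification_error(labels_true, labels_pred))

# Regression: hypothetical true values versus model predictions.
values_true = [3.0, 5.0, 2.0]
values_pred = [2.0, 5.0, 4.0]
print(mean_squared_error(values_true, values_pred))
```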
UNIT 3
2 marks
2020-21
2021-22
2022-23
2021-22
Explain any one algorithm to count the number of distinct elements in a data stream.
2022-23
What are the different components of a general stream processing model? List a few sources
of streaming data.
UNIT 4
2 marks
2020-21
2021-22
2022-23
10 marks
2020-21
2021-22
Illustrate the K-means algorithm in detail with its advantages.
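The K-means question can be illustrated with a minimal sketch of the usual assign-then-update loop (Lloyd's algorithm). Seeding the centroids with the first k points is an assumption made here to keep the example deterministic; random or k-means++ seeding is more common in practice. The sample points are made up.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (centroids, labels)."""
    # Assumption: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this data the loop settles with the first three points in one cluster and the last three in the other, even though both initial centroids start in the same group.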
Differentiate between CLIQUE and ProCLUS clustering.
2022-23
Explain the SON algorithm to find all or most frequent itemsets using at most two passes.
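The two passes of the SON (Savasere, Omiecinski and Navathe) algorithm can be sketched as follows. Pass 1 finds itemsets that are frequent within each chunk at a proportionally lowered threshold; pass 2 counts only those candidates over the full data. The brute-force per-chunk counting, the `max_size` limit, and the sample baskets are assumptions for illustration; a real implementation would typically run A-priori on each chunk.

```python
from collections import Counter
from itertools import combinations


def local_frequent(chunk, threshold, max_size=2):
    """Itemsets (as sorted tuples) appearing in at least `threshold` baskets of the chunk."""
    counts = Counter()
    for basket in chunk:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset for itemset, c in counts.items() if c >= threshold}


def son(baskets, support, num_chunks=2, max_size=2):
    """Return {itemset: count} for itemsets with count >= support."""
    if not baskets:
        return {}
    chunk_len = -(-len(baskets) // num_chunks)  # ceiling division
    chunks = [baskets[i:i + chunk_len] for i in range(0, len(baskets), chunk_len)]

    # Pass 1: union of locally frequent itemsets, threshold scaled by chunk size.
    candidates = set()
    for chunk in chunks:
        local_threshold = support * len(chunk) / len(baskets)
        candidates |= local_frequent(chunk, local_threshold, max_size)

    # Pass 2: count every candidate over the whole dataset and drop false positives.
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        for cand in candidates:
            if items.issuperset(cand):
                counts[cand] += 1
    return {cand: c for cand, c in counts.items() if c >= support}


baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs"],
    ["milk", "bread"],
    ["bread"],
]
print(son(baskets, support=3))
```

Because an itemset that meets the global support must meet the scaled threshold in at least one chunk, pass 1 produces no false negatives; pass 2 exists only to remove false positives.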
UNIT 5
2 marks
2020-21
2021-22
2022-23
2021-22
Differentiate between NoSQL and RDBMS databases.
2022-23
What are the approaches to integrating the human into the data exploration process to realize
different types of visual data mining?