Data Analytics Analysis

paper analysis of Data analysis

Uploaded by

Abhishek Sharma
Copyright
© All Rights Reserved
Data Analytics

I. Introduction to Data Analytics: Sources and nature of data, classification of data (structured,
semi-structured, unstructured), characteristics of data, introduction to Big Data platform, need of data
analytics, evolution of analytic scalability, analytic process and tools, analysis vs reporting, modern data
analytic tools, applications of data analytics. Data Analytics Lifecycle: Need, key roles for successful analytic
projects, various phases of data analytics lifecycle – discovery, data preparation, model planning, model
building, communicating results, operationalization.

II. Data Analysis: Regression modeling, multivariate analysis, Bayesian modeling, inference and Bayesian
networks, support vector and kernel methods, analysis of time series: linear systems analysis & nonlinear
dynamics, rule induction, neural networks: learning and generalisation, competitive learning, principal
component analysis and neural networks, fuzzy logic: extracting fuzzy models from data, fuzzy decision trees,
stochastic search methods

III. Mining Data Streams: Introduction to streams concepts, stream data model and architecture, stream
computing, sampling data in a stream, filtering streams, counting distinct elements in a stream, estimating
moments, counting ones in a window, decaying window, Real-time Analytics Platform (RTAP)
applications, Case studies – real time sentiment analysis, stock market predictions.

IV. Frequent Itemsets and Clustering: Mining frequent itemsets, market basket modelling, Apriori algorithm,
handling large data sets in main memory, limited pass algorithm, counting frequent itemsets in a stream,
clustering techniques: hierarchical, K-means, clustering high dimensional data, CLIQUE and ProCLUS,
frequent pattern based clustering methods, clustering in non-euclidean space, clustering for streams and
parallelism.

V. Frameworks and Visualization: MapReduce, Hadoop, Pig, Hive, HBase, MapR, Sharding, NoSQL
Databases, S3, Hadoop Distributed File System, Visualization: visual data analysis techniques, interaction
techniques, systems and applications. Introduction to R - R graphical user interfaces, data import and export,
attribute and data types, descriptive statistics, exp

Blue - could be from another unit


2-mark questions not included from these 2 papers:
2020-2021 sem 5
2022-2023 sem 5

UNIT 1
2 marks

2020-21
2021-22
Discuss the need of data analytics.

Give the classification of data.

2022-23
What are the main characteristics of Big Data?

Generalize the role of analytical tools in Big Data.


10 marks

2020-21

2021-22

Explain the process model and computation model for Big data platform.

Explain the various phases of data analytics life cycle.

Explain modern data analytics tools in detail.

2022-23

Compare and contrast analysis and reporting in data analytics with a suitable example.

What are the various stages in the big data analytics life cycle? Illustrate with a figure,
explaining each of them.

Compare and contrast traditional analytics architecture with modern analytics architecture.

Explain the various phases of the Data Analytics Life Cycle.

UNIT 2
2 marks

2020-21

2021-22
Define neural network.

What is multivariate analysis?

2022-23

What are the purposes of regression analysis?

What do you mean by fuzzy qualitative model?

Define association rule.

State the benefits of analytic sandbox.

10 marks
2020-21

2021-22
Explain the use and advantages of decision trees.

Compare various types of support vector and kernel methods of data analysis.

Given data= {2,3,4,5,6,7;1,5,3,6,7,8}. Compute the principal component using PCA algorithm.
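A minimal sketch of how the PCA question above could be worked through, under two assumptions that the question does not state: the data are read as six observations of two variables, x = (2,3,4,5,6,7) and y = (1,5,3,6,7,8), and the sample covariance (divisor n-1) is used. The function name pca_2d is hypothetical. For a 2x2 covariance matrix the eigenvalues have a closed form, so only the standard library is needed.

```python
import math

def pca_2d(xs, ys):
    """PCA for two variables using the closed-form eigen-decomposition of a 2x2 matrix."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 1: centre the data.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Step 2: sample covariance matrix [[sxx, sxy], [sxy, syy]] (assumes divisor n - 1).
    sxx = sum(a * a for a in dx) / (n - 1)
    syy = sum(b * b for b in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Step 3: eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + root, half_trace - root
    # Step 4: unit eigenvector for the largest eigenvalue.
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    # Step 5: project the centred data onto the first principal component.
    scores = [a * direction[0] + b * direction[1] for a, b in zip(dx, dy)]
    return (sxx, sxy, syy), (lam1, lam2), direction, scores

cov, eigenvalues, direction, scores = pca_2d([2, 3, 4, 5, 6, 7], [1, 5, 3, 6, 7, 8])
print(cov, eigenvalues, direction)
```

Under these assumptions the means are 4.5 and 5, the covariance matrix is [[3.5, 4.4], [4.4, 6.8]], the eigenvalues are roughly 9.85 and 0.45, and the first principal component points along roughly (0.57, 0.82). If the paper intends the population covariance (divisor n), the eigenvalues scale but the direction is unchanged.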
2022-23

What is a neural network? How can it be used in analytics?

What is the difference between regression modelling and Bayesian modeling? Explain in
brief.

Explain the role of principal component analysis in neural networks.

What are the parameters used to characterize any fuzzy membership function?

Explain multivariate analysis and Bayesian network.

Differentiate between Crisp logic and Fuzzy logic.

What are the different kernel methods of Data Analytics?

What is prediction error? With the help of a suitable example, explain prediction error in
classification and regression.

UNIT 3
2 marks
2020-21

2021-22

Give the full form of RTAP and discuss its application.

What is the role of sampling data in a stream?

2022-23

What do you mean by data stream management system?

What do you mean by response modeling?


10 marks
2020-21

2021-22

Explain the architecture of data stream model.

Explain any one algorithm to count number of distinct elements in a data stream.
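One algorithm commonly given in answer to the distinct-elements question above is Flajolet-Martin. The sketch below is a simplified illustration, not a reference implementation: it uses a single hash function (the first 32 bits of MD5, chosen here only for determinism), whereas practical versions combine many hash functions to reduce variance. The names trailing_zeros and fm_estimate are hypothetical.

```python
import hashlib

def trailing_zeros(value, width=32):
    """Number of trailing zero bits in value; a zero value counts as `width` zeros."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1

def fm_estimate(stream):
    """Estimate the number of distinct elements as 2**R, where R is the
    largest count of trailing zero bits seen among the hashed elements."""
    max_r = None
    for element in stream:
        digest = hashlib.md5(str(element).encode("utf-8")).digest()
        hashed = int.from_bytes(digest[:4], "big")  # assumption: 32-bit hash value
        r = trailing_zeros(hashed)
        if max_r is None or r > max_r:
            max_r = r
    return 0 if max_r is None else 2 ** max_r

stream = ["a", "b", "a", "c", "b", "a", "d"]
print(fm_estimate(stream))
```

Because only the maximum R is stored, memory use does not grow with the stream, and repeated elements never change the estimate. The result is always a power of two, so a single hash function gives only a coarse estimate.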

Discuss the case study of stock market predictions in detail.

2022-23

Explain Apriori association rule mining algorithm.

Discriminate the concept of sampling data in a stream.

Illustrate various Real Time Analytics Platforms (RTAPs) with examples.

Explain the Datar-Gionis-Indyk-Motwani (DGIM) algorithm for counting ones in a window.

Explain Bernoulli sampling with its algorithm.
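Bernoulli sampling keeps each stream element independently with a fixed probability p, so the sample size is random with expected value p times the stream length. The sketch below assumes a seeded generator for reproducibility; the function name bernoulli_sample and its parameters are hypothetical.

```python
import random

def bernoulli_sample(stream, p, seed=None):
    """Keep each element of the stream independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)
    sample = []
    for element in stream:
        # rng.random() is uniform on [0, 1), so this test succeeds with probability p.
        if rng.random() < p:
            sample.append(element)
    return sample

print(bernoulli_sample(range(20), p=0.3, seed=42))
```

Each decision is made in one pass without knowing the stream length in advance, which is what makes the scheme suitable for streams. The trade-off is that the sample size cannot be fixed ahead of time.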

What are the different components of a general stream processing model? List a few sources
of streaming data.

UNIT 4
2 marks
2020-21
2021-22

Discuss the use of limited pass algorithm.

What is the principle behind hierarchical clustering technique?

2022-23

Explain the working of CLIQUE algorithm in brief.

Identify the major issues in data stream query processing.

10 marks
2020-21

2021-22
Illustrate the K-means algorithm in detail with its advantages.
Differentiate between CLIQUE and ProCLUS clustering.

A database has 5 transactions. Let min_sup=60% and min_conf=80%.


TID    Items_Bought
T100   {M, O, N, K, E, Y}
T200   {D, O, N, K, E, Y}
T300   {M, A, K, E}
T400   {M, U, C, K, Y}
T500   {C, O, O, K, I, E}
i) Find all frequent itemsets using Apriori algorithm.
ii) List all the strong association rules (with support s and confidence c).
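The Apriori question above can be checked with a short level-wise sketch. The transactions and thresholds come from the question; the function names apriori and strong_rules are hypothetical, and thresholds are passed as integer percentages to avoid floating-point comparisons. Each transaction is treated as a set, so the repeated O in T500 counts once.

```python
from itertools import combinations

def apriori(transactions, min_sup_pct):
    """Return {itemset: support count} for all frequent itemsets, level by level."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting minimum support.
        level = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count * 100 >= min_sup_pct * n:
                level[cand] = count
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, min_conf_pct):
    """Return (lhs, rhs, support count, lhs support count) for rules meeting min confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence = sup(itemset) / sup(lhs)
                if sup * 100 >= min_conf_pct * frequent[lhs]:
                    rules.append((lhs, itemset - lhs, sup, frequent[lhs]))
    return rules

# Items per transaction, written as strings of single-letter items.
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"), set("MUCKY"), set("COOKIE")]
frequent = apriori(transactions, min_sup_pct=60)
rules = strong_rules(frequent, min_conf_pct=80)
for lhs, rhs, sup, lhs_sup in rules:
    print(sorted(lhs), "->", sorted(rhs), f"s={sup}/5", f"c={sup}/{lhs_sup}")
```

With min_sup = 60% an itemset must appear in at least 3 of the 5 transactions. Working by hand gives the frequent itemsets {M}, {O}, {K}, {E}, {Y}, {M,K}, {O,K}, {O,E}, {K,E}, {K,Y} and {O,K,E}, which the sketch should reproduce.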

2022-23

List the advantages and disadvantages of K-Means clustering.

Why is the PCY algorithm preferred over the Apriori algorithm?

Explain the SON algorithm to find all or most frequent itemsets using at most two passes.

UNIT 5
2 marks
2020-21

2021-22

List five R functions used in descriptive statistics.

List the names of any 2 visualization tools.

2022-23

What are the benefits of visual data exploration?

Mention some main goals of Hadoop.


10 marks
2020-21

2021-22
Differentiate between NoSQL and RDBMS databases.

Explain the HIVE architecture with its features in detail.

Write an R function to check whether the given number is prime or not.

2022-23

What is HDFS? How does it handle Big Data?

Illustrate and explain the concept of Map Reduce framework in brief.

Write an R function to check whether the given number is prime or not.

How is RDBMS different from NoSQL?

Explain Apache Hadoop, KNIME & OpenRefine in detail.

Draw and discuss the architecture of Hive in detail.

What are the approaches to integrate the human in data exploration process to realize
different types of approaches to visual data mining?

Explain Apache Hadoop, KNIME & OpenRefine in detail.
