SRU ADA Unit-1
Term: 2024-25
Unit-1
Unit-1 Syllabus
Overview of Data Analytics: Introduction and Importance, Types
of Data Analytics, Applications. Data Management: Design Data
Architecture and manage the data for analysis, understand
various sources of Data like Sensors/Signals/GPS etc. Data
Management, Data Quality (noise, outliers, missing values,
duplicate data), and Data Processing & Preprocessing.
Introduction
Data Analytics is defined as the process of examining data
sets to draw conclusions about the information they contain,
increasingly with the aid of specialized systems and software.
The Evolving role of Data Analytics
From early statistical methods to advanced machine learning
algorithms, data analytics has evolved significantly.
Data analytics has become a critical component in modern
decision-making processes.
Its role has evolved from basic data reporting to advanced
predictive and prescriptive analytics.
Past:
● Descriptive Analytics (Reporting, Historical Data)
Present:
● Diagnostic Analytics (Root Cause Analysis, Understanding Why)
Future:
● Predictive Analytics (Forecasting, Predicting Future Outcomes)
● Prescriptive Analytics (Recommending Actions, Optimization)
Importance of Data Analytics
● Enhancing Decision-Making: Data-driven decisions reduce risks and
increase the likelihood of successful outcomes.
● Driving Business Value: Identifying new revenue opportunities,
optimizing operations, and enhancing customer experiences.
● Improving Efficiency: Streamlining processes, reducing waste, and
improving resource utilization.
Types of Data Analytics
Data analytics is the practice of examining raw data to identify
trends, draw conclusions, and extract meaningful information. It
involves various techniques and tools to process and transform data
into valuable insights that can be used for decision-making. The main
types are descriptive, diagnostic, predictive, and prescriptive
analytics.
Data Architecture Design and Data
Management
Most data today is generated by social media sites like Facebook,
Instagram, Twitter, etc.; other sources include e-business and
e-commerce transactions and hospital, school, and bank records. This
data is impossible to manage with traditional data-storage
techniques, so Big Data came into existence for handling data that
is large and impure.
Big Data is the field in which enterprises collect large data sets
from various sources like social media, GPS, sensors, etc., analyze
them systematically, and extract useful patterns using appropriate
tools and techniques. Before the data is analyzed, the data
architecture must be designed by the architect.
Data architecture design is a set of policies, rules, models, and
standards that governs what type of data is collected, where it is
collected from, how the collected data is arranged, and how that data
is stored, utilized, and secured in the systems and data warehouses
for further analysis.
● Business requirements
● Business policies
● Technology in use
● Business economics
● Data processing needs
Data Preprocessing Techniques
● Data Integration − Data integration involves combining multiple
datasets with similar variables or structures. R provides functions
like merge() and rbind() to merge datasets based on common
identifiers or variables. Proper data integration ensures a unified
dataset for analysis.
● Data Transformation − Data transformation involves converting
raw data into a suitable format for analysis. R provides functions
like scale(), log() or sqrt() to normalize or transform skewed data
distributions. These transformations help meet the assumptions of
statistical models and improve interpretability.
● Feature Selection − Feature selection aims to identify the most
relevant variables for analysis. R offers techniques like correlation
analysis, stepwise regression, or regularization methods (e.g., Lasso).
● Encoding Categorical Variables − Categorical variables often
require encoding to numerical representations for analysis. R offers
functions like factor() or dummyVars() to convert categorical
variables into binary or numerical representations. This process
enables the inclusion of categorical variables in statistical models.
● Handling Imbalanced Data − Imbalanced datasets, where one
class dominates over others, can lead to biased predictions or
model performance. R provides techniques such as oversampling
(e.g., SMOTE) or undersampling to balance the dataset and
improve model training.
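A minimal base-R sketch of three of the techniques above (integration with merge(), transformation with scale() and log(), and a factor() encoding); the data frames and column names here are invented for illustration:

```r
# Integration: combine two toy data frames on a shared "id" column
customers <- data.frame(id = 1:4, age = c(23, 35, 41, 29))
orders    <- data.frame(id = c(1, 2, 4), amount = c(250, 90, 400))
combined  <- merge(customers, orders, by = "id")   # inner join on id

# Transformation: z-score the age column, log-transform the amounts
combined$age_scaled <- as.numeric(scale(combined$age))
combined$log_amount <- log(combined$amount)

# Encoding: a categorical variable as a factor with numeric codes
segment <- factor(c("yes", "no", "yes"), levels = c("no", "yes"))
as.numeric(segment) - 1   # "yes"/"no" mapped to 1/0
```

merge() keeps only the ids present in both data frames, which is usually what is wanted when joining on a common identifier.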
R Packages for Data Cleaning and
Preprocessing
● Tidyverse − Tidyverse is a collection of R packages, including dplyr,
tidyr, and stringr, that provide powerful tools for data manipulation,
cleaning, and tidying. These packages offer a consistent and intuitive
syntax for transforming and cleaning data.
● Caret − The caret package (Classification and Regression Training) in R
provides functions for data preprocessing, feature selection, and
resampling techniques. It offers a comprehensive set of tools for
preparing data for machine learning algorithms.
● dataPreparation − The dataPreparation package in R provides a wide
range of functions for data cleaning, transformation, and
preprocessing. It offers functionalities like missing value imputation,
outlier detection, feature scaling, and more.
Data Preprocessing
The data preprocessing process is divided into the following steps:
importing the dataset, completing missing data, encoding categorical
data, splitting the dataset into training and test sets, and feature
scaling.
The command read.csv('filename') receives different optional
parameters; you may have to use some of them depending on how your
dataset is arranged in the .csv file. You can set the sep parameter
to indicate the separator used in your file. For instance, a file
that uses semicolons instead of commas needs sep = ";".
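A small self-contained sketch of such a call (the file name and columns are invented; the script writes the file first so it can run anywhere):

```r
# Write a small semicolon-separated file, then read it back
writeLines(c("name;age;income",
             "Ana;23;52000",
             "Raj;35;61000"), "sample.csv")

# sep = ";" tells read.csv which separator the file uses
df <- read.csv("sample.csv", sep = ";")
str(df)   # 2 observations of 3 variables
```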
Completing Missing Data
● Completing missing data is optional. If your dataset is complete,
you obviously will not have to do this part. But sometimes you will
find datasets with some missing cells; in that case, you can do one
of two things:
● Remove a complete row (not recommended, you could delete
crucial information).
● Complete that missing information with the mean of the column.
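The second option, filling a missing cell with the mean of its column, might look like this in base R (the data is invented):

```r
# Toy data with a missing Income value
df <- data.frame(Age = c(23, 35, 41), Income = c(52000, NA, 61000))

# Replace NA entries with the mean of the non-missing values
df$Income <- ifelse(is.na(df$Income),
                    mean(df$Income, na.rm = TRUE),
                    df$Income)
df$Income   # the missing value is now 56500, the mean of 52000 and 61000
```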
Encoding Categorical Data
● This step is also optional. Depending on your dataset, you might
have from the beginning a dataset with already-encoded categorical
data. In that case you won't need to do this.
● In our case, we have the Graduate column, which has 2 possible
values, either yes or no. In order to be able to work with this
data, we have to encode it; that means changing the labels to
numbers.
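For a yes/no column like Graduate, the encoding can be done with factor() as the earlier slide suggests (the sample values here are invented):

```r
# Toy Graduate column with two possible labels
graduate <- c("yes", "no", "no", "yes")

# factor() assigns an integer code to each label ("no" = 1, "yes" = 2);
# subtracting 1 gives the usual 0/1 encoding
graduate_f   <- factor(graduate, levels = c("no", "yes"))
graduate_num <- as.numeric(graduate_f) - 1
graduate_num   # 1 0 0 1
```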
Splitting the Dataset
● This part is mandatory and one of the most important parts when
working with Machine Learning models.
● Splitting the dataset means dividing the whole dataset into two
parts, the training set and the test set. When you want to train a
model to solve or predict a specific thing, you first have to train
your model and then test whether the model is making correct
predictions.
● Normally the proportion is 80% training set and 20% test set, but
it can vary depending on your model. We will split the dataset with
that proportion.
● You first have to install a package called caTools by doing the
following:
install.packages('caTools')
Once installed, you have to tell R that you will use that library:
library(caTools)
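The split itself can be sketched as follows; this version uses base R's sample() in place of caTools::sample.split so it runs without the extra package, and the data frame is invented. The 80/20 proportion matches the slide above.

```r
# An 80/20 train/test split; base R's sample() stands in here for
# caTools::sample.split so the sketch needs no extra packages
set.seed(123)                                # reproducible split
df <- data.frame(x = 1:100, y = rnorm(100))  # toy dataset

train_idx    <- sample(seq_len(nrow(df)), size = 0.8 * nrow(df))
training_set <- df[train_idx, ]
test_set     <- df[-train_idx, ]

nrow(training_set)   # 80
nrow(test_set)       # 20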
Feature Scaling
● This last step is also not always necessary. In the dataset there
are some values that are not on the same scale; for example, Age
and Income have very different scales.
● Most Machine Learning models work using the Euclidean distance
between two points, but since the scales are different, the distance
between two points could be enormous and could cause problems
in your model.
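Standardizing such columns with scale(), as mentioned in the preprocessing techniques above, brings each one to mean 0 and standard deviation 1 (the Age and Income values here are invented):

```r
# Toy columns on very different scales
df <- data.frame(Age    = c(23, 35, 41, 29),
                 Income = c(52000, 61000, 87000, 44000))

# scale() centers each column to mean 0 and rescales it to sd 1,
# so Age and Income contribute comparably to distance computations
df_scaled <- as.data.frame(scale(df))

round(colMeans(df_scaled), 10)   # both column means are 0
apply(df_scaled, 2, sd)          # both column sds are 1
```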
Different Sources of Data for Data
Analysis
● Data collection is the process of acquiring, collecting, extracting,
and storing voluminous amounts of data, which may be in structured
or unstructured form like text, video, audio, XML files, records, or
other image files, for use in later stages of data analysis. In the
process of big data analysis, "data collection" is the initial step,
carried out before starting to analyze the data for patterns or
useful information. The data to be analyzed must be collected from
valid sources.
● The data collected is known as raw data, which is not useful yet;
cleaning out the impurities and utilizing that data for further
analysis turns it into information, and the insight obtained from
that information is known as "knowledge".
Data collection starts with asking some questions, such as what type
of data is to be collected and what the source of collection is.
Thank You