Unit 1 Topic 1 Intro

Uploaded by

bhargavyash259
Copyright
© All Rights Reserved

Introduction to Data Analytics
Dr. Anil Kumar Dubey
Associate Professor,
Computer Science & Engineering Department,
ABES EC, Ghaziabad
Affiliated to Dr. A.P.J. Abdul Kalam Technical University,
Uttar Pradesh, Lucknow
Basic
Data analytics is the process of storing,
organizing, and analyzing raw data to answer
questions or gain important insights. Data
analytics is integral to business because it allows
leadership to create evidence-based strategy,
understand customers to better target marketing
initiatives, and increase overall productivity.

Data analytics is the collection, transformation,
and organization of data in order to draw
conclusions, make predictions, and drive informed
decision-making.
Basic
Data analytics is a multidisciplinary field that
employs a wide range of analysis techniques,
including math, statistics, and computer science,
to draw insights from data sets.
Data analytics is a broad term that includes
everything from simply analyzing data to
theorizing ways of collecting data and creating
the frameworks needed to store it.
Data analytics is the science of analyzing raw
data to draw conclusions about that
information.
Basic
• Data analytics helps a business optimize its
performance, perform more efficiently, maximize
profit, or make more strategically guided decisions.
• The techniques and processes of data analytics have
been automated into mechanical processes and
algorithms that work over raw data for human
consumption.
• Various approaches to data analytics include
descriptive analytics, diagnostic analytics, predictive
analytics, and prescriptive analytics.
Conti…
• Data analytics relies on a variety of software tools,
including spreadsheets, data visualization, reporting
tools, data mining programs, and open-source languages.
• There are four key types of data analytics: descriptive,
diagnostic, predictive, and prescriptive.
• Together, these four types of data analytics can help an
organization make data-driven decisions.
◦ Descriptive analytics tells us what happened.
◦ Diagnostic analytics tells us why something
happened.
◦ Predictive analytics tells us what will likely happen in
the future.
◦ Prescriptive analytics tells us what to do to achieve a
desired outcome.
Types of Data Analytics
Four major types:
◦ Predictive (forecasting)
◦ Descriptive (business intelligence and data
mining)
◦ Prescriptive (optimization and simulation)
◦ Diagnostic analytics
Conti…
Predictive analytics
Predictive analytics turns data into valuable,
actionable information. It uses data to determine
the probable outcome of an event or the likelihood
of a situation occurring.

Predictive analytics encompasses a variety of statistical
techniques from modeling, machine learning, data mining,
and game theory that analyze current and historical
facts to make predictions about future events.
Predictive analytics
Techniques that are used for predictive analytics
are:
◦ Linear Regression
◦ Time Series Analysis and Forecasting
◦ Data Mining

Basic cornerstones of predictive analytics:
◦ Predictive modeling
◦ Decision Analysis and optimization
◦ Transaction profiling
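
Linear regression, the first technique listed above, can be illustrated with a minimal sketch. The monthly sales figures and the names fit_line, months, and units_sold are made up for illustration; the code fits a straight line to past data by ordinary least squares and extends it one month ahead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x*y cross deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: month number and units sold in that month
months = [1, 2, 3, 4, 5]
units_sold = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, units_sold)

# Predict the next (sixth) month from the fitted line
forecast_month_6 = slope * 6 + intercept
print(f"Forecast for month 6: {forecast_month_6:.1f} units")
```

Real data is rarely this clean, so in practice the fitted line is only an estimate and should be checked against held-back data.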
Descriptive analytics
Descriptive analytics looks at data and analyzes
past events for insight into how to approach future
events.

It looks at past performance and mines historical
data to understand the causes of past success or
failure.

Almost all management reporting, such as sales,
marketing, operations, and finance, uses this type
of analysis.
Descriptive analytics
The descriptive model quantifies relationships in
data in a way that is often used to classify
customers or prospects into groups.

Unlike a predictive model, which focuses on predicting
the behavior of a single customer, descriptive
analytics identifies many different relationships
between customers and products.
Descriptive analytics
Common examples of descriptive analytics are
company reports that provide historic reviews, such as:
◦ Data Queries
◦ Reports
◦ Descriptive Statistics
◦ Data dashboard
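
As a small illustration of the descriptive statistics item above, the sketch below summarizes a made-up list of monthly sales using Python's standard statistics module. The data and the name monthly_sales are assumptions for illustration only.

```python
import statistics

# Hypothetical monthly sales figures (units)
monthly_sales = [120, 135, 150, 110, 160, 145]

# A simple descriptive summary of what happened in the past
summary = {
    "count": len(monthly_sales),
    "total": sum(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```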
Prescriptive Analytics
Prescriptive analytics automatically synthesizes
big data, mathematical science, business rules,
and machine learning to make a prediction and
then suggests a decision option to take
advantage of the prediction.

Prescriptive analytics goes beyond predicting
future outcomes by also suggesting actions that
benefit from the predictions and showing the
decision maker the implications of each decision
option.
Prescriptive Analytics
Prescriptive analytics anticipates not only what
will happen and when it will happen but also why it
will happen. Further, prescriptive analytics can
suggest decision options on how to take
advantage of a future opportunity or mitigate a
future risk, and illustrate the implications of each
decision option.
For example, prescriptive analytics can benefit
healthcare strategic planning by combining
operational and usage data with data on external
factors such as the economy and population
demographics.
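
The idea of taking a prediction and then suggesting a decision option can be sketched as follows. This is a toy example under stated assumptions: the demand forecast, the cost and price constants, and the function expected_profit are all hypothetical. It scores each stocking option against the forecast and reports the implication (expected profit) of each one.

```python
# Hypothetical prediction: probability of each demand level next week
demand_forecast = {80: 0.3, 100: 0.5, 120: 0.2}

UNIT_COST = 4.0    # assumed cost of stocking one unit
UNIT_PRICE = 10.0  # assumed selling price of one unit


def expected_profit(stock, forecast):
    """Average profit of stocking `stock` units over the forecast demand levels."""
    total = 0.0
    for demand, probability in forecast.items():
        sold = min(stock, demand)  # cannot sell more than is stocked or demanded
        total += probability * (sold * UNIT_PRICE - stock * UNIT_COST)
    return total


# Decision options under consideration, and the implication of each one
options = [80, 100, 120]
implications = {stock: expected_profit(stock, demand_forecast) for stock in options}

# Suggest the option with the highest expected profit
best_option = max(implications, key=implications.get)

for stock, profit in implications.items():
    print(f"Stock {stock} units: expected profit {profit:.2f}")
print(f"Suggested option: stock {best_option} units")
```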
Diagnostic Analytics
In this analysis, we generally rely on historical data
rather than other data to answer a question or solve
a problem. We look for dependencies and patterns in
the historical data related to the particular problem.

For example, companies use this analysis because
it gives great insight into a problem, and they
already keep detailed information at their disposal;
otherwise, data would have to be collected separately
for every problem, which would be very time-consuming.
Diagnostic Analytics
Common techniques used for Diagnostic Analytics
are:
◦ Data discovery
◦ Data mining
◦ Correlations
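
The correlations technique listed above can be illustrated with a minimal sketch that computes the Pearson correlation coefficient between two historical series. The advertising and sales figures and the function name pearson are hypothetical. A value near +1 or -1 suggests a strong linear dependency worth investigating, although correlation alone does not establish the cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical historical data: weekly ad spend and weekly sales
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 103]

r = pearson(ad_spend, sales)
print(f"Correlation between ad spend and sales: {r:.3f}")
```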
Steps in Data Analysis
• Define Data Requirements: This involves
determining how the data will be grouped or
categorized. Data can be segmented by factors
such as age, demographic, income, or gender, and
can consist of numerical values or categorical data.

• Data Collection: Data is gathered from
different sources, including computers, online
platforms, cameras, environmental sensors, or
human personnel.
Steps in Data Analysis
• Data Organization: Once collected, data needs to
be organized in a structured format to facilitate
analysis. This could involve using spreadsheets or
specialized software designed for managing and
analyzing statistical data.

• Data Cleaning: Before analysis, data undergoes a
cleaning process to ensure accuracy and reliability.
This involves identifying and removing any duplicate
or erroneous entries, as well as addressing any
missing or incomplete data. Cleaning data helps to
mitigate potential biases and errors that could affect
the results of the analysis.
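
The data cleaning step can be sketched in a few lines. The records, the field names, and the 0 to 120 age range are assumptions chosen for illustration. The function drops entries with missing or implausible values and then removes duplicates. Dropping incomplete rows is only one way to address missing data; filling in values is another.

```python
# Hypothetical raw records gathered during data collection
raw_records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},  # missing age
    {"id": 1, "age": 34, "income": 52000},    # duplicate of id 1
    {"id": 3, "age": -5, "income": 61000},    # erroneous age
    {"id": 4, "age": 45, "income": 75000},
]


def clean(records):
    """Return records with missing, erroneous, and duplicate entries removed."""
    seen_ids = set()
    cleaned = []
    for record in records:
        age = record["age"]
        # Skip entries with a missing or implausible age (assumed valid range)
        if age is None or not (0 <= age <= 120):
            continue
        # Skip entries whose id has already been kept
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        cleaned.append(record)
    return cleaned


cleaned_records = clean(raw_records)
print([record["id"] for record in cleaned_records])
```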
Data Analytics Tool

| Data Analytics Tool | How It's Used |
|---|---|
| Artificial Intelligence | Makes decisions that can provide a plausible likelihood in achieving a goal |
| NoSQL Database | Delivers a method for accumulation and retrieval of data |
| R Programming | Assists data scientists in designing statistical software |
| Data Lakes | Accumulates data without transforming it into structured data |
| Predictive Analytics | Predicts future behavior via prior data |
| Apache Spark | Generates big data transformation via Python, R, Scala and Java |
| Prescriptive Analytics | Provides guidance about what to do to achieve a desired outcome |
| In-Memory Database | Saves time by omitting the requirements to access hard drives |
| Hadoop Ecosystem | Ingests, stores, analyzes and maintains large data sets |
| Blockchain | Distributed ledger technologies have proven valuable in managing data challenges |
Sources and nature of data
• Data collection is the process of acquiring,
extracting, and storing a voluminous amount of
data, which may be in structured or unstructured
form, such as text, video, audio, XML files,
records, or image files, for use in later stages of
data analysis.
• In big data analysis, data collection is the initial
step before analyzing the patterns or useful
information in data. The data to be analyzed must
be collected from different valid sources.
Sources and nature of data
Primary data
Data that is raw, original, and extracted
directly from official sources is known as
primary data. This type of data is collected
directly using techniques such as
questionnaires, interviews, and surveys.
The data collected must match the demands
and requirements of the target audience on
which the analysis is performed; otherwise it
becomes a burden during data processing.
A few methods of collecting primary data:
Secondary data
Secondary data is data that has already
been collected and is reused for some
valid purpose.

This type of data is derived from previously
recorded primary data and has two kinds of
sources: internal and external.
Classification of data
Big Data is characterized by huge volume, high
velocity, and an extensible variety of data. There
are 3 types: structured data, semi-structured
data, and unstructured data.

◦ Structured
◦ Semi-structured
◦ Unstructured
Structured data
• Structured data is data whose elements are addressable
for effective analysis.
• It has been organized into a formatted repository,
typically a database.
• It covers all data that can be stored in an SQL
database in tables with rows and columns.
• Such data has relational keys and can easily be mapped
into pre-designed fields. Today, it is the most widely
processed kind of data and the simplest to manage.
• Example: Relational data.
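
A minimal sketch of structured data, using Python's built-in sqlite3 module with an in-memory database. The customers and orders tables and their contents are hypothetical. The example shows rows and columns, a relational key (customer_id), and a join query across the two tables.

```python
import sqlite3

# In-memory database so the example leaves no files behind
conn = sqlite3.connect(":memory:")

# Pre-designed fields: every row must fit these columns
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "  # relational key
    "amount REAL)"
)

conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Ghaziabad"), (2, "Ravi", "Lucknow")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0)],
)

# Join the two tables through the key and total the orders per customer
rows = conn.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id ORDER BY c.id"
).fetchall()
conn.close()

for name, total in rows:
    print(f"{name}: {total}")
```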
Semi-Structured data
• Semi-structured data is information that does not reside
in a relational database but has some organizational
properties that make it easier to analyze.

• With some processing, it can be stored in a relational
database (this can be very hard for some kinds of
semi-structured data), but the semi-structured form
exists to save space.

• Example: XML data.
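
To illustrate the XML example, the sketch below parses a small, made-up XML document with Python's standard xml.etree.ElementTree module. The tags organize the data, but unlike a table row, a record may omit a field (the second customer has no email). The document content and field names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured document: tagged, but fields are optional
xml_text = """
<customers>
  <customer id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="2">
    <name>Ravi</name>
  </customer>
</customers>
"""

root = ET.fromstring(xml_text)

records = []
for customer in root.findall("customer"):
    records.append({
        "id": int(customer.get("id")),
        "name": customer.findtext("name"),
        "email": customer.findtext("email"),  # None when the element is absent
    })

for record in records:
    print(record)
```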


Unstructured data
Unstructured data is data that is not organized
in a predefined manner or does not have a
predefined data model, so it is not a good fit for a
mainstream relational database.

Alternative platforms exist for storing and managing
unstructured data. It is increasingly prevalent in IT
systems and is used by organizations in a variety of
business intelligence and analytics applications.

Example: Word, PDF, text, media logs.


Different types of Data

| Properties | Structured data | Semi-structured data | Unstructured data |
|---|---|---|---|
| Technology | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data |
| Transaction management | Matured transaction and various concurrency techniques | Transaction is adapted from DBMS, not matured | No transaction management and no concurrency |
| Version management | Versioning over tuples, rows, tables | Versioning over tuples or graph is possible | Versioned as a whole |
| Flexibility | It is schema dependent and less flexible | It is more flexible than structured data but less flexible than unstructured data | It is more flexible and there is absence of schema |
| Scalability | It is very difficult to scale DB schema | Its scaling is simpler than structured data | It is more scalable |
| Robustness | Very robust | New technology, not very widespread | - |
| Query performance | Structured queries allow complex joining | Queries over anonymous nodes are possible | Only textual queries are possible |
Need of data analytics
Building data analytics into the business
model helps companies reduce costs by
identifying more efficient ways of doing
business.

A company can also use data analytics to
make better business decisions.
Thanks