Insights Into Big Data: An Industrial Perspective
Monika R
Data Analyst, Customer Success
[email protected]
What is Analytics?!
• Vast amounts of data: big data technologies manage huge volumes of data.
• Insights: unstructured and semi-structured data can yield better insights.
• Decision-making: proper risk analysis helps mitigate risk and make smart decisions.
Everywhere in Every Domain
• Web
• Retail
• E-Commerce
• Medical
• Financial
• Insurance
• Telecom
• Banking
• Travel & Hospitality
Types of Data across Industries
• Medical, Healthcare and Life Sciences
• Automobile and Manufacturing
• Travel and Hospitality
• Retail and Ecommerce
• Web, Social Media and Digital
• Media
• Telecommunication
• Banking, Finance and Insurance
• Energy
• Sports, Media and Entertainment
• Niche areas like autonomous driving, image, video, etc.
Big Data and its Characteristics
• Big Data is about the 4 Vs, commonly cited as Volume, Velocity, Variety, and Veracity.
Velocity is the game changer: it is not just how fast data is produced or changed,
but the speed at which it must be received, understood, and processed.
Big Data Challenges
• Part of how big data earned the label “big” is that it became too much
for traditional systems to handle.
• Need: an infrastructure on which to run all the other analytics tools, as well
as a place to store and query the data.
• Evolution of NoSQL
• A typical big data storage architecture:
• Direct attached storage pools (Scalable and redundant)
• Clustered network attached storage.
CAP Theorem
• Consistency: all nodes see the same data at the same time.
• Availability: every request gets a response indicating success or failure.
• Partition tolerance: the system continues to work despite message loss or
partial failure.
Data Storage Tools:
Data Pre-Processing
• The three steps carried out in a data pre-processing scenario are:
• Data Cleaning
• Data Transformation
• Data Reduction
Why Data Cleaning?
Data in the real world is dirty
incomplete: lacking attribute values, lacking certain attributes of interest, or containing
only aggregate data
• e.g., occupation=“”
noisy: containing errors or outliers (spelling, phonetic and typing errors, word
transpositions, multiple values in a single free-form field)
• e.g., Salary=“-10”
inconsistent: containing discrepancies in codes or names (synonyms and nicknames,
prefix and suffix variations, abbreviations, truncation and initials)
• e.g., Age=“42” Birthday=“03/07/1997”
• e.g., Was rating “1,2,3”, now rating “A, B, C”
• e.g., discrepancy between duplicate records
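The three kinds of dirty data above can be flagged programmatically. The sketch below uses pandas on invented records; the column names and the reference year 2024 are illustrative assumptions, not from the slides.

```python
import pandas as pd

# Hypothetical customer records illustrating incomplete, noisy, and
# inconsistent values (columns and values are made up for illustration).
df = pd.DataFrame({
    "occupation": ["engineer", "", "teacher"],   # incomplete: empty string
    "salary":     [55000, -10, 48000],           # noisy: impossible negative value
    "age":        [42, 35, 29],
    "birth_year": [1997, 1989, 1995],            # row 0 disagrees with its age
})

# Flag each problem rather than silently fixing it.
incomplete   = df["occupation"].str.strip() == ""
noisy        = df["salary"] < 0
inconsistent = (2024 - df["birth_year"]) != df["age"]  # assumes year 2024

print(df[incomplete | noisy | inconsistent].index.tolist())  # [0, 1]
```

Flagging first and fixing later keeps the cleaning decisions auditable.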
Why is Data Dirty?
• Incomplete data comes from:
• non available data value when collected
• different criteria between the time when the data was collected and when it is analyzed.
• human/hardware/software problems
• Noisy data comes from:
• data collection: faulty instruments
• data entry: human or computer errors
• data transmission
• Inconsistent (and redundant) data comes from:
• different data sources, with non-uniform naming conventions and data codes
• functional dependency and/or referential integrity violations
Steps in Data Cleaning
• The following steps are followed in cleaning big data:
• Fill in missing values
• Unify the date format
• Convert nominal data to numeric
• Identify and remove outliers and noisy data
• Maintain metadata
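The cleaning steps above can be sketched in pandas. This is a minimal illustration on invented data; the column names, the modal-value imputation, and the 1.5×IQR outlier rule are assumptions chosen for the example, not the only options.

```python
import pandas as pd

# Hypothetical raw records; column names are illustrative only.
df = pd.DataFrame({
    "rating": ["A", "B", None, "A"],   # nominal, with a missing value
    "joined": ["2021-03-07", "2021/04/15", "2021/05/02", "2021-06-30"],
    "spend":  [120.0, 95.0, 110.0, 9000.0],  # 9000 looks like noise
})

# 1. Fill in missing values (here, with the most frequent rating).
df["rating"] = df["rating"].fillna(df["rating"].mode()[0])

# 2. Unify the date format: normalise separators, then parse.
df["joined"] = pd.to_datetime(df["joined"].str.replace("/", "-"))

# 3. Convert nominal data to numeric ("A, B, C" -> 3, 2, 1).
df["rating_num"] = df["rating"].map({"A": 3, "B": 2, "C": 1})

# 4. Identify and remove outliers with the 1.5*IQR rule.
q1, q3 = df["spend"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["spend"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

print(len(df))  # prints 3: the 9000.0 row is dropped
```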
Data Transformation
Data transformation routines convert the data into forms appropriate for
mining. The main routines are:
• Smoothing: This uses binning, regression, and clustering to remove noise from
the data
• Attribute construction: In this routine, new attributes are constructed and
added from the given set of attributes
• Aggregation: In this routine, summary or aggregation operations are performed
on the data
• Normalization: Here, the attribute data is scaled so as to fall within a smaller
range
• Discretization: In this routine, the raw values of a numeric attribute are replaced
by interval labels or concept labels
• Concept hierarchy generation for nominal data: Here, attributes can be
generalized to higher level concepts
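Two of the routines above, normalization and discretization, can be shown in a few lines. The values, bin edges, and labels below are invented for illustration.

```python
import pandas as pd

# Made-up numeric attribute (e.g. customer ages).
ages = pd.Series([18, 25, 40, 52, 67])

# Normalization: scale the attribute into the smaller range [0, 1].
norm = (ages - ages.min()) / (ages.max() - ages.min())

# Discretization: replace raw values with interval/concept labels.
labels = pd.cut(ages, bins=[0, 30, 55, 100],
                labels=["young", "middle", "senior"])

print(norm.round(2).tolist())  # [0.0, 0.14, 0.45, 0.69, 1.0]
print(labels.tolist())         # ['young', 'young', 'middle', 'middle', 'senior']
```

The min-max formula used here is one common normalization; z-score scaling is another frequent choice.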
Data Reduction
• Reducing massive volumes of data down to their meaningful parts.
• Data De-Duplication
• Sampling
• Feature Selection
• Dimensionality Reduction
• Data compression reduces the size of a file by removing redundant
information, so that less disk space is required.
• Archiving data also reduces data on storage systems, but the approach is
quite different.
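Two of the reduction techniques listed above, de-duplication and sampling, reduce to one call each in pandas. The data and the 50% sampling fraction are invented for illustration.

```python
import pandas as pd

# Hypothetical records with exact duplicates.
df = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cat", "bob", "dee"],
    "city":     ["NY",  "LA",  "NY",  "SF",  "LA",  "NY"],
})

# Data de-duplication: drop exact repeat records.
deduped = df.drop_duplicates()

# Sampling: keep a random 50% of the remaining rows
# (random_state fixed only to make the example reproducible).
sample = deduped.sample(frac=0.5, random_state=42)

print(len(df), len(deduped), len(sample))  # 6 4 2
```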
What is Data Mining?
• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially
useful) patterns or knowledge from huge amounts of data
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging, information
harvesting, business intelligence, etc.
Data Mining Functionalities
-What kind of patterns can be mined?
• Concept/Class Description: Characterization and Discrimination
• Data can be associated with classes or concepts.
• E.g. classes of items – computers, printers, …
concepts of customers – bigSpenders, budgetSpenders, …
• How to describe these items or concepts?
• Descriptions can be derived via
• Data characterization – summarizing the general characteristics of a
target class of data.
• E.g. summarizing the characteristics of customers who spend more than $1,000 a year
at AllElectronics. Result can be a general profile of the customers, such as 40 – 50 years old,
employed, have excellent credit ratings.
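Data characterization of the kind in the AllElectronics example can be sketched as a filter plus a summary. The records and the $1,000 threshold below mirror the slide's example, but the data itself is invented.

```python
import pandas as pd

# Hypothetical customer data for a characterization query.
df = pd.DataFrame({
    "age":          [44, 23, 47, 31, 41],
    "employed":     [True, False, True, True, True],
    "yearly_spend": [1500, 300, 2200, 800, 1250],
})

# Target class: customers who spend more than $1,000 a year.
big_spenders = df[df["yearly_spend"] > 1000]

# A general profile of the target class: age range and employment share.
profile = {
    "age_min": int(big_spenders["age"].min()),
    "age_max": int(big_spenders["age"].max()),
    "pct_employed": float(big_spenders["employed"].mean()),
}
print(profile)  # {'age_min': 41, 'age_max': 47, 'pct_employed': 1.0}
```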
Data Mining Functionalities
-What kind of patterns can be mined?
• Data discrimination – comparing the target class with one or a set of
comparative classes
• E.g. compare the general features of software products whose sales increased by 10% in the last year with those
whose sales decreased by 30% during the same period
• Or both of the above
• Prediction
• Predict missing or unavailable numerical data values
Marketing Analytics
Customer Behavior Analytics
Data Mining Functionalities
• Outlier Analysis
• Data that do not comply with the general behavior or model of the data.
• Outliers are usually discarded as noise or exceptions.
• Useful for fraud detection.
• E.g. Detect purchases of extremely large amounts
• Evolution Analysis
• Describes and models regularities or trends for objects whose
behavior changes over time.
• E.g. Identify stock evolution regularities for overall stocks and for the stocks of
particular companies.
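The fraud-detection use of outlier analysis, flagging extremely large purchases, can be sketched with a robust score based on the median absolute deviation (MAD). The purchase amounts and the 3.5 cutoff are illustrative assumptions; the cutoff is a commonly used MAD-score threshold, not a rule from the slides.

```python
import numpy as np

# Hypothetical purchase amounts; the last one is an extreme outlier.
amounts = np.array([23.0, 41.0, 35.0, 28.0, 30.0, 39.0, 26.0, 9800.0])

median = np.median(amounts)
mad = np.median(np.abs(amounts - median))   # robust estimate of spread
score = np.abs(amounts - median) / mad      # robust z-like score per purchase

flagged = amounts[score > 3.5]              # commonly used MAD-score cutoff
print(flagged)  # [9800.]
```

MAD is preferred over the plain standard deviation here because a single huge value inflates the standard deviation and can mask the very outlier being hunted.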
Data Visualization:
• Visualization makes the data come to life.
• Visualizations are a vivid, accessible way to convey complex data insights, and
most tools require no coding.
• The goal is to communicate information clearly and efficiently to users.
Reporting : Story Telling
• Data scientists need to be able to influence.
• Data and insights that can shape the direction of a business should be
presented clearly and in an interesting manner.
Visualization Tools Used:
The Magical Words
• Data Analytics, Data Analysis, Data
Mining, Data Science
• Let’s think about the data available to a farmer; here’s a simplified
breakdown:
1. Historic weather patterns
2. Plant breeding data and productivity for each strain
3. Fertilizer specifications
4. Pesticide specifications
5. Soil productivity data
6. Pest cycle data
7. Machinery cost, reliability, fault and cost data
8. Water supply data
9. Historic supply and demand data
10. Market spot price and futures data
Specialized areas
● Financial analytics
● Retail analytics
● Market analytics
● Social media
● HR Analytics
● Customer analytics
● Pricing analytics