Big Data and Data Analytics
What is Big Data?
Sources of data (figure labels): contracts, monitoring, public, commercial, sensor, credit, weather, industry, population, social media, sentiment, economic, network
Types of Data
Male Yes
Female No
Very satisfied
Somewhat satisfied
Neutral
Somewhat dissatisfied
Very dissatisfied
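The example values above can be summarized differently depending on their type. The following is a minimal sketch, assuming that Male/Female is treated as categorical data and that the satisfaction scale is treated as ordinal data. The rank mapping, the helper names, and the sample responses are hypothetical and serve only as illustration.

```python
from collections import Counter

# Ordered satisfaction levels, taken from the scale above (best to worst).
SATISFACTION_LEVELS = [
    "Very satisfied",
    "Somewhat satisfied",
    "Neutral",
    "Somewhat dissatisfied",
    "Very dissatisfied",
]

# Hypothetical rank mapping: 5 = most satisfied, 1 = least satisfied.
RANK = {
    level: len(SATISFACTION_LEVELS) - i
    for i, level in enumerate(SATISFACTION_LEVELS)
}


def mode(values):
    """Most frequent value; meaningful for categorical and ordinal data."""
    return Counter(values).most_common(1)[0][0]


def ordinal_median(responses):
    """Middle response by rank; requires an ordering, so ordinal data only."""
    ordered = sorted(responses, key=RANK.__getitem__)
    return ordered[(len(ordered) - 1) // 2]


# Made-up survey data for illustration only.
genders = ["Female", "Male", "Female"]
responses = [
    "Neutral",
    "Very satisfied",
    "Somewhat dissatisfied",
    "Neutral",
    "Very satisfied",
]

print(mode(genders))
print(ordinal_median(responses))
```

Counting and the mode work for any category labels, while a median only makes sense once the labels have an agreed order.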
Classifying Data Elements in a Purchasing Database
Figure 1.2 (column labels recoverable from the figure: Categorical, Interval, Ratio)
VOLUME
Growing quantity of data
e.g. social media, behavioral, video
VARIETY
Velocity of data: 16%
Variety of data: 48%
Volume of data: 35%
Source: Getting Value from Big Data, Gartner Webinar, May 2012
Volume
• Volume
• Petabytes, exabytes of data
• Volumes too great for a typical DBMS
Volume - Bytes Defined
Teradata > 10 PB
Megabyte: 2^20 bytes or, loosely, one million bytes
Gigabyte: 2^30 bytes or, loosely, one billion bytes
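The byte definitions above can be checked with a few lines of arithmetic. This is a minimal sketch: the megabyte and gigabyte values come from the definitions above, while the terabyte and petabyte constants assume the same binary convention is extended upward, and `loose_error` is a hypothetical helper name.

```python
# Binary definitions from the slide: megabyte = 2**20, gigabyte = 2**30 bytes.
MEGABYTE = 2**20
GIGABYTE = 2**30
# Assumption: the same binary convention extended to larger units.
TERABYTE = 2**40
PETABYTE = 2**50


def loose_error(binary_size, decimal_size):
    """Relative gap between a binary unit and its 'loose' decimal value."""
    return (binary_size - decimal_size) / decimal_size


print(MEGABYTE)                                # 1048576
print(GIGABYTE)                                # 1073741824
print(f"{loose_error(MEGABYTE, 10**6):.1%}")   # 4.9%
print(f"{loose_error(GIGABYTE, 10**9):.1%}")   # 7.4%
print(10 * PETABYTE // GIGABYTE)               # 10485760 gigabytes in 10 PB
```

The gap between the binary and the "loose" decimal reading grows with each unit, which is why the two are only loosely interchangeable.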
Velocity
• Velocity
• Massive amount of streaming data
Variety
• Variety
• Massive sets of unstructured/semi-structured data from Web traffic,
social media, sensors, and so on
Which source of data represents
the most immediate opportunity?
Source: Getting Value from Big Data, Gartner Webinar, May 2012
Big Data Opportunities
Making better informed decisions
e.g. strategies, recommendations
Source: Getting Value from Big Data, Gartner Webinar, May 2012
Identifying Insurance Fraud
Auto Insurance
• Opportunity
• Save and make money by reducing fraudulent
auto insurance claims
• Results
• Improved success rate in pursuing fraudulent claims from 50% to
88%; reduced fraudulent claim investigation time by 95%
• Marketing to individuals with low propensity for fraud
What "dark data"** is just lying around that can transform business
processes?
**Operational data that is not being used. Consulting and market research
company Gartner Inc. describes dark data as "information assets that
organizations collect, process and store in the course of their regular
business activity, but generally fail to use for other purposes."
Quality Improvement
Opportunity
Move from manual to automated inspection of
burger bun production to ensure and improve
quality
Data & Analytics
Photo-analyze over 1,000 buns per minute for
color, shape and seed distribution
Continually adjust ovens and process
automatically
Result
Eliminate thousands of pounds of wasted product per
year; speed production; save energy; reduce
manual labor costs
Is the company using all of its “senses” to observe, measure
and optimize business processes?
Improving Corporate Image
• Opportunity
• Improve reputation, brand and buzz by tapping social
media
• Data & Analytics
• Continually scanning twitterverse for mentions of their
business
• Integrating tweeters with their robust customer
management system
• Results
• Saw tweet from a top customer lamenting late flight—
no time to dine at Morton’s
• Tuxedo-clad waiter waiting for him when he landed
with a bag containing his favorite steak, prepared the
way he normally likes it with all the fixin’s
Figure: types of analytics plotted against DIFFICULTY. Recoverable labels: Descriptive Analytics ("What happened?"), Diagnostic Analytics, Information, Hindsight, Insight, Foresight, Optimization
Descriptive Analytics
• Descriptive analytics, such as reporting/OLAP,
dashboards, and data visualization, have been widely
used for some time.
• They are the core of traditional BI.
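As an illustration of the reporting style described above, here is a minimal sketch of a descriptive summary: records are grouped and then counted, totaled, and averaged. The sales records, the `summarize` function, and the field names are made up for illustration and do not come from any particular BI tool.

```python
from collections import defaultdict
from statistics import mean

# Made-up sales records for illustration only: (region, amount).
sales = [
    ("East", 120.0),
    ("West", 80.0),
    ("East", 200.0),
    ("West", 100.0),
    ("North", 50.0),
]


def summarize(records):
    """Group amounts by region and report count, total, and mean per group."""
    groups = defaultdict(list)
    for region, amount in records:
        groups[region].append(amount)
    return {
        region: {
            "count": len(amounts),
            "total": sum(amounts),
            "mean": mean(amounts),
        }
        for region, amounts in groups.items()
    }


report = summarize(sales)
for region in sorted(report):
    row = report[region]
    print(
        f"{region:<6} n={row['count']} "
        f"total={row['total']:.2f} mean={row['mean']:.2f}"
    )
```

A summary like this only describes what has already happened in the data; it makes no prediction.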
Data Modeling
......
Example: Fraud Detection
TI manufactures microprocessors on silicon wafers that are routed through 400 steps over many
weeks. This process is monitored, gathering 140,000 pieces of information about each wafer.
Somewhere in that heap of data can be warnings about things going wrong. Detecting a bug early
prevents bad chips from being made. TI uses visualization tools to make the detection process easier.
Eli Lilly
Has 1500 scientists using an advanced information visualization tool (Spotfire) for decision
making. “With its ability to represent multiple sources of information and interactively change your
view, it’s helpful for homing in on specific molecules and deciding whether we should be doing
further testing on them”
Sheldon Ort of Eli Lilly, speaking to Fortune
The Cholera Epidemic,
London 1854
• The previous day, engineers who designed the rocket opposed the
launch, concerned that the O-rings would not seal at the forecast
temperature (25 to 29°F).
• Data visualization refers to the innovative use of images and interactive technology to explore large,
high-density datasets
• Help users see patterns and relationships that would be difficult to see in text lists
• Rich graphs, charts
• Dashboards
• Maps
• Is increasingly being used to identify insights into both structured and unstructured data for such
areas as
• operational efficiencies
• profitability
• strategic planning
Video Tableau
Examples
Geo data mapping
Treemap
Population
“Trendalyzer”