Big Data Analysis Patterns
Atlanta Big Data User Group
8/15/2013
1
whoami
Brad Anderson
Solutions Architect at MapR (Atlanta)
ATLHUG co-chair
NoSQL East Conference 2009
boorad in most places (Twitter, GitHub)
2
Announcements
Next ATLHUG Meeting - Sept. 26
How Google Does Big Data
3
BIG DATA
4
5
Big Data is not new!
but the tools are.
6
The Good News in Big Data:
7
The Challenge: So Many Solutions!
9
Picking the Best Solution
10
Apache Solr/Lucene
11
Apache Mahout
12
Apache Drill
13
Storm
14
Titan
Distributed Graph Database
Property Graph
Pluggable Backend Storage
HBase or M7
Cassandra
Berkeley DB
Search Integrated
Solr/Lucene
Elasticsearch
Faunus
Batch processing of large graphs
Fulgora
Graph traversals on subset
In-memory
15
Using the Answers to Guide Your Choices
16
Big Data Decision Tree
How big is your data?
[Diagram: decision tree. Branches: <10 GB, mid, >200 GB. Follow-up questions: "What size queries?" and "Response time?". Outcomes labeled A through E.]
17
Use Cases
Company
Data Shape
Technique(s)
Business Value
18
Business Value
19
Business Value
20
Telecommunications Giant
ETL Offload
21
Telecommunications
Data Shape
Lots of Data
Lots of Queries across Large Sets
Throughput important
22
Telecommunications
Techniques
ETL Analytics
23
Telecommunications
Techniques
25
Credit Card Issuer
26
Credit Card Issuer
Data Shape
Customer Purchase History (big)
Merchant Designations
Merchant Special Offers
Throughput important
Recommendations
27
Credit Card Issuer
Techniques
A Recommendation Engine with Mahout and Solr/Lucene
History matrix
28
Credit Card Issuer
Techniques
Recommendation based on cooccurrence
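The cooccurrence idea named on this slide can be sketched in a few lines. This is a minimal illustration in plain Python, not the Mahout implementation; the toy purchase history, the merchant names, and the function names are all hypothetical, and raw pair counts are a simplification chosen for brevity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy purchase history: user -> set of merchants visited.
history = {
    "u1": {"grocer", "cafe", "bookshop"},
    "u2": {"grocer", "cafe"},
    "u3": {"cafe", "bookshop", "cinema"},
    "u4": {"grocer", "cinema"},
}

def cooccurrence_counts(history):
    """Count how many users have each pair of items in their history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(user, history, counts, top_n=3):
    """Score unseen items by their summed cooccurrence with the user's items."""
    seen = history[user]
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

counts = cooccurrence_counts(history)
recommendations = recommend("u2", history, counts)
```

Here `recommendations` ranks the merchants that user `u2` has not visited by how often they appear alongside the merchants `u2` has visited.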
Techniques
30
Credit Card Issuer
Techniques
[Diagram components: Complete history; Cooccurrence (Mahout); Solr indexing; Solr indexers. Times shown: 20 hrs, 3 hrs.]
31
Credit Card Issuer
Techniques
[Diagram components: User history; Web tier; Solr search; Solr indexers; Item metadata; Index shards. Times shown: 8 hrs, 3 min.]
32
Credit Card Issuer
Techniques
[Diagram components: Hadoop; Purchase History App; Merchant Information App; Merchant Offers App; Recommendation Engine (Mahout); Export (4 hrs); Results Data Store (DB2); Import (4 hrs); Presentation App.]
33
Credit Card Issuer
Techniques
[Diagram components: Hadoop; Purchase History App; Merchant Information App; Merchant Offers App; Recommendation Engine (Mahout); Index Update (3 min); Results Search Index (Solr); Recommendation App.]
34
Credit Card Issuer
Business Value
35
Waste & Recycling Leader
Idle Alerts
36
Data Shape
Truck Geolocation Data
20,000 trucks
5 sec interval (arriving quickly)
Landfill Geographic Boundaries
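One way to read the data shape above: each position report is tested against a landfill boundary, and a truck that stays in one place outside the landfill for too long raises an alert. The sketch below is plain Python rather than a streaming topology. The planar coordinates, the square boundary, the 15-second threshold, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Ping:
    truck_id: str
    t: int      # seconds since start of shift
    x: float    # planar coordinates (an assumption; real data would be lat/lon)
    y: float

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def idle_alerts(pings, landfill, max_idle_s=300, move_eps=1.0):
    """Return (truck_id, t) for each truck idle outside the landfill for max_idle_s."""
    state = {}   # truck_id -> (x, y, idle_since, alerted)
    alerts = []
    for p in pings:
        prev = state.get(p.truck_id)
        moved = (prev is None
                 or abs(p.x - prev[0]) > move_eps
                 or abs(p.y - prev[1]) > move_eps)
        # Movement, or being inside the landfill, resets the idle clock.
        if moved or point_in_polygon(p.x, p.y, landfill):
            state[p.truck_id] = (p.x, p.y, p.t, False)
            continue
        x, y, idle_since, alerted = prev
        if not alerted and p.t - idle_since >= max_idle_s:
            alerts.append((p.truck_id, p.t))
            alerted = True
        state[p.truck_id] = (x, y, idle_since, alerted)
    return alerts

# Hypothetical square landfill and two trucks reporting every 5 seconds.
landfill = [(0, 0), (10, 0), (10, 10), (0, 10)]
pings = []
for t in range(0, 25, 5):
    pings.append(Ping("t1", t, 50.0, 50.0))  # parked on the street
    pings.append(Ping("t2", t, 5.0, 5.0))    # parked inside the landfill
demo_alerts = idle_alerts(pings, landfill, max_idle_s=15)
```

Only the truck parked outside the boundary produces an alert, and it is reported once rather than on every subsequent ping.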
37
Techniques
Realtime Stream Computation Immediate
(Storm) Alerts
Shortest Path
Route
Graph Algorithm
Optimization
(Titan)
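The shortest-path step can be illustrated with Dijkstra's algorithm. This is a standalone Python sketch, not Titan's API, and the slides do not say which shortest-path algorithm was used; the road network, node names, and edge weights are hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, path), or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical road network: node -> {neighbor: travel cost}.
roads = {
    "depot": {"a": 4, "b": 2},
    "a": {"landfill": 5},
    "b": {"a": 1, "landfill": 10},
    "landfill": {},
}
cost, route = shortest_path(roads, "depot", "landfill")
```

In this toy network the cheapest route detours through `b` and `a` instead of taking either direct-looking option.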
38
Business Value
39
Beverage Company
40
Data Shape
Tweets, FB Messages
Person, Activity links
Graph Traversal
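A traversal over person and activity links can be sketched as a bounded breadth-first search. This is plain Python for illustration only, not a graph database query; the people, activities, and hop limit are hypothetical.

```python
from collections import deque

# Hypothetical (person, activity) links extracted from social messages.
edges = [
    ("alice", "tweeted:cola"),
    ("bob", "tweeted:cola"),
    ("bob", "liked:concert"),
    ("carol", "liked:concert"),
    ("dave", "tweeted:tea"),
]

def build_graph(edges):
    """Build an undirected adjacency map over people and activities."""
    graph = {}
    for person, activity in edges:
        graph.setdefault(person, set()).add(activity)
        graph.setdefault(activity, set()).add(person)
    return graph

def within_hops(graph, start, max_hops):
    """Breadth-first traversal; returns {node: hop_count} for nodes within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen

graph = build_graph(edges)
persons = {person for person, _ in edges}
reach = within_hops(graph, "alice", 4)
# People connected to alice through shared activities, excluding alice herself.
people = sorted(n for n in reach if n in persons and n != "alice")
```

Each person is two hops from anyone who shares an activity with them, so a four-hop limit reaches friends-of-friends in activity terms.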
41
Consumer Activity Graph
Wal*Mart.com
Ebay
Shopping.com
Sams
Ebay Motors
Dollar General
StubHub
Toys R Us
CVS
42
Techniques
Social Activity Stream
Key/Value Store (MapR M7)
43
Business Value
44
Fraud Detection
Data Lake
45
Data Sources
Anti-Money Laundering
Consumer Transactions
46
Techniques
Anti-Money Laundering System
Consumer Transactions System
47
Techniques
AML
49
Machine Learning
Search Relevance
DNA Matching
50
Data Sources
53
Traffic Analytics
54
Data Sources
59
Similar Characteristics
Lots of Data
Structured, Semi-Structured, Unstructured
Varied Systems Interoperating
Hadoop, Storm, Solr, MPP, Visualizations
Increase Revenue
Decrease Costs
60
Questions?
61