Multi-Terabyte MySQL Data Warehouses - Absolutely! Presentation
Agenda
Industry Research
Gartner
The Problem: Data Warehouses are Strained
Data is growing exponentially + Users are asking more complex questions =
Data is aggregated and deleted
Data is archived and not usable
Complex queries are blocked
Complex queries don’t perform
What do the current limitations mean for Stakeholders?
Users
Do not get access to data they need;
Queries run too slowly;
Are not allowed to think creatively – ask new and different questions;
Are told to wait for months for what they want in minutes
IT
Besieged with requests for new data sources;
Feature creep and changing requirements straining resources;
Analytic system maintenance and tuning affect support for operational systems;
Executives
CIOs face service level complaints and rising IT costs;
Business unit leaders without analytic data fail to achieve objectives;
About Infobright
Founded 2005
Key Partner: MySQL/Sun
Leverages MySQL connectivity to ETL and BI
Provides MySQL customers with scalable, enterprise-ready data warehouse
Data Warehousing: Part of the Problem
More Data & Data Sources
More Kinds of Output Needed by More Business Users
[Figure labels: Clickstream and log files; Existing data warehouse; External Sources]
…then
Revise the model as reporting requirements change and data grows:
Add indexes
Partition data to improve performance
Restrict users!
Traditional Data Warehouse Approach
Results:
Software costs well known and predictable but...
Management and support costs spiral:
Partitioning strategies
Indexing strategies
Additional data marts
More hardware
Business user satisfaction declines as restrictions are placed on:
Ad hoc query capabilities
Volume of historical data that can be queried
Time lag between business requirement and system delivery
With this particular client, their systems were unable to handle this
Market Evolution
[Diagram, axis: Innovation]
Traditional: All-purpose RDBMS. Resource intensive, lots of DBA time
Hardware Advances: Divide and conquer on lots of hardware (MPP). Nothing to address underlying issues
Database Advances: Extending Database Concepts. Incremental improvements, still inflexible
Data Warehouse Innovator: Working Smarter
What to Look for in a New Approach
New Approach
Leverages column approach
Automatically creates structures that:
• find needed data
• are always ready
Has small footprint
… maintain
[Figure labels: Clickstream and log files; Existing data warehouse; External Sources; Faster Response; Decreased IT Burden; Smaller Footprint]
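As a rough illustration of the "column approach" named on this slide, the sketch below compares how many field values an aggregate over one column has to read in a row layout versus a column layout. The table, the column names, and the `fields_read` counter are all assumptions made for the example; the counter is a simplified model of data touched, not a measurement of real I/O, and nothing here reflects how any particular product stores data.

```python
# Hypothetical sample table, held row by row.
rows = [
    {"order_id": 1, "region": "east", "amount": 120},
    {"order_id": 2, "region": "west", "amount": 80},
    {"order_id": 3, "region": "east", "amount": 200},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, name):
    """Row layout: model every field of every row as read to reach one column."""
    fields_read = sum(len(row) for row in rows)
    return sum(row[name] for row in rows), fields_read


def sum_column_store(columns, name):
    """Column layout: only the requested column's values are read."""
    values = columns[name]
    return sum(values), len(values)


columns = to_columns(rows)
print(sum_row_store(rows, "amount"))        # (total, fields read) for the row layout
print(sum_column_store(columns, "amount"))  # (total, fields read) for the column layout
```

Both layouts return the same total; the difference is only in how much data the query has to pass over, which is the gap that widens as tables gain more columns.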
Smarter architecture:
Load data and go
No indices or partitions to build / maintain
Knowledge Grid: statistics and metadata “describing” the super-compressed data
Knowledge Grid created automatically as data loaded
Data Packs: Data stored in manageably sized, highly compressed data packs
Up to 40:1 compression reduces storage
Data compressed using algorithms tailored to data type
Open architecture leverages off-the-shelf hardware
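The idea of compressed data packs described by automatically built metadata can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Brighthouse's actual design: the pack size, the use of `zlib`, the min/max/count statistics, and the function names `build_packs` and `count_greater_than` are all hypothetical choices made for the example.

```python
import zlib
from array import array

PACK_ROWS = 4  # hypothetical pack size for the demo; real packs would be far larger


def build_packs(values, pack_rows=PACK_ROWS):
    """Split one integer column into compressed packs plus per-pack statistics."""
    packs = []
    for start in range(0, len(values), pack_rows):
        chunk = values[start:start + pack_rows]
        raw = array("q", chunk).tobytes()  # 8 bytes per signed integer
        packs.append({
            "min": min(chunk),
            "max": max(chunk),
            "count": len(chunk),
            "data": zlib.compress(raw),  # stand-in for a type-specific algorithm
        })
    return packs


def count_greater_than(packs, threshold):
    """Count values > threshold, decompressing only packs the statistics cannot decide."""
    total = 0
    opened = 0
    for pack in packs:
        if pack["max"] <= threshold:
            continue  # no value can match: skip without decompressing
        if pack["min"] > threshold:
            total += pack["count"]  # every value matches: statistics alone answer
            continue
        opened += 1  # statistics are inconclusive: decompress and scan
        chunk = array("q")
        chunk.frombytes(zlib.decompress(pack["data"]))
        total += sum(1 for v in chunk if v > threshold)
    return total, opened


column = [1, 2, 3, 4, 10, 11, 12, 13, 3, 9, 20, 25, 30, 31, 32, 33]
packs = build_packs(column)
matches, opened = count_greater_than(packs, 10)
print(matches, opened)  # 9 matching values, found by opening 2 of the 4 packs
```

In this toy column, two of the four packs are resolved from their statistics alone, so only half the compressed data is ever decompressed. The statistics are produced as a side effect of loading, which is the sense in which no separate index has to be built or maintained.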
How Brighthouse Works Smarter
Brighthouse
Brighthouse is Easy on IT
[Diagram labels: Existing Data Warehouses; Clickstream/Logfiles; ETL Platform Connector; BI Connectors; BI and ETL platforms]
No strain on IT:
… data modeling
Run on standard hardware
MySQL “wrapper”
No need to learn new database system
Leverage mature tools
Brighthouse Architecture and MySQL
Brighthouse Performance Advantage Increases as data volume grows
• Queries were moderately complex, with at least two table joins and two or more where clauses
• Tables were indexed
• Response time represents the average response time of queries
Brighthouse Load Time Remains Constant
• Comparison of load to a single table. Data was loaded in 10 million row chunks
• Table had a single index
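A chunked load comparison like the one described in these notes could be timed with a small harness along the following lines. This is only a sketch: `chunked`, `time_chunked_load`, and the `load_chunk` callback are hypothetical names, and the callback here just appends to a list rather than loading into any database, so the timings it produces say nothing about the results on the slide.

```python
import time
from itertools import islice


def chunked(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def time_chunked_load(rows, chunk_size, load_chunk):
    """Call load_chunk on each chunk and return the elapsed seconds per chunk."""
    timings = []
    for chunk in chunked(rows, chunk_size):
        start = time.perf_counter()
        load_chunk(chunk)  # hypothetical loader: replace with a real bulk load
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in loader for the demo: collects rows in memory.
table = []
timings = time_chunked_load(range(7), 3, table.extend)
print(len(timings), "chunks loaded,", len(table), "rows")
```

Plotting the per-chunk timings against the cumulative row count is one way to see whether load time stays flat or climbs as the table grows.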
Brighthouse is Fast
RAPID START:
Call us – we’ll walk you through a few questions to mutually determine if our technology is a good fit.
Agree on process – e.g. your place or ours?
Load and go – Load your data, run your queries
Summarize results – performance, compression, load times
Next steps – did we prove it?
Contact Us
[email protected]
416.596.2483, x. 225
Download Claudia Imhoff paper:
http://www.infobright.com