
How To Select An Analytic DBMS: Overview, Checklists, and Tips

The document discusses how to select an analytic database management system (DBMS), providing an overview of specialized analytic DBMS options, differentiations among products, and tips for the selection process. It explains why specialized analytic DBMS exist, major product categories and vendors, considerations for segmentation, and outlines a process for shortlisting options, conducting proofs-of-concept, and making a final decision based on cost, speed, and risk.


How to Select an Analytic DBMS

Overview, checklists, and tips


by Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2
contact@monash.com
http://www.monash.com
http://www.DBMS2.com

Curt Monash

Analyst since 1981, own firm since 1987: covered DBMS since the pre-relational days; also analytics, search, etc.

Publicly available research: blogs, including DBMS2 (www.DBMS2.com -- the source for most of this talk); feed at www.monash.com/blogs.html; white papers and more at www.monash.com

User and vendor consulting

Our agenda

Why are there such things as specialized analytic DBMS?
What are the major analytic DBMS product alternatives?
What are the most relevant differentiations among analytic DBMS users?
What's the best process for selecting an analytic DBMS?

Why are there specialized analytic DBMS?

General-purpose database managers are optimized for updating short rows, not for analytic query performance
10-100X price/performance differences are not uncommon

At issue is the interplay between storage, processors, and RAM

Moore's Law, Kryder's Law, and a huge exception


Growth factors:

Transistors/chip: >100,000 since 1971
Disk density: >100,000,000 since 1956
Disk speed: 12.5 since 1956

[Chart: compound annual growth rate of transistors/chip since 1971, disk density since 1956, and disk speed since 1956; vertical axis 0% to 45%]

The disk speed barrier dominates everything!
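
As a rough check on the chart, a cumulative growth factor can be converted into a compound annual growth rate. The sketch below takes the growth factors and start years from the slide; the end year is not stated there, so `END_YEAR = 2009` is an assumption and the resulting rates are approximate.

```python
# Convert a cumulative growth factor into a compound annual growth rate (CAGR).
# ASSUMPTION: END_YEAR = 2009 is not stated in the slides; it is a guess
# used only to make the arithmetic concrete.

END_YEAR = 2009


def cagr(growth_factor: float, years: int) -> float:
    """Annual rate r such that (1 + r) ** years == growth_factor."""
    return growth_factor ** (1.0 / years) - 1.0


# (label, cumulative growth factor from the slide, start year from the slide)
SERIES = [
    ("Transistors/chip", 100_000, 1971),
    ("Disk density", 100_000_000, 1956),
    ("Disk speed", 12.5, 1956),
]


def rates() -> dict:
    """Map each series label to its approximate annual growth rate."""
    return {label: cagr(factor, END_YEAR - start) for label, factor, start in SERIES}


if __name__ == "__main__":
    for label, rate in rates().items():
        print(f"{label}: {rate:.1%} per year")
```

Under that assumption, transistors per chip and disk density come out at roughly 35% and 42% per year, while disk speed grows at only about 5% per year.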


4/1/2014


Software strategies to optimize analytic I/O

Minimize data returned: classic query optimization
Minimize index accesses: page size
Precalculate results: materialized views, OLAP cubes
Return data sequentially
Store data in columns
Stash data in RAM

Hardware strategies to optimize analytic I/O


Lots of RAM
Parallel disk access!!!
Lots of networking

Tuned MPP (Massively Parallel Processing) is ideal. Recommended configurations are a mixed bag.
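
To see why parallel disk access matters so much, it may help to work out a full-scan time by hand. The figures below (a 10 TB table, 100 MB/s sequential read per disk, perfectly even distribution) are illustrative assumptions, not numbers from this talk.

```python
# Back-of-the-envelope time for a full table scan spread across disks.
# ASSUMPTIONS: the 10 TB table size and the 100 MB/s per-disk sequential
# rate are illustrative; perfect parallelism and even data spread are assumed.

TB = 10 ** 12
MB = 10 ** 6


def scan_seconds(table_bytes: float, disks: int, bytes_per_sec_per_disk: float) -> float:
    """Seconds to read the whole table when it is split evenly across `disks`."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return table_bytes / (disks * bytes_per_sec_per_disk)


if __name__ == "__main__":
    for disks in (1, 10, 100):
        secs = scan_seconds(10 * TB, disks, 100 * MB)
        print(f"{disks:>3} disks: {secs / 3600:.2f} hours")
```

With these assumed figures, one disk needs about 28 hours for the scan and 100 disks need under 20 minutes, which is the basic argument for spreading data over many spindles.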

Specialty hardware strategies


Custom or unusual chips (rare)
Custom or unusual interconnects
Fixed configurations of common parts

Appliances or recommended configurations

And there's also SaaS.

18 contenders (and there are more)


Aster Data, Dataupia, Exasol, Greenplum, HP Neoview, IBM DB2 BCUs, Infobright/MySQL, Kickfire/MySQL, Kognitio, Microsoft Madison

Netezza, Oracle Exadata, Oracle w/o Exadata, ParAccel, SQL Server w/o Madison, Sybase IQ, Teradata, Vertica

General areas of feature differentiation

Most influenced by architecture


Query performance
Update/load performance
Alternate datatypes
Compatibilities
Advanced analytics
Manageability and availability
Encryption and security

Most influenced by product maturity

Major analytic DBMS product groupings


Architecture is a good first categorization

Traditional OLTP
Row-based MPP
Columnar
(Not covered tonight) MOLAP/array-based

Traditional OLTP examples


Oracle (especially pre-Exadata)
IBM DB2 (especially mainframe)
Microsoft SQL Server (pre-Madison)

Analytic optimizations for OLTP DBMS

Performance
Two major kinds of precalculation: star indexes, materialized views
Other specialized indexes
Query optimization tools
Other OLAP extensions
SQL 2003
Other embedded analytics
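
Precalculation, as in the materialized views listed above, can be sketched in a few lines. This is only an illustration: the `sales` and `sales_by_region` tables are hypothetical, and because SQLite has no materialized views, a plain summary table built at load time stands in for one.

```python
# Precalculating an aggregate so a report reads a small summary table
# instead of scanning the detail rows.
# ASSUMPTIONS: the `sales` and `sales_by_region` tables are hypothetical, and
# SQLite has no materialized views, so a plain summary table stands in for one.
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create the detail table and precalculate the per-region summary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("east", 20), ("west", 5), ("west", 7), ("west", 8)],
    )
    # Precalculate once, at load time.
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
        "FROM sales GROUP BY region"
    )
    return conn


def totals_from_detail(conn: sqlite3.Connection) -> list:
    """Aggregate at query time by scanning every detail row."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


def totals_from_summary(conn: sqlite3.Connection) -> list:
    """Read the precalculated totals; no scan of the detail table."""
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(totals_from_detail(conn))
    print(totals_from_summary(conn))
```

Both functions return the same totals. The summary has to be rebuilt or maintained whenever the detail rows change, which is part of the complexity cost of this approach.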

Drawbacks

Complexity and people cost
Hardware cost
Software cost
Absolute performance

Legitimate use scenarios

When TCO isn't an issue: undemanding performance (and therefore administration too)
When specialized features matter: OLTP-like, integrated MOLAP, edge-case analytics
Rigid enterprise standards
Small enterprise/true single-instance

Row-based MPP examples


Teradata
DB2 (open systems version)
Netezza
Oracle Exadata (sort of)
DATAllegro/Microsoft Madison
Greenplum
Aster Data
Kognitio
HP Neoview

Typical design choices in row-based MPP

Random (hashed or round-robin) data distribution among nodes
Large block sizes: suitable for scans rather than random accesses
Limited indexing alternatives: or little optimization for using the full boat
Carefully balanced hardware
High-end networking
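
The two distribution schemes named above, hashed and round-robin, can be sketched as follows. The node count, the key format, and the use of CRC32 as the hash function are assumptions made for illustration; real products choose their own hashing and placement rules.

```python
# Two ways to spread rows across MPP nodes: hashed and round-robin.
# ASSUMPTIONS: the node count, the key format, and the use of CRC32 as the
# hash function are illustrative; real products choose their own hashing.
import zlib
from collections import Counter


def hash_node(key: str, nodes: int) -> int:
    """Hashed distribution: the same key always lands on the same node."""
    return zlib.crc32(key.encode("utf-8")) % nodes


def round_robin_nodes(row_count: int, nodes: int) -> list:
    """Round-robin distribution: row i goes to node i mod nodes."""
    return [i % nodes for i in range(row_count)]


if __name__ == "__main__":
    nodes = 4
    keys = [f"customer-{i % 50}" for i in range(1000)]
    hashed = Counter(hash_node(k, nodes) for k in keys)
    rr = Counter(round_robin_nodes(len(keys), nodes))
    print("hashed     :", sorted(hashed.items()))
    print("round-robin:", sorted(rr.items()))
```

Round-robin gives an exactly even spread regardless of the data. Hashing keeps all rows for a given key on one node, at the price of a spread that depends on how the key values are distributed.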

Tradeoffs among row MPP alternatives


Enterprise standards
Vendor size
Hardware lock-in
Total system price
Features

Columnar DBMS examples


Sybase IQ
Vertica
InfoBright
SAND
ParAccel
Kickfire
Exasol
MonetDB
SAP BI Accelerator (sort of)

Columnar pros and cons


Bulk retrieval is faster
Pinpoint I/O is slower
Compression is easier
Memory-centric processing is easier
MPP is not as crucial

Being columnar reduces I/O
So does (better) compression
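
The claim that being columnar reduces I/O can be made concrete with a small calculation. The column names, byte widths, and row count below are invented for the example, and compression, indexes, and caching are deliberately ignored.

```python
# Bytes read by a scan in a row store versus a column store.
# ASSUMPTIONS: the column names, widths, and row count are made up, and
# compression, indexes, and caching are ignored.

COLUMN_WIDTHS = {  # bytes per value, hypothetical
    "order_id": 8,
    "customer_id": 8,
    "order_date": 4,
    "status": 1,
    "amount": 8,
    "comment": 100,
}


def row_store_bytes(rows: int) -> int:
    """A row store reads whole rows, whatever columns the query needs."""
    return rows * sum(COLUMN_WIDTHS.values())


def column_store_bytes(rows: int, needed: list) -> int:
    """A column store reads only the columns the query references."""
    return rows * sum(COLUMN_WIDTHS[name] for name in needed)


if __name__ == "__main__":
    rows = 1_000_000_000
    needed = ["order_date", "amount"]  # e.g. total amount per day
    by_row = row_store_bytes(rows)
    by_col = column_store_bytes(rows, needed)
    print(f"row store   : {by_row / 1e9:.0f} GB")
    print(f"column store: {by_col / 1e9:.0f} GB")
    print(f"ratio       : {by_row / by_col:.2f}x")
```

The same arithmetic shows the downside: a query that needs every column of a single row gains nothing from the columnar layout and may pay for one access per column, which is the pinpoint I/O penalty.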

Segmentation made (too) simple


One database to rule them all
One analytic database to rule them all
Frontline analytic database
Very, very big analytic database
Big analytic database handled very cost-effectively

Basics of systematic segmentation


Use cases
Metrics
Platform preferences

There isn't just one checklist.

Use cases -- a first cut


Light reporting
Diverse EDW
Big Data
Operational analytics

Metrics -- a first cut

Total raw/user data: below 1-2 TB, references abound; 10 TB is another major breakpoint
Total concurrent users: 5, 15, 50, or 500?
Data freshness: hours, minutes, seconds

Basic platform issues


Enterprise standards
Appliance-friendliness
Need for MPP?
Cloud/SaaS

The selection process in a nutshell


Figure out what you're trying to buy
Make a shortlist
Do free POCs*
Evaluate and decide

*The only part that's even slightly specific to the analytic DBMS category

Figure out what you're trying to buy

Inventory your use cases: current, known future, wish-list/dream-list future
Set constraints: people and platforms, money
Establish target SLAs: must-haves, nice-to-haves
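
One way to keep must-haves and nice-to-haves separate when setting target SLAs is to record both thresholds for each metric and grade measurements against them. The sketch below is only an illustration: the `SlaTarget` class, the `grade` function, and the sample metrics and thresholds are all hypothetical, and every metric is assumed to be "lower is better".

```python
# Grade a measured value against must-have and nice-to-have SLA thresholds.
# ASSUMPTIONS: SlaTarget, grade, and the sample targets are hypothetical;
# every metric here is "lower is better" (e.g. seconds, minutes).
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaTarget:
    name: str
    must_have: float     # worst acceptable value
    nice_to_have: float  # value you would be pleased with


def grade(target: SlaTarget, measured: float) -> str:
    """Return which threshold, if any, the measurement satisfies."""
    if measured <= target.nice_to_have:
        return "nice-to-have met"
    if measured <= target.must_have:
        return "must-have met"
    return "fails"


if __name__ == "__main__":
    targets = [
        SlaTarget("dashboard query (seconds)", must_have=5.0, nice_to_have=1.0),
        SlaTarget("data freshness (minutes)", must_have=60.0, nice_to_have=15.0),
    ]
    measured = {"dashboard query (seconds)": 3.2, "data freshness (minutes)": 10.0}
    for target in targets:
        print(f"{target.name}: {grade(target, measured[target.name])}")
```

Writing the thresholds down before talking to vendors makes it harder for a strong result on a nice-to-have to mask a failure on a must-have.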


Use-case checklist -- generalities

Database growth: as time goes by, more detail, new data sources
Users (human)
Users/usage (automated)
Freshness (data and query results)

Use-case checklist -- traditional BI

Reports: today, future
Dashboards and alerts: today, future, latency
Ad-hoc: users, now that we have great response time

Use-case checklist -- predictive analytics

How much do you think it would improve results to:
Run more models?
Model on more data?
Add more variables?
Increase model complexity?

Which of those can the DBMS help with anyway?
What about scoring? Real-time; other latency issues

SLA realism

What kind of turnaround truly matters? Customer or customer-facing users; executive users; analyst users
How bad is downtime? Customer or customer-facing users; executive users; analyst users

Short list constraints

Cash cost: but purchases are heavily negotiated
Deployment effort: appliances can be good
Platform politics: you might as well consider incumbent(s); appliances can be frowned on

Filling out the shortlist


Who matches your requirements in theory?
What kinds of evidence do you require?
References? How many? How relevant?
A careful POC?
Analyst recommendations?
General buzz?

A checklist for shortlists


What's your tolerance for specialized hardware?
What's your tolerance for set-up effort?
What's your tolerance for ongoing administration?
What are your insert and update requirements?
At what volumes will you run fairly simple queries?
What are your complex queries like?
For which third-party tools do you need support?

and, most important,

Are you madly in love with your current DBMS?

Proof-of-Concept basics

The better you match your use cases, the more reliable the POC is
Most of the effort is in the set-up
You might as well do POCs for several vendors at (almost) the same time!
Where is the POC being held?

The three big POC challenges

Getting data: Real? (politics, privacy) Synthetic? Hybrid?
Picking queries: And more?
Realistic simulation(s): workload, platform, talent
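
Once data and queries are chosen, it helps to run the same query set the same way against every candidate. The harness below is a minimal sketch of that idea, not a benchmark methodology: `time_queries` and `demo_connection` are hypothetical helpers, the `facts` table is invented, and an in-memory SQLite database stands in for whatever DB-API connection a vendor's driver would provide.

```python
# Time a fixed set of queries several times and report the median per query.
# ASSUMPTIONS: time_queries, demo_connection, and the `facts` table are
# hypothetical; SQLite stands in for a real candidate system's connection.
import sqlite3
import statistics
import time


def time_queries(conn, queries: dict, repeats: int = 5) -> dict:
    """Return {query name: median elapsed seconds over `repeats` runs}."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for name, sql in queries.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            conn.execute(sql).fetchall()  # fetch everything so the work is done
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


def demo_connection(rows: int = 10_000) -> sqlite3.Connection:
    """Build a small in-memory table to exercise the harness."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (k INTEGER, v INTEGER)")
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?)", ((i % 100, i) for i in range(rows))
    )
    return conn


if __name__ == "__main__":
    queries = {
        "count": "SELECT COUNT(*) FROM facts",
        "group": "SELECT k, SUM(v) FROM facts GROUP BY k",
    }
    for name, secs in time_queries(demo_connection(), queries).items():
        print(f"{name}: {secs * 1000:.2f} ms (median)")
```

A single-user loop like this says nothing about concurrency or load performance, so it covers only part of a realistic workload simulation.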

POC tips

Don't underestimate requirements
Don't overestimate requirements
Get SOME data ASAP
Don't leave the vendor in control
Test what you'll actually be buying
Use the baseball bat

Evaluate and decide


It all comes down to

Cost
Speed
Risk

and in some cases


Time to value
Upside

Further information
Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2
contact@monash.com
http://www.monash.com
http://www.DBMS2.com
