
Starburst Introduction - March 2021

Starburst provides a high-performance SQL query engine that enables businesses to reduce decision risk, increase revenue, and accelerate time to market. The platform supports both on-premise and cloud deployments, offering a federated semantic layer for real-time data access and improved analytics. With significant performance improvements and cost reductions, Starburst aims to streamline data management and analytics processes for enterprises.

Analytics Anywhere

About Starburst
● 600% growth YoY
● Named a Startup to Watch, 2020
● 82 NPS score
● 100+ enterprise customers

Our Platform
● Open-source ANSI SQL query engine
● MPP
● On-prem or cloud
● High concurrency
● Massive scale
● Rapid time to insights
● Low cost of ownership
● Enterprise-grade security
● 24x7 expert support

Our Customers and Our Value to Them
● Reduced decision risk
● Increased revenue and profit
● Higher customer retention
● Accelerated time to market
Today’s data management approach delays analytics

Business has a question → Data Engineering services the request → Business gets an answer

In between, ETL moves data across data warehouses, databases, data lakes, cloud data warehouses, and cloud data lakes, leaving multiple copies of the same data.

This delays decision-making and increases data costs and complexity.


Comcast Journey
● 18 months: The existing Hadoop data lake was too slow; this was the timeline provided by the team to execute the CMO cross-sell campaign.
● 5 weeks: Timeline to execute the same CMO campaign; $200M in revenue from cross sell.
● 0 disruption: Migrating to S3 from Teradata to accelerate cloud migration without business disruption.
● 93% faster: Adding Delta Lake and Kafka to begin ingesting new data sources in real time for predictive analytics.

Connectivity: Creating a Portable Access Layer

Users: Data scientists, finance, marketers, data analysts
Tools: Existing analytics tools
The Data Consumption Layer:
● Fine-grained access control
● Column- and row-level permissions
● Data masking
● Data encryption
● Query auditing
● Global security
Sources: Data lakes, relational databases, NoSQL stores, publish/subscribe systems (e.g., Azure Event Hub)
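To make row-level permissions and data masking concrete, here is a minimal Python sketch of how an access layer could filter rows and then mask columns before results reach a user. The `Policy` class, `apply_policy`, and `mask_email` are hypothetical names used only for illustration; this is not Starburst's access-control API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Policy:
    """Hypothetical per-user policy: which rows are visible, which columns are masked."""
    row_filter: Callable[[Row], bool]
    column_masks: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)


def identity(value: Any) -> Any:
    """Default for columns that have no mask."""
    return value


def apply_policy(rows: List[Row], policy: Policy) -> List[Row]:
    """Apply row-level filtering first, then column-level masking."""
    visible = [row for row in rows if policy.row_filter(row)]
    return [
        {col: policy.column_masks.get(col, identity)(val) for col, val in row.items()}
        for row in visible
    ]


def mask_email(value: Any) -> str:
    """Keep the first character and the domain: alice@example.com -> a***@example.com."""
    user, _, domain = str(value).partition("@")
    return user[:1] + "***@" + domain


# Example: an analyst restricted to EU rows, with email addresses masked.
customers = [
    {"id": 1, "region": "EU", "email": "alice@example.com"},
    {"id": 2, "region": "US", "email": "bob@example.com"},
]
eu_analyst = Policy(row_filter=lambda r: r["region"] == "EU",
                    column_masks={"email": mask_email})
print(apply_policy(customers, eu_analyst))
# [{'id': 1, 'region': 'EU', 'email': 'a***@example.com'}]
```

Filtering before masking means a masked value is never computed for a row the user cannot see; a real engine would typically push the row filter down into the query instead of filtering after the fact.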

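Starburst Trino: SQL Engine Architecture

● Clients: A BI tool, SQL client, API, or CLI connects to the cluster over ODBC/JDBC, the CLI, or an API call.
● Trino cluster (compute): The coordinator node parses each query, looks up metadata in a catalog such as Glue/Hive, optimizes the plan with the cost-based optimizer (CBO), and schedules work according to data location. Worker nodes, running in an auto-scaling group, read data through connectors and exchange intermediate data over intra-cluster calls. The client receives the results, for example as a report.
● Data (storage): Object storage such as S3, GCS, ADLS, and Blob Storage.
● Separation of compute and storage.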
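The coordinator/worker split above follows the usual MPP scatter-gather pattern. As a toy illustration only (not Trino code), the Python sketch below has a "coordinator" divide records into splits, hand each split to a worker that computes a partial `SUM(amount) ... GROUP BY region`, and then merge the partial results. The record schema and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (region, amount): hypothetical example schema


def make_splits(records: List[Record], n_splits: int) -> List[List[Record]]:
    """Coordinator step: divide the input into roughly equal splits."""
    return [records[i::n_splits] for i in range(n_splits)]


def partial_sum(split: Iterable[Record]) -> Counter:
    """Worker step: aggregate one split locally."""
    totals: Counter = Counter()
    for region, amount in split:
        totals[region] += amount
    return totals


def run_query(records: List[Record], n_workers: int = 3) -> Dict[str, int]:
    """Schedule splits on a worker pool, then merge the partial aggregates."""
    splits = make_splits(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, splits))
    final: Counter = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts together
    return dict(final)


sales = [("EU", 10), ("US", 5), ("EU", 7), ("APAC", 3), ("US", 8)]
print(run_query(sales))  # {'EU': 17, 'US': 13, 'APAC': 3}
```

Threads keep the example self-contained; a real MPP engine runs workers as separate processes on separate nodes and streams data between them, which is what lets compute scale independently of storage.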
Deploy Starburst Everywhere: On-Premises or Cloud

Starburst on Kubernetes, deployed via Helm charts:
● Starburst coordinator pod and Presto worker pods, exposed through a Kubernetes service
● Starburst Admin
● Worker pods scaled by the Horizontal Pod Autoscaler (HPA)
● Hive Metastore service for object-store storage (Hadoop/Hive/Delta)
● Any RDBMS, for example PostgreSQL on RDS, configured through catalog config properties:

    connector.name=postgresql
    connection-url=<pgsql_url>
    connection-user=<sb-user>
    connection-password=<password>
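The HPA mentioned above scales worker pods using the ratio rule documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no scaling when the ratio is within a tolerance (0.1 by default) and the result bounded by the HPA's min/max replicas. The Python function below is a simplified sketch of that rule only; it ignores stabilization windows, scaling policies, and multiple metrics, and its name and the default `max_replicas=10` are assumptions for illustration.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale by the metric ratio, rounding up."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the autoscaler's configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 worker pods averaging 90% CPU against a 60% target -> scale out to 6.
print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6
```

Rounding up biases the autoscaler toward having enough workers; at 20% CPU the same 4 pods would scale in to 2.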
Hybrid Deployment - Standard Approach
Global users and local users read data across the network from both cloud storage and local storage:
● High latency for large datasets, dependent on network connectivity
● High costs generated by the data movement required for each query
● Metadata duplicated in different environments
● No control over data residency


Hybrid Deployment - Starburst Approach
Each environment reads its own cloud or local storage, and the Starburst-Remote Connector links the environments for global and local users:
● Low query latency with reduced network traffic
● Fully scalable approach that allows connections between all your environments
● Unified metadata model with no data duplication
● Improved security model that meets any data residency requirements
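A back-of-the-envelope model shows why reading data where it lies reduces network traffic. In the hypothetical Python sketch below, the standard approach ships every raw row across the network, while the remote approach aggregates next to the data and ships only the result rows. The `RemoteTable` type and its sizes are made-up inputs for illustration, not Starburst measurements.

```python
from dataclasses import dataclass


@dataclass
class RemoteTable:
    """Hypothetical table living in a remote environment."""
    rows: int
    bytes_per_row: int
    distinct_groups: int       # rows returned by a GROUP BY query
    bytes_per_result_row: int


def bytes_moved_standard(t: RemoteTable) -> int:
    """Standard approach: every raw row crosses the network for each query."""
    return t.rows * t.bytes_per_row


def bytes_moved_remote(t: RemoteTable) -> int:
    """Remote approach: aggregate next to the data, ship only the result rows."""
    return t.distinct_groups * t.bytes_per_result_row


orders = RemoteTable(rows=100_000_000, bytes_per_row=200,
                     distinct_groups=50, bytes_per_result_row=64)
print(bytes_moved_standard(orders))  # 20000000000 (about 20 GB per query)
print(bytes_moved_remote(orders))    # 3200
```

The saving depends on how selective the query is; a query that returns most of the raw rows gains little from running remotely.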
A Federated Semantic Layer - powered by Trino (formerly PrestoSQL)
● Access data in real time, where it lies
● Connect the tool of your choice
● Different clusters for different functions (chargeback)
● Build business views over a variety of sources
● Additional access control over all sources
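A business view over several sources can be tried locally with a rough analogy. In the Python sketch below, SQLite's `ATTACH DATABASE` stands in for separate data sources and a view joins them in place without copying data; in Trino, such a view would instead reference tables as `catalog.schema.table` across connectors. SQLite is not a federated engine, and the table and view names are hypothetical.

```python
import sqlite3

# One connection plays the "query engine"; two attached in-memory
# databases stand in for separate sources (e.g., a CRM and a sales store).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# The business view joins both sources in place; no data is copied.
# It is a TEMP view because SQLite does not let a persistent view
# reference objects in another attached database.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name, SUM(o.amount) AS revenue
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
""")

rows = conn.execute("SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
for row in rows:
    print(row)
# ('Acme', 150.0)
# ('Globex', 75.0)
```

Consumers query `customer_revenue` like a single table, which is the point of a semantic layer: the join logic lives in one place instead of in every tool.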
Don’t just take our word for it…

Faster analytics on data in your data lake

Data Engineers
• 99.9% cluster uptime to drive higher CEX
• 3x performance improvement over open-source (OS) Presto
• 25% reduction in TCO

Data Engineers
• Achieved GDPR compliance by leaving data where it lives locally
• Reduced infrastructure usage by 30%
• Improved time to insights for engineers/analysts by 800%

Faster analytics on data anywhere + query federation

Data Engineers
• “This streamlined workflow helps our executives make the right decisions on time, and fosters innovation through machine learning.”
• 50% savings on infrastructure compute costs to expand use cases with the same budget

• Eliminated ETL for joining Oracle and HDFS while replacing Spark/Impala
• Reduced time to insight for critical risk models by 96%
• De-risk business decisions in real time
