
Logical Data Warehouse, Data Lakes, and Data Services Marketplaces
New York City
9th June, 2016
Agenda
1. Introductions
2. Logical Data Warehouse and Data Lakes
3. Coffee Break
4. Data Services Marketplaces
5. Q&A
LEADERSHIP
 Longest continuous focus on data virtualization and data services.
 Product leadership.
 Solutions expertise.

THE LEADER IN DATA VIRTUALIZATION
Denodo provides agile, high performance data integration and data abstraction across the
broadest range of enterprise, cloud, big data and unstructured data sources, and real-time
data services at half the cost of traditional approaches.

HEADQUARTERS
Palo Alto, CA.

DENODO OFFICES, CUSTOMERS, PARTNERS
Global presence throughout North America, EMEA, APAC, and Latin America.

CUSTOMERS
250+ customers, including many F500 and G2000 companies across every major industry,
have gained significant business agility and ROI.

3
Speakers

Paul Moxon: Senior Director of Product Management, Denodo
Pablo Álvarez: Principal Technical Account Manager, Denodo
Rubén Fernández: Technical Account Manager, Denodo

Logical Data Warehouse and Data Lakes
New York City
June 2016
Agenda
1. The Logical Data Warehouse
2. Different Types, Different Needs
3. Performance in a LDW
4. Customer Success Stories
5. Q&A
What is a Logical Data Warehouse?
A logical data warehouse is a data system that follows
the ideas of traditional EDW (star or snowflake schemas)
and includes, in addition to one (or more) core DWs,
data from external sources.
The main motivations are improved decision making
and/or cost reduction
Logical Data Warehouse
Gartner Definition

Description:
 “The Logical Data Warehouse (LDW) is a new data management architecture for
analytics combining the strengths of traditional repository warehouses with
alternative data management and access strategy. The LDW will form a new
best practice by the end of 2015.”
 “The LDW is an evolution and augmentation of DW practices, not a replacement”
 “A repository-only style DW contains a single ontology/taxonomy, whereas in the
LDW a semantic layer can contain many combination of use cases, many
business definitions of the same information”
 “The LDW permits an IT organization to make a large number of datasets
available for analysis via query tools and applications.”

Gartner Hype Cycle for Enterprise Information Management, 2012

8
Logical Data Warehouse

Description:
 A semantic layer on top of the data warehouse that keeps the business data
definition.
 Allows the integration of multiple data sources including enterprise systems,
the data warehouse, additional processing nodes (analytical appliances, Big
Data, …), Web, Cloud and unstructured data.
 Publishes data to multiple applications and reporting tools.

10
Three Integration/Semantic Layer Alternatives
Gartner’s View of Data Integration

[Diagram: three alternatives, detailed on the following slides]
1. Application/BI Tool as Data Integration/Semantic Layer
2. EDW as Data Integration/Semantic Layer
3. Data Virtualization as Data Integration/Semantic Layer

11
Application/BI Tool as the Data Integration Layer

[Diagram: the Application/BI Tool sits directly on top of the EDW and ODS as the integration/semantic layer]

• Integration is delegated to end user tools and applications
 • e.g. BI Tools with 'data blending'
• Results in duplication of effort: integration defined many times in different tools
• Impact of change in data schema?
• End user tools are not intended to be integration middleware
 • Not their primary purpose or expertise

12
EDW as the Data Integration Layer

[Diagram: the EDW sits on top of the ODS and acts as the integration/semantic layer]

• Access to 'other' data (query federation) via EDW
 • Teradata QueryGrid, IBM FluidQuery, SAP Smart Data Access, etc.
• Often coupled with traditional ETL replication of data into EDW
• EDW as the 'center of the data universe'
 • Provides data integration and semantic layer
• Appears attractive to organizations heavily invested in EDW
• More than one EDW? EDW costs?

13
Data Virtualization as the Data Integration Layer

[Diagram: an independent Data Virtualization layer sits on top of the EDW and ODS as the integration/semantic layer]

• Move data integration and semantic layer to an independent Data Virtualization platform
• Purpose built for supporting data access across multiple heterogeneous data sources
• Separate layer provides semantic models for underlying data
 • Physical to logical mapping
• Enforces common and consistent security and governance policies
• Gartner's recommended approach

14
Logical Data Warehouse

[Diagram: a virtual layer spanning a Database (ERP), the EDW (Sales), a Hadoop cluster (HDFS files), a NoSQL database (document collections) and Excel]

15
Logical Data Warehouse
Reference Architecture by Denodo

16
Physical data movement architectures that aren’t designed to
support the dynamic nature of business change, volatile
requirements and massive data volume are increasingly being
replaced by data virtualization.

Evolving approaches (such as the use of LDW architectures) include
implementations beyond repository-centric techniques.

The State and Future of Data Integration. Gartner, 25 May 2016

17
What about the Logical Data Lake?
A Data Lake will not have a star or snowflake schema, but rather a more
heterogeneous collection of views with raw data from heterogeneous
sources.

The virtual layer will act as a common umbrella under which these
different sources are presented to the end user as a single system.

However, from the virtualization perspective, a Virtual Data Lake shares
many technical aspects with an LDW, and most of this content also
applies to a Logical Data Lake.
Different Types, Different Needs
Common Patterns for a Logical Data Warehouse
1. The Virtual Data Mart
2. DW + MDM
 Data Warehouse extended with master data
3. DW + Cloud
 Data Warehouse extended with cloud data
4. DW + DW
 Integration of multiple Data Warehouses
5. DW historical offloading
 DW horizontal partitioning with historical data in cheaper storage
6. Slim DW extension
 DW vertical partitioning with rarely used data in cheaper storage

20
Virtual Data Marts
Simplified semantic models for business users

Business friendly models defined on top of one or multiple systems,
often "flavored" for a particular division.

Motivation
 Hide complexity of star schemas for business users
 Simplify model for a particular vertical
 Reuse semantic models and security across multiple reporting engines

Typical queries
 Simple projections, filters and aggregations on top of curated "fat tables"
that merge data from facts and many dimensions (see the SQL sketch below)

21
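A minimal sketch of such a "fat table" query, assuming a curated virtual view sales_fat that already merges the sales facts with the retailer, time and product dimensions (the view and column names are illustrative):

SELECT retailer_name,
       product_category,
       SUM(sales_amount) AS total_sales
FROM sales_fat                      -- curated "fat" view merging facts and dimensions
WHERE sale_year = 2016              -- simple filter, delegated to the underlying sources
GROUP BY retailer_name, product_category;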
Virtual Data Marts

[Diagram: a virtual "fat table" combining the fact table (sales) with the Retailer, Time and Product dimensions from the EDW, plus product details from other sources]

22
DW + MDM
Slim dimensions with extended information maintained in an external
MDM system
Motivation
 Keep a single copy of golden records in the MDM that can be reused across
systems and managed in a single place

Typical queries
 Join a large fact table (DW) with several MDM dimensions, aggregations on
top

Example
 Revenue by customer, projecting the address from the MDM (see the SQL sketch below)

23
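A minimal sketch of such a federated query, assuming the sales fact table lives in the EDW and the customer golden record lives in the MDM system (table and column names are illustrative):

SELECT c.customer_name,
       c.address,                          -- golden-record attribute maintained only in the MDM
       SUM(s.amount) AS revenue
FROM edw.sales s                           -- large fact table in the DW
JOIN mdm.customer c                        -- slim dimension maintained in the MDM
  ON s.customer_fk = c.customer_id
GROUP BY c.customer_name, c.address;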
DW + MDM dimensions

[Diagram: fact table (sales) with the Retailer and Time dimensions in the EDW, joined to the Product dimension maintained in the MDM]

24
DW + Cloud dimensional data
Fresh data from cloud systems (e.g. SFDC) is mixed with the EDW, usually
on the dimensions. DW is sometimes also in the cloud.
Motivation
 Take advantage of “fresh” data coming straight from SaaS systems
 Avoid local replication of cloud systems

Typical queries
 Dimensions are joined with cloud data to filter based on some external attribute
not available (or not current) in the EDW

Example
 Report on current revenue on accounts where the potential for an expansion is
higher than 80% (see the SQL sketch below)

25
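A minimal sketch, assuming the sales facts sit in the EDW and the account dimension is read live from Salesforce, where the expansion-potential attribute is kept (all names are illustrative):

SELECT a.account_name,
       SUM(s.amount) AS current_revenue
FROM edw.sales s
JOIN sfdc.account a                        -- "fresh" dimension data straight from the SaaS CRM
  ON s.account_fk = a.account_id
WHERE a.expansion_potential > 0.80         -- attribute not available (or not current) in the EDW
GROUP BY a.account_name;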
DW + Cloud dimensional data

[Diagram: fact table (sales) with the Time and Product dimensions in the EDW; the Customer dimension is extended with SFDC customer data from the cloud CRM]

26
Multiple DW integration
Use of multiple DWs as if they were only one

Motivation
 Mergers and acquisitions
 Different DWs by department
 Transition to new EDW Deployments (migration to Spark, Redshift, etc.)

Typical queries
 Joins across fact tables in different DW with aggregations before or after the JOIN

Example
 Get customers with purchases higher than 100 USD that do not have a fidelity
card (purchases and fidelity card data are in different DWs; see the SQL sketch below)

27
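A minimal sketch of such a cross-DW query, assuming the sales facts live in the finance EDW and the fidelity card facts live in the marketing EDW (schema and table names are illustrative):

SELECT s.customer_id,
       SUM(s.amount) AS total_purchases
FROM finance_edw.sales s                       -- fact table in one warehouse
LEFT JOIN marketing_edw.fidelity_card f        -- fidelity card facts in the other warehouse
  ON s.customer_id = f.customer_id
WHERE f.customer_id IS NULL                    -- customers without a fidelity card
GROUP BY s.customer_id
HAVING SUM(s.amount) > 100;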
Multiple DW integration

[Diagram: Sales facts with Product, Time and Store (City, Region) dimensions in one EDW, combined with Fidelity facts and Customer/Product dimensions in the other (Marketing EDW and Finance EDW)]

*Real examples: Nationwide POC, IBM tests


28
DW Historical Partitioning
Horizontal partitioning

Only the most current data (e.g. last year) is in the EDW. Historical data is
offloaded to a Hadoop cluster
Motivations
 Reduce storage cost
 Transparently use the two datasets as if they were all together
Typical queries
 Facts are defined as a partitioned UNION based on date (see the sketch below)
 Queries join the "virtual fact" with dimensions and aggregate on top
Example
 Queries on current dates only need to go to the DW, but longer timespans need to merge
with Hadoop

29
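A minimal sketch of such a partitioned virtual fact, assuming the most recent sales stay in the EDW and older data is offloaded to Hive on the Hadoop cluster (names and the cut-off date are illustrative); when a query only touches recent dates, the Hadoop branch can be pruned entirely:

-- Virtual fact defined in the virtualization layer as a partitioned UNION on date
CREATE VIEW sales_all AS
SELECT * FROM edw.sales_current           -- most recent data, kept in the EDW
 WHERE sale_date >= DATE '2015-06-01'
UNION ALL
SELECT * FROM hive.sales_history          -- historical data, offloaded to cheaper storage
 WHERE sale_date <  DATE '2015-06-01';

-- This query only needs the EDW partition; the Hadoop branch is pruned
SELECT product_fk, SUM(amount)
FROM sales_all
WHERE sale_date >= DATE '2016-01-01'
GROUP BY product_fk;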
DW Historical offloading
Horizontal partitioning

[Diagram: the fact table (sales) is split horizontally into Current Sales in the EDW and Historical Sales in cheaper storage, joined to the Retailer, Time and Product dimensions]

30
Slim DW extension
Vertical partitioning

Minimal DW, with more complete raw data in a Hadoop cluster

Motivation
 Reduce cost
 Transparently use the two datasets as if they were all together
Typical queries
 Tables are defined virtually as 1-to-1 joins between the two systems (see the sketch below)
 Queries join the facts with dimensions and aggregate on top
Example
 Common queries only need to go to the DW, but some queries need attributes or
measures from Hadoop

31
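A minimal sketch of such a vertically partitioned fact, assuming the EDW keeps only the frequently used columns and Hive keeps the complete raw record (all names are illustrative); queries that only need the slim columns never have to touch Hadoop:

-- Extended fact defined in the virtualization layer as a 1-to-1 join
CREATE VIEW sales_extended AS
SELECT s.sale_id, s.product_fk, s.amount,       -- slim, frequently used columns in the EDW
       h.clickstream_id, h.raw_payload          -- rarely used attributes kept in Hadoop
FROM edw.sales_slim s
JOIN hive.sales_raw h
  ON s.sale_id = h.sale_id;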
Slim DW extension
Vertical partitioning

[Diagram: the fact table (sales) is split vertically into Slim Sales in the EDW and Extended Sales in cheaper storage, joined to the Retailer, Time and Product dimensions]

32
Performance in a LDW
It is a common assumption that a virtualized solution will
be much slower than a persisted approach via ETL:

1. There is a large amount of data moved through the
network for each query

2. Network transfer is slow

But is this really true?

34
Debunking the myths of virtual performance

1. Complex queries can be solved by transferring moderate data volumes when
the right techniques are applied
 Operational queries
 Predicate delegation produces small result sets
 Logical Data Warehouse and Big Data
 Denodo uses characteristics of underlying star schemas to apply
query rewriting rules that maximize delegation to specialized sources
(especially heavy GROUP BY) and minimize data movement
2. Current networks are almost as fast as reading from disk
 10 Gb and 100 Gb Ethernet are a commodity

35
Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse
Denodo has done extensive testing using queries from the standard benchmarking test
TPC-DS* and the following scenario: it compares the performance of a federated approach
in Denodo with an MPP system where all the data has been replicated via ETL.

[Diagram: the federated scenario spreads Sales Facts (290 M rows), Customer Dim. (2 M rows) and Items Dim. (400 K rows) across separate systems, vs. a single MPP system holding all three tables]

* TPC-DS is the de-facto industry standard benchmark for measuring the performance of
decision support solutions including, but not limited to, Big Data systems.
36
Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse

Query Description | Returned Rows | Time Netezza | Time Denodo (Federated Oracle, Netezza & SQL Server) | Optimization Technique (automatically selected)
Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
Total sales by item where sale price less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On-the-fly data movement

37
Performance and optimizations in Denodo
Focused on 3 core concepts

Dynamic Multi-Source Query Execution Plans
 Leverages processing power & architecture of data sources
 Dynamic to support ad hoc queries
 Uses statistics for cost-based query plans

Selective Materialization
 Intelligent caching of only the most relevant and often used information

Optimized Resource Management
 Smart allocation of resources to handle high concurrency
 Throttling to control and mitigate source impact
 Resource plans based on rules

38
Performance and optimizations in Denodo
Comparing optimizations in DV vs ETL

Although Data Virtualization is a data integration platform,
architecturally speaking it is more similar to an RDBMS
 Uses relational logic
 Metadata is equivalent to that of a database
 Enables ad hoc querying

Key difference between ETL engines and DV:
 ETL engines are optimized for static bulk movements (fixed data flows)
 Data virtualization is optimized for queries (dynamic execution plan per query)

Therefore, the performance architecture presented here
resembles that of an RDBMS

39
Query Optimizer
How Dynamic Query Optimizer Works
Step by Step
Metadata Query Tree
• Maps query entities (tables, fields) to actual metadata
• Retrieves execution capabilities and restrictions for views involved in the query

Static Optimizer
• Query delegation
• SQL rewriting rules (removal of redundant filters, tree pruning, join reordering, transformation push-up, star-schema rewritings, etc.)
• Data movement query plans

Cost Based Optimizer
• Picks optimal JOIN methods and orders based on data distribution statistics, indexes, transfer rates, etc.

Physical Execution Plan
• Creates the calls to the underlying systems in their corresponding protocols and dialects (SQL, MDX, WS calls, etc.)

41
How Dynamic Query Optimizer Works
Example: Total sales by retailer and product during the last month for the brand ACME

[Diagram: fact table (sales) with the Retailer, Time and Product dimensions, spread across the EDW and the MDM]

SELECT retailer.name,
       product.name,
       SUM(sales.amount)
FROM sales
  JOIN retailer ON sales.retailer_fk = retailer.id
  JOIN product ON sales.product_fk = product.id
  JOIN time ON sales.time_fk = time.id
WHERE time.date < ADDMONTH(NOW(), -1)
  AND product.brand = 'ACME'
GROUP BY product.name, retailer.name

42
How Dynamic Query Optimizer Works
Example: Non-optimized

[Execution plan: a GROUP BY on product.name, retailer.name (returning 10,000,000 rows) runs in the virtualization layer on top of three JOINs over the following source queries]

Sales (1,000,000,000 rows):
SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount
FROM sales

Retailer (100 rows):
SELECT retailer.name, retailer.id
FROM retailer

Product (10 rows):
SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'

Time (30 rows):
SELECT time.date, time.id
FROM time
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
43
How Dynamic Query Optimizer Works
Step 1: Applies JOIN reordering to maximize delegation

[Execution plan: the GROUP BY on product.name, retailer.name (returning 10,000,000 rows) now sits on top of two JOINs in the virtualization layer; the sales and time branches are grouped so their JOIN is delegated to the source]

Sales JOIN Time, delegated (100,000,000 rows):
SELECT sales.retailer_fk, sales.product_fk, sales.amount
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)

Retailer (100 rows):
SELECT retailer.name, retailer.id
FROM retailer

Product (10 rows):
SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'
44
How Dynamic Query Optimizer Works
Step 2
Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes from the dimensions, it applies the partial aggregation push down optimization.

[Execution plan: the final GROUP BY on product.name, retailer.name returns 1,000 rows on top of two JOINs in the virtualization layer]

Sales JOIN Time with the partial aggregation pushed down to the source (10,000 rows):
SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
GROUP BY sales.retailer_fk, sales.product_fk

Retailer (100 rows):
SELECT retailer.name, retailer.id
FROM retailer

Product (10 rows):
SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'

45
How Dynamic Query Optimizer Works
Step 3
Selects the right JOIN strategy based on cost estimations of the data volumes.

[Execution plan: the final GROUP BY on product.name, retailer.name returns 1,000 rows; a NESTED JOIN is used between the product branch and the aggregated sales branch, and a HASH JOIN with the retailer branch]

Product (10 rows):
SELECT product.name, product.id
FROM product
WHERE product.brand = 'ACME'

Aggregated Sales JOIN Time branch (1,000 rows); the NESTED JOIN pushes the selected product ids into the query:
SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
FROM sales JOIN time ON sales.time_fk = time.id
WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
  AND sales.product_fk IN (1, 2, …)
GROUP BY sales.retailer_fk, sales.product_fk

Retailer (100 rows):
SELECT retailer.name, retailer.id
FROM retailer

46
How Dynamic Query Optimizer Works
Summary

1. Automatic JOIN reordering
 Groups branches that go to the same source to maximize query delegation and reduce processing in the DV layer
 End users don't need to worry about the optimal "pairing" of the tables

2. The Partial Aggregation push-down optimization is key in these scenarios. Based on
PK-FK restrictions, it pushes the aggregation (for the PKs) to the DW (see the sketch below)
 Leverages the processing power of the DW, optimized for these aggregations
 Significantly reduces the data transferred through the network (from 1 billion rows to 10 thousand)

3. The Cost-based Optimizer picks the right JOIN strategies based on estimations of data
volumes, existence of indexes, transfer rates, etc.
 Denodo estimates costs differently for parallel databases (Vertica, Netezza, Teradata) than for regular
databases, to take into consideration the different way those systems operate (distributed data, parallel
processing, different aggregation techniques, etc.)

47
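A minimal sketch of the partial aggregation push-down rewrite, using the illustrative schema from the example above; the inner GROUP BY on the foreign keys is delegated to the DW, and only the small pre-aggregated result is joined and re-aggregated on dimension attributes in the virtual layer:

SELECT product.name, retailer.name, SUM(s.partial_amount)
FROM (
    -- pushed down to the DW: one row per (retailer_fk, product_fk) pair
    SELECT sales.retailer_fk, sales.product_fk,
           SUM(sales.amount) AS partial_amount
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
    GROUP BY sales.retailer_fk, sales.product_fk
) s
JOIN retailer ON s.retailer_fk = retailer.id
JOIN product ON s.product_fk = product.id
WHERE product.brand = 'ACME'
GROUP BY product.name, retailer.name;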
How Dynamic Query Optimizer Works
Other relevant optimization techniques for LDW and Big Data

Automatic data movement
 Creation of temp tables in one of the systems to enable complete delegation (sketched below)
 Only considered as an option if the target source has the "data movement" option enabled
 Use of native bulk load APIs for better performance

Execution Alternatives
 If a view exists in more than one system, Denodo can decide at execution time which one to use
 The goal is to maximize query delegation depending on the other tables involved in the query

48
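A minimal sketch of what on-the-fly data movement amounts to conceptually, assuming a small, filtered product dimension is shipped into the EDW so the whole join can be delegated there (generic SQL with illustrative names, not Denodo's actual syntax):

-- 1. The virtual layer reads the small, filtered dimension from its source...
SELECT product.id, product.name FROM mdm.product WHERE product.brand = 'ACME';
-- ...and bulk-loads the result into a temporary table edw.tmp_product in the EDW

-- 2. The whole join and aggregation can now be delegated to the EDW as a single query
SELECT p.name, r.name, SUM(s.amount)
FROM edw.sales s
JOIN edw.tmp_product p ON s.product_fk = p.id
JOIN edw.retailer r ON s.retailer_fk = r.id
GROUP BY p.name, r.name;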
How Dynamic Query Optimizer Works
Other relevant optimization techniques for LDW and Big Data

Optimizations for Virtual Partitioning
Eliminates unnecessary queries and processing based on a pre-execution analysis of the
views and the queries
 Pruning of unnecessary JOIN branches
 Relevant for vertical partitioning and "fat" semantic models when queries do not
need attributes from all the tables
 Pruning of unnecessary UNION branches
 Enables detection of unnecessary UNION branches in horizontal partitioning scenarios
 Push down of JOIN under UNION views
 Enables the delegation of JOINs with dimensions
 Automatic data movement for partition scenarios
 Enables the delegation of JOINs with dimensions

49
Caching

50
Caching
Real time vs. caching

Sometimes, real time access & federation are not a good fit:
 Sources are slow (e.g. text files, cloud apps like Salesforce.com)
 A lot of data processing is needed (e.g. complex combinations, transformations,
matching, cleansing, etc.)
 Limited access, or the impact on the sources has to be mitigated

For these scenarios, Denodo can replicate just the relevant data in
the cache

51
Caching
Overview

Denodo’s cache system is based on an external relational database
 Traditional (Oracle, SQL Server, DB2, MySQL, etc.)
 MPP (Teradata, Netezza, Vertica, Redshift, etc.)
 In-memory storage (Oracle TimesTen, SAP HANA)

Works at view level
 Allows hybrid access (real-time / cached) of an execution tree

Cache Control (population / maintenance)
 Manually - user initiated at any time
 Time based - using the TTL or the Denodo Scheduler
 Event based - e.g. using JMS messages triggered in the DB

52
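Conceptually, caching a view amounts to materializing the relevant rows in the external cache database and answering later queries from that copy until it is refreshed. A minimal generic-SQL sketch of what a cache load effectively does (illustrative names, not Denodo's actual cache syntax):

-- Population: persist the result of a slow federated view in the cache database
CREATE TABLE cache_db.customer_360 AS
SELECT * FROM customer_360_view;   -- view combining text files, SaaS sources, etc.

-- Subsequent queries against the view are served from cache_db.customer_360
-- until the TTL expires or the Denodo Scheduler / a JMS event triggers a refresh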
References

53
Further Reading

Data Virtualization Blog (http://www.datavirtualizationblog.com)

Check the following articles written by our CTO Alberto Pan on our blog:
• Myths in data virtualization performance
• Performance of Data Virtualization in Logical Data Warehouse scenarios
• Physical vs Logical Data Warehouse: the numbers

• Cost Based Optimization in Data Virtualization

Denodo Cookbook
• Data Warehouse Offloading

54
Success Stories
Customer Case Studies
Autodesk Overview

• Founded 1982 (NASDAQ: ADSK)
• Annual revenues (FY 2015) $2.5B
 Over 8,800 employees
• 3D modeling and animation software
 Flagship product is AutoCAD
• Market sectors:
 Architecture, Engineering, and Construction
 Manufacturing
 Media and Entertainment
 Recently started 3D Printing offerings

56
Business Drivers for Change

• Software consumption model is changing
 Perpetual licenses to subscriptions
 Users want more flexibility in how they use software
• Autodesk needed to transition to subscription pricing
 2016: some products will be subscription only
• Lifetime revenue higher with subscriptions
 Over 3-5 years, subscriptions = more revenues
• Changing a licensing model is disruptive

57
Technology Challenges

• Current 'traditional' BI/EDW architecture not
designed for data streams from online apps
 Weblogs, clickstreams, cloud/desktop apps, etc.
• Existing infrastructure can't simply 'go away'
 Regulatory reporting (e.g. SEC)
 Existing 'perpetual' customers
• 'Subscription' infrastructure must work in parallel
 Extend and enhance existing systems
 With a single access point to all data
• Solution: 'Logical Data Warehouse'

58
Logical Data Warehouse at Autodesk

59
Logical Data Warehouse at Autodesk
Traditional BI/Reporting

60
Logical Data Warehouse at Autodesk
‘New Data’ Ingestion

61
Logical Data Warehouse Example
Reporting on Combined Data

62
Case Study: Autodesk Successfully Changes Their Revenue Model and Transforms Business

Autodesk, Inc. is an American multinational software corporation that makes software for the
architecture, engineering, construction, manufacturing, media, and entertainment industries.

Problem
 Autodesk was changing their business revenue model from a conventional perpetual license
model to a subscription-based license model.
 Inability to deliver high quality data in a timely manner to business stakeholders.

Solution
 General purpose platform to deliver data through the logical data warehouse.
 Denodo Abstraction Layer helps live invoicing with SAP.

Results
 Successfully transitioned to subscription-based licensing.
 For the first time, Autodesk can do single point security enforcement and have a uniform
data environment for access.
 Data virtualization enabled a culture of "see before you build".
 Evolution from traditional operational data warehouse to contemporary logical data
warehouse deemed necessary for faster speed.

63
Q&A

Thanks!

www.denodo.com [email protected]
© Copyright Denodo Technologies. All rights reserved.
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
