
DT-EDU-DENEDU2001 Introduction To Logical Data Management For Data Architects

Logical Data Management for Data Architects

Introduction to Logical Data Management for Data Architects

DENEDU2001
FOR TRAINING PURPOSES ONLY
Licensed to OnDemand Training Courses - 2024
AGENDA
1. Logical Data Architectures
2. What is Data Virtualization?
3. Logical Data Management Reference Architecture
4. Denodo Logical Data Management Platform: Data Delivery Modes

Logical Data Architectures

Move All Your Data into a Single…

Data Warehouse / Data Lake / Cloud

Centralized Architectures

▪ Centralized: All or most data in a single system
  ▪ All data needs to be copied to the target system
  ▪ Data needs to be replicated to adapt it for each new use case (data marts)
  ▪ Managed by a central IT data team
▪ Physical: Consumers need to know:
  ▪ Data location
  ▪ How data is represented in that system
  ▪ What access methods are supported in that system
▪ Examples: Data Warehouse, Data Lake, Data Lakehouse

Centralized Architectures: One Size Never Fits All

“Inherent in the LDW architecture is the recognition that a single data persistence
tier and type of processing is inadequate to meet the full scope of modern data
and analytics demands”

The Practical Logical Data Warehouse (Dec 2020) by Henry Cook, Rick Greenwald and Adam Ronthal

“Most large, established companies maintain multiple data warehouses and data lakes – some on-prem, some in the cloud; some modern, and some legacy. For these companies, it would be an expensive 5-10 year endeavor to centralize their data in one place, and that assumes they don’t start new projects with different data stores or acquire companies. Unsurprisingly, we are yet to see a large company with a single data warehouse or lake, and instead, companies are looking to query data, wherever it is, without the need for an architectural lift and shift to a single point of storage”

Andreessen Horowitz blog post, January 6th 2021

The Evolution of Data Architectures
This is a Second Major Cycle of Analytical Consolidation

1980s: Pre EDW (fragmented/nonexistent analysis)
› Multiple sources
› Multiple structured sources

1990s: EDW (unified analysis)
› Consolidated data
› "Collect the data"
› Single server, multiple nodes
› More analysis than any one server can provide

2000s: Post EDW (fragmented analysis)
› "Collect the data" (into different repositories)
› New data types, processing, requirements
› Uncoordinated views

2010s: Logical Data Architectures (unified analysis)
› Logically consolidated view of all data
› "Connect and collect"
› Multiple servers, of multiple nodes
› More analysis than any one system can provide

[Diagram components: Operational Applications, Cube, Staging/Ingest, ODS, Data Warehouse, Data Marts, IoT Data, Other New Data, Data Lake, Data Fabric/Data Mesh]

What is a Logical Data Architecture?

• Separates business view of data (logical data model) from physical data storage (physical data model)
• Decouples users from complexity and changes in underlying data infrastructure
• The ‘logical data layer’ ingests metadata from physical data stores
  • Used to define reusable ‘business’ data entities
• Logical data layer is searchable and queryable
  • Logical layer takes care of getting data from physical data stores
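
The bullets above can be illustrated with a small sketch. It is not Denodo code: `LogicalDataLayer`, `LogicalView` and the sample stores are hypothetical names invented for this example, and the "physical stores" are plain in-memory lists. It only shows the idea that consumers search and query a business entity while the layer decides where the data comes from.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class LogicalView:
    """A reusable business entity: a name, a description and a fetch function."""
    name: str
    description: str
    fetch: Callable[[], List[Row]]  # hides where and how the data is stored


class LogicalDataLayer:
    """Hypothetical in-memory stand-in for a logical data layer."""

    def __init__(self) -> None:
        self._views: Dict[str, LogicalView] = {}

    def register(self, view: LogicalView) -> None:
        # Registering under an existing name swaps the physical source.
        self._views[view.name] = view

    def search(self, keyword: str) -> List[str]:
        """Searchable: find business entities by keyword."""
        kw = keyword.lower()
        return sorted(
            v.name for v in self._views.values()
            if kw in v.name.lower() or kw in v.description.lower()
        )

    def query(self, name: str, **filters: object) -> List[Row]:
        """Queryable: the layer fetches from the physical store for the consumer."""
        rows = self._views[name].fetch()
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


# Two made-up physical stores with different column naming.
crm_rows = [{"cust_id": 1, "cust_nm": "Acme"}, {"cust_id": 2, "cust_nm": "Globex"}]
warehouse_rows = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]


def customers_from_crm() -> List[Row]:
    # Map physical column names to the business model.
    return [{"id": r["cust_id"], "name": r["cust_nm"]} for r in crm_rows]


def customers_from_warehouse() -> List[Row]:
    return [dict(r) for r in warehouse_rows]


layer = LogicalDataLayer()
layer.register(LogicalView("customer", "Business view of customers", customers_from_crm))
before = layer.query("customer", id=2)

# The physical source changes; the consumer's query does not.
layer.register(LogicalView("customer", "Business view of customers", customers_from_warehouse))
after = layer.query("customer", id=2)
```

The consumer issues the same `query("customer", id=2)` before and after the source is swapped, which is the decoupling the slide describes.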

A Logical Data Architecture hides complexity

• Sits atop the “physical” data sources
• Connects with the data sources
• Transforms and combines data for processing
• Exposes ‘business’ views of data

Logical Data Warehouse Architecture

DATA VIRTUALIZATION

Logical Data Management Complements Your Data Architecture
A modern data architecture includes a semantic layer to provide unified access to secure and governed data

You Must Balance “Collect” with “Connect”

Logical Data Architecture: Connects to data through semantic models, decoupled from data location and physical schemas.

• Ease of use: Consumers have a single (logical) location to access any data.
• Agile data integration options: One platform to support the full range of data integration options, from full replication and transformations to caching and real-time federation.
• Centralized security and governance: Access control and policy implementation are done consistently in a single location.
• Futureproof: Decoupling from data location and schemas allows for technology evolution and infrastructure changes.

83% reduction in time-to-revenue
65% decrease in delivery times over ETL

Source: Forrester Total Economic Impact™ of Data Virtualization, 2021

What is Data Virtualization?

What is Data Virtualization?
Data Virtualization is the foundation of a Logical Data Management architecture

Data Abstraction Layer: Abstracts access to disparate data sources

Acts as a single (virtual) repository

Makes data available in real-time to consumers

Denodo Architecture

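[Diagram, top to bottom]
Consumer access: SQL, REST, SOAP, OData, GraphQL
Virtual schema: views combined with join (⨝) and union (∪); Metadata; Cache
ETL (shown between the virtual schema and the data sources)
Data sources: Operational RDBMS, EDW, SaaS apps, Hadoop & NoSQL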

How Does It Work?

[Diagram: layered views, top to bottom]
Consumer access: JDBC/ODBC/ADO.Net, SOAP / REST WS
Application Layer: Customer 360 View, Virtual Data Mart View
Business Layer: Unified Views
Transformation & Cleansing: Derived Views
Data Source Layer (Abstraction): Base Views
Views are combined with operators labeled U, J, S and A in the diagram.
Platform services: Development Tools and SDK, Scheduled Tasks, Data Caching, Query Optimizer, Development Lifecycle Mgmt, Monitoring & Audit, Governance, Security
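
The layering in this slide (base views over sources, derived views that transform them, unified views that combine them, and application views on top) can be sketched in a few lines. This is an illustrative analogy only, not how the Denodo engine is implemented: the view names, the sample rows and the `union`/`join` helpers are all assumptions made up for the example, and it assumes the U and J labels in the diagram stand for union and join.

```python
from typing import Dict, Hashable, List

Row = Dict[str, object]


# Base views: one per source object (all data here is made up).
def bv_crm_customers() -> List[Row]:
    return [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": "Globex"}]


def bv_orders_eu() -> List[Row]:
    return [{"customer_id": 1, "amount": 100.0}]


def bv_orders_us() -> List[Row]:
    return [{"customer_id": 1, "amount": 50.0}, {"customer_id": 2, "amount": 75.0}]


def union(*views: List[Row]) -> List[Row]:
    """Concatenate rows from views that share the same columns."""
    return [row for view in views for row in view]


def join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join on a single key column."""
    index: Dict[Hashable, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Derived view: union of the regional order sources.
def dv_all_orders() -> List[Row]:
    return union(bv_orders_eu(), bv_orders_us())


# Unified view: customers joined with all their orders.
def uv_customer_orders() -> List[Row]:
    return join(bv_crm_customers(), dv_all_orders(), "customer_id")


# Application-layer view: one aggregated row per customer.
def customer_360() -> List[Row]:
    totals: Dict[str, float] = {}
    for row in uv_customer_orders():
        totals[row["name"]] = totals.get(row["name"], 0.0) + row["amount"]
    return [{"name": n, "total_amount": t} for n, t in sorted(totals.items())]


result = customer_360()
```

Each layer only refers to the layer beneath it, so a source can be replaced by changing one base view without touching the views consumers use.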

Unified Data Integration and Delivery for the Business

1. Single access point to data – Location Independence
2. Data exposed in business-friendly form adapted for each consumer / LOB – Semantic Layer
3. Up to 80% reductions in data integration costs and time-to-market
4. Trusted Data: enforce consistent semantics, governance and security
5. Active Data Catalog builds a data marketplace for the business
6. ML and Automation to accelerate all steps of data management

Denodo Platform Architecture

Logical Data Management Reference Architecture

Logical Data Management Reference Architecture
Enterprise Data Fabric Architecture

[Diagram: five stages, left to right]
Data Sources: Cloud / On Prem data sources; Databases; IoT / Streaming Data; Cloud Apps / APIs; Operational Business Applications
Data Ingestion: iPaaS; ETL / ELT; CDC; Kafka / Flume; Azure Data Factory, AWS Glue, etc.
Data Storage: EDW (Staging Area, Core DWH Model); Data Lake (Raw Data Zone, Curated Data Zone)
Logical Data Fabric / Data Delivery: Unified Semantic Layer; Logical Models; Canonical Entities; Remote tables; Virtual marts; Data Products (Marketing Domain, Finance Domain, Sales Domain, Logistic Domain); Semantic Policies; RBAC / ABAC; Data Catalog / Data Marketplace
Data Consumption: Analytical Reporting; Business Users; Data Science; Data Scientists; Citizen Analysts; Customer Service Desktop; Customer Service Rep; Customer Service (Read/write)

Reference Architecture from Industry Analysts

Architecture and Components for Data and Analytics (2022)

Evolution of logical architectures

Chronological evolution of logical architectures

Data Fabric: supported by major Analysts

Source: Forrester Enterprise Data Fabric Wave, June 2020
Source: Demystifying the Data Fabric, Gartner, September 2020

Data Mesh – Federated Governance

Data Mesh

Data Mesh – Decentralized Data Engineering and Governance

1. Domains create virtual models in separate schemas. Execution servers can be scaled independently.
2. Domains can choose and autonomously evolve their data sources.
3. Domains can share standardized definitions. Products can be used to define other products.
4. A central team can set guidelines and enforce global security, quality and governance policies in the virtual models.

[Diagram, top to bottom]
Consumer access: SQL, REST, GraphQL, OData
Domains: Customer Management, Event Management, Human Resources
Data products: Customer, Product, Event, Location, Employee
Data sources: Operational, SaaS APIs, Data Lakes, Files, EDW

Data Mesh Implementation Best Practices

Domain-oriented decentralized data ownership and architecture

[Diagram, top to bottom]
Consumer access: JDBC, ODBC, REST, GraphQL, OData
Domains: Marketing Domain, Customer Service Domain, Sales Domain
Data Products layer:
  Consumer-Aligned Domain Data Products
  Aggregate-Domain Data Products
  Source-Domain Data Products: Campaigns, Customer, Sales
Integration layer
Connection layer

Data Fabric: foundation of Data Mesh

Denodo Logical Data Fabric: Multi-location Architecture

[Diagram: locations shown]
Asia Region
US East Availability Zone
EMEA Availability Zone
US West Data Center
On-prem data center
Amazon RDS, Aurora

Denodo Logical Data Management Platform: Data Delivery Modes

Different Data Delivery Modes
Multiple data delivery modes

▪ A modern Logical Data Management platform must support multiple data delivery modes
  ▪ Real-time, on-demand (with synchronous request-response style query)
  ▪ Reactive (event-based) using asynchronous messaging
    ▪ e.g. Kafka, JMS, MQSeries, etc.
  ▪ Scheduled (batch) delivery using files or target database
    ▪ Use Denodo Scheduler to query data and write to file or target database
    ▪ Even deliver the data file to the user via email
▪ Data can be materialized in an external store at any time
  ▪ Using Remote Tables, the user can materialize any virtual view in an external store

Reactive/Event-Driven Mode
Message Enrichment or Message Sink

1. Rather than synchronous request-response, execution is triggered asynchronously by an incoming message. Denodo acts as a listener to a queue.
2. A query is executed using the message data as input values (e.g. filter by device ID).
3. Results can be placed back in a queue.
4. Or persisted in any target system.
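
The four steps above can be sketched with Python's standard `queue` module standing in for a message broker. This is a simplified illustration under stated assumptions, not Denodo's listener: `run_listener`, `query_by_device` and the `READINGS` data are hypothetical, and the loop drains the queue once instead of blocking for new messages as a real listener would.

```python
import queue
from typing import Callable, Dict, List, Optional

Message = Dict[str, object]
Result = List[Dict[str, object]]

# Made-up data that the triggered query runs against.
READINGS = [
    {"device_id": "123", "temperature": 65},
    {"device_id": "456", "temperature": 70},
]


def query_by_device(message: Message) -> Result:
    """Step 2: use the message data as input values (filter by device ID)."""
    return [r for r in READINGS if r["device_id"] == message["device_id"]]


def run_listener(
    inbox: queue.Queue,
    handler: Callable[[Message], Result],
    outbox: Optional[queue.Queue] = None,
    sink: Optional[List[Result]] = None,
) -> int:
    """Process every message currently waiting in `inbox`.

    Step 1: each incoming message triggers one execution of `handler`.
    Step 3: results are placed on `outbox` when one is given.
    Step 4: results are appended to `sink` (a stand-in for a target system).
    """
    processed = 0
    while True:
        try:
            message = inbox.get_nowait()
        except queue.Empty:
            break  # a real listener would keep waiting instead of stopping
        result = handler(message)
        if outbox is not None:
            outbox.put(result)
        if sink is not None:
            sink.append(result)
        processed += 1
    return processed


inbox: queue.Queue = queue.Queue()
outbox: queue.Queue = queue.Queue()
sink: List[Result] = []

inbox.put({"device_id": "123"})
count = run_listener(inbox, query_by_device, outbox=outbox, sink=sink)
```

Passing only `outbox` corresponds to message enrichment, and passing only `sink` corresponds to using the platform as a message sink.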

Native Kafka Consumer/Producer
Message Enrichment or Message Sink

Incoming Kafka message (subscriber): 123,65
  Device Id: 123
  Temperature: 65 F

Query executed: SELECT * FROM devices WHERE id = 123
Data sources: Incident Management System, CRM, Web Service

Outgoing Kafka message (producer):
{
  "deviceId": "123",
  "customerName": "Pablo",
  "contract": "Premium",
  "email": "[email protected]",
  "Temperature": "65"
}
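
A minimal sketch of the enrichment step in this example, using an in-memory SQLite table in place of the virtual `devices` view. The table layout and the `enrich` function are assumptions made for illustration; the email field is left out because its value is not recoverable from the slide, and no Kafka client is involved.

```python
import json
import sqlite3


def build_devices_db() -> sqlite3.Connection:
    """Create a stand-in for the `devices` view with one made-up row."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(
        "CREATE TABLE devices (id INTEGER PRIMARY KEY, customerName TEXT, contract TEXT)"
    )
    conn.execute("INSERT INTO devices VALUES (123, 'Pablo', 'Premium')")
    return conn


def enrich(raw_message: str, conn: sqlite3.Connection) -> str:
    """Turn a 'deviceId,temperature' message into an enriched JSON message."""
    device_id, temperature = raw_message.split(",")
    # Parameterized form of: SELECT * FROM devices WHERE id = 123
    row = conn.execute(
        "SELECT * FROM devices WHERE id = ?", (int(device_id),)
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown device {device_id}")
    enriched = {
        "deviceId": device_id,
        "customerName": row["customerName"],
        "contract": row["contract"],
        "Temperature": temperature,
    }
    return json.dumps(enriched)


conn = build_devices_db()
outgoing = enrich("123,65", conn)
```

The query uses a bound parameter rather than string concatenation, so values taken from incoming messages cannot alter the SQL statement.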

Scheduled/Batch Mode
Denodo Task Scheduler

1. At the configured date/time, the Scheduler executes a query against a view in the Denodo Platform.
2. The results of the query are returned to the Scheduler.
3. The result handler in the Scheduler can write the results to a target data repository in a number of different formats (data exports): Database, CSV, XML, TDE, etc.
4. Optionally, an email notification can be sent to inform the recipient that the data is ready for them.
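
The scheduled flow above can be sketched as a job function with a pluggable result handler. This is an analogy rather than the Denodo Scheduler API: `run_job`, `csv_result_handler` and the `v_sales` view are hypothetical, SQLite stands in for the platform, and the job is called directly instead of being triggered at a configured time.

```python
import csv
import io
import sqlite3
from typing import Callable, List, Optional, Sequence

ResultHandler = Callable[[Sequence[str], Sequence[Sequence[object]]], None]


def csv_result_handler(target: io.StringIO) -> ResultHandler:
    """Build a handler that exports query results as CSV into `target`."""
    def handle(columns: Sequence[str], rows: Sequence[Sequence[object]]) -> None:
        writer = csv.writer(target, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
    return handle


def run_job(
    conn: sqlite3.Connection,
    query: str,
    result_handler: ResultHandler,
    notify: Optional[Callable[[str], None]] = None,
) -> int:
    cursor = conn.execute(query)                    # 1. execute the query against a view
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()                        # 2. results come back to the job
    result_handler(columns, rows)                   # 3. handler writes them to the target
    if notify is not None:
        notify(f"{len(rows)} rows are ready")       # 4. optional notification
    return len(rows)


# Made-up data and view for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EMEA", 10), ("APAC", 20)])
conn.execute("CREATE VIEW v_sales AS SELECT region, amount FROM sales")

export = io.StringIO()
notifications: List[str] = []
exported = run_job(
    conn,
    "SELECT region, amount FROM v_sales ORDER BY region",
    csv_result_handler(export),
    notify=notifications.append,
)
```

Keeping the handler separate from the job means another export format can be supported by writing a new handler, without changing how the query is run.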

Data Materialization with Remote Tables
Denodo Remote Tables

[Diagram: virtual tables, combined with join and union, are materialized as “Remote Tables”]

Remote Tables can be created:
a) Through the graphical UI, on demand
b) With the CREATE_REMOTE_TABLE command from the Shell
c) With CREATE_REMOTE_TABLE via the Scheduler

Remote Tables make use of:
- Parquet files that are automatically generated and uploaded to HDFS / S3 (Spark, Presto, Impala, Hive)
- Bulk data load APIs for databases and data warehouses

Data lineage is registered:
- Denodo stores the query used to generate the dataset, so it is a fully governed process
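
The idea of materializing a view while recording the query that produced it can be sketched with SQLite. This does not use or reproduce the syntax of Denodo's CREATE_REMOTE_TABLE: the `materialize` function, the `lineage` table and all table and view names are assumptions made for illustration.

```python
import sqlite3


def materialize(conn: sqlite3.Connection, table_name: str, query: str) -> int:
    """Persist the result of `query` as `table_name` and record its lineage."""
    if not table_name.isidentifier():
        # Table names cannot be bound as parameters, so validate before formatting.
        raise ValueError(f"invalid table name: {table_name!r}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lineage (table_name TEXT PRIMARY KEY, source_query TEXT)"
    )
    conn.execute(f"DROP TABLE IF EXISTS {table_name}")
    conn.execute(f"CREATE TABLE {table_name} AS {query}")
    # Store the generating query so the dataset's origin can be traced later.
    conn.execute("INSERT OR REPLACE INTO lineage VALUES (?, ?)", (table_name, query))
    return conn.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]


# Made-up sources and a view that unions them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_eu (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE orders_us (customer_id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders_eu VALUES (1, 100.0)")
conn.executemany("INSERT INTO orders_us VALUES (?, ?)", [(1, 50.0), (2, 75.0)])
conn.execute(
    "CREATE VIEW v_all_orders AS "
    "SELECT * FROM orders_eu UNION ALL SELECT * FROM orders_us"
)

row_count = materialize(conn, "rt_all_orders", "SELECT * FROM v_all_orders")
recorded = conn.execute(
    "SELECT source_query FROM lineage WHERE table_name = ?", ("rt_all_orders",)
).fetchone()[0]
```

Calling `materialize` again with the same table name refreshes the stored copy and overwrites its lineage entry.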

Summary
Data Virtualization provides data abstraction and is the foundation for a Logical Data Management architecture

▪ Data Virtualization is the foundation for a Logical Data Management architecture
  ▪ A Logical Data Management architecture provides an abstraction layer that decouples the consumers from the physical nature of the data stores
▪ A Logical Data layer enables a future-proof data architecture
  ▪ One that can change and adapt to new requirements, new data sources and formats without unduly impacting the consumers
▪ A modern Logical Data Management platform must support multiple data delivery modes (or styles)
  ▪ Real-time / on-demand (request-response), reactive (asynchronous event-based), scheduled (batch), data materialization

Thanks!

www.denodo.com [email protected]
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
