Denodo8 - Metadata Management Overview
INTRODUCTION
Denodo is not an 'inventory-style' Data Catalog solution like others on the market, but a modern data delivery platform that includes a 'data marketplace-style' Data Catalog to help business users, data scientists, and others discover the data assets that are available through the Denodo Platform itself.
The Denodo Platform aims to be the single point of access to 'certified data' for business users and applications across the enterprise. Denodo provides real-time integration of heterogeneous, distributed, structured and semi-structured data sources. In essence, Denodo integrates and manages the data that is relevant to the enterprise, regardless of its origin and format, and makes it available in real time for business decision-making.
In Denodo you create models (metadata only: Denodo does not store data, but accesses the sources in real time / right time) that expose certified data (i.e. data that IT has certified, secured and governed) to the business. Denodo shares extensive metadata management capabilities with traditional Data Catalog solutions, including discovery, search, and lineage. The key difference is that Denodo manages the metadata of the data assets that are actually delivered to the business. As a result, the Denodo Data Catalog does not contain information about data assets that hold no curated data meaningful for decision-making or analytics (e.g. certain tables in transactional systems, ETL workflow transformation metadata, etc.).
A distinctive feature of the Denodo Data Catalog is that there is no mismatch between the metadata that business users, data scientists, data stewards or LOB developers use to discover useful information and the actual data delivered to them through the data delivery layer, the Denodo Platform. The Denodo Data Catalog therefore works in tandem with the Denodo Platform, and the two should be used together.
With that said, the Denodo Platform provides extensive metadata management capabilities such as:
● discovery and import of metadata from data sources and/or data models to define data entities within the
Denodo Platform
● graphical tools to manage the data entity definitions
● metadata catalog of the available data entities and their definitions
● data lineage tracing to understand how the data is altered as it flows through the Denodo Platform
● impact analysis tools to understand the effect of changes in underlying data source schemas
● discovery and definition of relationships between data entities
● user defined metadata attributes and relationships
● sharing of metadata with external third-party tools, such as business glossaries, metadata management tools,
data quality tools, etc.
The Denodo Platform supports the discovery and capture of metadata from database management systems and the
synchronization of data models with data modeling tools.
For the synchronization of data models created in Denodo with the data sources’ actual schemas, the Denodo Platform
provides source refresh and change impact analysis capabilities. The Denodo Platform identifies any change in the
underlying data source schema, allowing the user to decide whether to propagate the change to the Denodo model or
not.
For the capture and synchronization of the Denodo logical model with data models defined in external third-party data modeling tools, the Denodo Model Bridge allows the user to import data models from tools such as ERwin, Embarcadero ER/Studio, InfoSphere Data Architect or SAP PowerDesigner to create the corresponding data entities within the Denodo Platform. These logical data entities (called 'Interface Views') can then be implemented by connecting them to the corresponding physical data model entities (which represent the underlying data sources). The Denodo Model Bridge also lets users view the differences between the data entities within the Denodo Platform and those in the modeling tool: for example, after the conceptual model has been changed in the modeling tool, users can see how it now differs from the data entities implemented in the Denodo Platform.
These activities can be performed through GUI tools. The same functionality is also offered through an API, so it can be integrated seamlessly with third-party metadata tools or with your corporate processes. Denodo provides a metadata API: a set of built-in stored procedures for metadata management. These stored procedures can be invoked from SQL (JDBC/ODBC/ADO.NET) clients or published as SOAP or RESTful web services for lightweight clients.
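Because these procedures are invoked with ordinary SQL statements, a client only has to compose the statement text and send it through its driver. The sketch below is a hypothetical Python helper (`build_call` and `quote_literal` are illustrative names, not part of any Denodo API) that renders a `CALL` statement in the same form as the `CATALOG_VDP_METADATA_VIEWS` example given later in this document. It assumes that VQL string literals follow standard SQL quoting (single quotes, embedded quotes doubled) and that `NULL` is accepted for optional parameters; where your driver supports parameter binding, prefer that over building strings.

```python
from typing import Optional, Sequence

def quote_literal(value: Optional[object]) -> str:
    """Render a Python value as a SQL-style literal (assumed VQL quoting)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"

def build_call(procedure: str, args: Sequence[Optional[object]]) -> str:
    """Compose a CALL statement (hypothetical helper, not a Denodo API)."""
    # Only allow plain identifiers such as VIEW_DEPENDENCIES as procedure names.
    if not procedure.replace("_", "").isalnum():
        raise ValueError(f"unexpected procedure name: {procedure!r}")
    return f"CALL {procedure}({', '.join(quote_literal(a) for a in args)})"

print(build_call("CATALOG_VDP_METADATA_VIEWS", ["customer360", "customer"]))
# CALL CATALOG_VDP_METADATA_VIEWS('customer360', 'customer')
```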
METADATA DISCOVERY
The Denodo Platform enables automatic discovery of metadata across a broad range of sources, which helps identify relationships between data. It supports introspection of RDBMSs (through JDBC/ODBC), including primary keys and indexes; SOAP web services (through WSDL); XML repositories (through DTDs, XML Schemas, or automatic schema generation for schema-less XML files); LDAP; CSV and other delimited files; JSON documents; and salesforce.com data, among others, as well as schemas from packaged applications such as SAP R/3, ECC, etc.
Denodo provides metadata introspection from the data sources, including data types, primary and foreign keys, NOT
NULL restrictions, indexes and field statistics (such as the number of distinct values, max and min value, etc.).
In addition, Denodo supports modeling relationships (called 'associations') between data entities. On the one hand, Denodo can automatically infer the associations between base views of the same data source when a PK-FK relationship is defined at the source level. On the other hand, associations between elements of different data sources (where no PK-FK relationship is defined at the source level) can be inferred programmatically using ad-hoc heuristics (e.g. matching fields with the same name and data type).
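The name-and-type heuristic mentioned above can be sketched in a few lines. This is a hypothetical illustration in Python, not Denodo's implementation: each view is represented as a dictionary of column names to data types, and every pair of views that shares a column with the same name and the same type is proposed as an association candidate for a designer to review.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Schema = Dict[str, str]  # column name -> data type

def infer_associations(views: Dict[str, Schema]) -> List[Tuple[str, str, str]]:
    """Propose (view_a, view_b, column) candidates: same column name and same type."""
    candidates: List[Tuple[str, str, str]] = []
    for (name_a, cols_a), (name_b, cols_b) in combinations(sorted(views.items()), 2):
        for column in sorted(cols_a.keys() & cols_b.keys()):
            if cols_a[column] == cols_b[column]:
                candidates.append((name_a, name_b, column))
    return candidates

views = {
    "crm_customer": {"customer_id": "int", "name": "text"},
    "billing_invoice": {"invoice_id": "int", "customer_id": "int", "amount": "decimal"},
    "web_session": {"session_id": "text", "customer_id": "text"},  # same name, different type
}
print(infer_associations(views))  # [('billing_invoice', 'crm_customer', 'customer_id')]
```

Name matching alone produces false positives (for example, generic columns such as `id`), which is why the output is treated as a list of candidates rather than final associations.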
The Denodo Platform provides a graphical web-based tool that lets the user define the access, combination, and publishing of data from heterogeneous data sources as data services. This is available for all types of data sources. With this tool, users can introspect the metamodels of the sources, import the required metadata, establish relationships between source metadata, and combine and transform source data in complex ways before publishing the resulting data service. The tool also supports refreshing the metadata from the sources to detect changes and to examine the differences between the previous and the current metadata of each source.
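Conceptually, refreshing source metadata means comparing two snapshots of a table's schema. The following minimal Python sketch assumes each snapshot is a simple `{column: type}` dictionary (a simplification; Denodo compares richer metadata) and reports added, removed and type-changed columns, which is the kind of information a user needs before deciding whether to propagate a change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SchemaDiff:
    added: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    type_changed: List[Tuple[str, str, str]] = field(default_factory=list)  # (column, old, new)

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.type_changed)

def diff_schema(previous: Dict[str, str], current: Dict[str, str]) -> SchemaDiff:
    """Compare two {column: type} snapshots of the same source table."""
    diff = SchemaDiff()
    diff.added = sorted(current.keys() - previous.keys())
    diff.removed = sorted(previous.keys() - current.keys())
    for column in sorted(previous.keys() & current.keys()):
        if previous[column] != current[column]:
            diff.type_changed.append((column, previous[column], current[column]))
    return diff

before = {"id": "int", "name": "varchar(50)", "phone": "varchar(20)"}
after = {"id": "int", "name": "varchar(100)", "email": "varchar(80)"}
d = diff_schema(before, after)
print(d)
```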
DATA GOVERNANCE
The Denodo Platform leverages data virtualization to offer comprehensive data and metadata discovery and
management capabilities including data governance, data lineage, and change impact analysis. Data virtualization
enables organizations to create central data access, data governance, and security policies across heterogeneous
systems. Whether some of the data sources or consuming applications are spread across geographies or divided
between on-premises and cloud, the Denodo Platform seamlessly facilitates the central control of data governance and
security.
Data Lineage
View Lineage
When working with complex models, you need to know which elements are involved in the definition of a view, as well as which other elements depend on it. This information is crucial for tasks such as query performance optimization or impact analysis.
Denodo’s Design Studio allows you to see graphically the lineage of elements of the catalog. The mechanisms below can
be applied on ‘views’ or ‘interface views’.
Tree view
The Tree View displays the successive levels of views that have been composed to construct a given view. It also displays
the data sources where the data is coming from. To open this dialog, double-click on the view in the Elements Tree to
open the schema of the view and click Tree view, or alternatively right-click on the view in the Elements Tree and select
Tree view.
For example, the figure below shows the tree of the view customer_details.
Used by
The Tree view shows the elements that are involved in the definition of a view, but to have the whole picture it is
necessary to know also what other elements, if any, depend on it.
Denodo provides quick access to this functionality in the Design Studio. To open this dialog, double-click on the view in
the Elements Tree to open the schema of the view and click Used by, or alternatively right-click on the view in the
Elements Tree and select Used by.
For example, the figure below shows the dialog with the elements that depend on the view customer_details.
This dialog displays a diagram with all the views that depend on the current one. On the left side, there is a list of the
top-level views that use the current one in their definition. Top-level views are those that are not used to define any
other view. Click on one view to display how it is constructed and then, on the right side of the dialog, you can:
● Click on the nodes of the tree that represent combining operations (joins, unions, selections, etc.) to display
their main properties.
● Click on a data source to view its details.
● Click on a view or a base view to open the dialog showing its schema.
● Click on Download image to export this “tree” into an image file.
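The notion of top-level views used by this dialog can be expressed as a graph walk. In the hypothetical Python sketch below, `uses` maps each view to the views it is defined on; the function inverts that mapping, walks upwards from the selected view to collect every view that depends on it directly or indirectly, and keeps only those that no other view uses. The view names are made up for the example.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

def top_level_users(uses: Dict[str, List[str]], target: str) -> List[str]:
    """Return the top-level views that (directly or indirectly) use `target`."""
    # Invert "view -> its inputs" into "view -> views that use it".
    used_by: Dict[str, Set[str]] = defaultdict(set)
    for view, inputs in uses.items():
        for source_view in inputs:
            used_by[source_view].add(view)

    # Breadth-first walk upwards from the target through its dependents.
    dependents: Set[str] = set()
    queue = deque([target])
    while queue:
        for parent in used_by[queue.popleft()]:
            if parent not in dependents:
                dependents.add(parent)
                queue.append(parent)

    # Top-level views are dependents that no other view uses.
    return sorted(v for v in dependents if not used_by[v])

uses = {
    "customer_details": ["customer", "address"],
    "customer_360": ["customer_details", "orders"],
    "vip_customers": ["customer_details"],
    "customer_report": ["customer_360"],
}
print(top_level_users(uses, "customer_details"))  # ['customer_report', 'vip_customers']
```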
Column Lineage
The view lineage information explained above is complemented with column lineage information, that is, how the
columns are passed, and sometimes transformed, from the sources up to the given view.
The Design Studio allows you to see graphically the lineage of the different columns of a view. The mechanisms below
can be applied on ‘views’ or ‘interface views’.
Data Lineage
This dialog displays a diagram of the data sources and views used to build the current view. It also displays the source
where the information of each field comes from. To open this dialog, double-click on the view in the Elements Tree to
open the schema of the view and then, click Data lineage, or alternatively right-click on the view in the Elements Tree
and select Data lineage.
For example, the figure below shows the dialog with the lineage of the field full_name of the view customer.
On the left side of the dialog, there is a list of all the view's fields. By clicking on one of the fields, all the views and data
sources that participate in the creation of this field will be highlighted. For instance, if a field f is obtained by evaluating
an expression involving two fields f1 and f2 from different data sources, DataPort will highlight the data sources (and
their associated views) providing f1 and f2, and the view where the expression to obtain the value of f is defined. The
available options in this dialog are the following:
● Click on the tree nodes representing combination operations (joins, unions, selections, etc.) to display their main properties.
● View the details of the data sources by clicking on them (only if connected in administrator mode).
● Click on a view to open the dialog showing its schema.
● Click Download image to export the current Data Lineage Tree into an image file.
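The highlighting rule described above (field f computed from f1 and f2 of different data sources) amounts to a recursive walk over field-level dependencies. The following Python sketch models it under simplifying assumptions: `derived_from` maps a `(view, field)` pair to the pairs it is computed from, and `source_of` maps each base view to its data source. All view, field and data source names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str]  # (view, field)

def column_lineage(derived_from: Dict[Column, List[Column]],
                   source_of: Dict[str, str],
                   column: Column) -> Tuple[Set[str], Set[str]]:
    """Return the views and data sources that contribute to `column`."""
    views: Set[str] = set()
    sources: Set[str] = set()
    seen: Set[Column] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        view, _field = current
        views.add(view)
        inputs = derived_from.get(current, [])
        if not inputs and view in source_of:
            sources.add(source_of[view])  # reached a field of a base view
        stack.extend(inputs)
    return views, sources

derived_from = {
    ("customer_details", "full_name"): [("customer", "full_name")],
    # full_name = f(first_name, last_name), each from a different data source
    ("customer", "full_name"): [("crm_customer", "first_name"), ("erp_customer", "last_name")],
}
source_of = {"crm_customer": "ds_crm", "erp_customer": "ds_erp"}

lineage_views, lineage_sources = column_lineage(derived_from, source_of, ("customer_details", "full_name"))
print(sorted(lineage_views))    # ['crm_customer', 'customer', 'customer_details', 'erp_customer']
print(sorted(lineage_sources))  # ['ds_crm', 'ds_erp']
```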
For example, the lineage of the field full_name of the view customer_details will show the following:
Impact Analysis
The steps to assess change impact in Denodo are as follows:
● Denodo visualizes the changes graphically, and the user decides whether or not to propagate those changes through the data flow.
Change detection can be automated by using the Source Refresh API in combination with the Denodo Scheduler.
Note: The Denodo Scheduler provides ETL-lite capabilities to the Denodo Platform. It allows the scheduling and
execution of data extraction and integration jobs.
The Denodo Data Catalog tool promotes self-service and discovery capabilities for business users, enabling them to
explore both data and metadata in a single Web front-end tool (See Figure).
With this tool, end users can access a graphical representation of the business entities and associations, as well as the data lineage and tree view information. The tool includes some reporting capabilities and export options (CSV, Excel, Tableau, etc.).
The Denodo Data Catalog remains subject to the security restrictions that the virtualization layer defines in the data model, so administrators can determine exactly which data each user and role sees through the Data Catalog.
The Denodo Data Catalog allows administrators to configure and personalize several aspects of the tool, for example:
- Which elements from the catalog users are allowed to see (e.g. views, web services, both)
- Browse by Databases/Folders options
- Customize the tool to match your company's branding (e.g. logo, background image, etc.)
- Include an informative message that is displayed to all users when they access the Data Catalog
- Export options for results (e.g. export formats, a limit on the number of results)
- Show/hide the connection URIs to access the data from client applications
- etc.
In addition to the categories and tags, the Denodo Data Catalog also supports customizing the metadata stored for each
element of the catalog. For instance, with custom properties, you can specify:
Custom properties can also help to deal with Data Quality and Data Privacy regulations.
Denodo provides a set of built-in stored procedures for metadata management. These stored procedures can be invoked from SQL (JDBC/ODBC/ADO.NET) clients or published as SOAP or RESTful web services for lightweight clients.
Metadata
CATALOG_VDP_METADATA_VIEWS
Description
The stored procedure CATALOG_VDP_METADATA_VIEWS returns information about all the fields of all the views of a
Virtual DataPort database. This information includes the type of the field, precision in case of numbers, etc.
You can filter by view and/or database.
Syntax
CATALOG_VDP_METADATA_VIEWS (
input_database_name : text
, input_view_name : text
)
● input_database_name: name of the database.
● input_view_name: name of the view whose fields you want to obtain.
Example
CALL CATALOG_VDP_METADATA_VIEWS('customer360','customer')
Data Lineage
VIEW_DEPENDENCIES
Description
The stored procedure VIEW_DEPENDENCIES returns a list containing all the dependencies of a given view of a given
database. This allows you to programmatically obtain the same information that the Design Studio displays in the Tree
View dialog of a derived view.
This stored procedure works recursively, from the given view down to the data sources. It applies to base views, views
and interface views.
NOTE: The same element (view, data source, stored procedure) can appear several times in the Tree View. A single row per element would be enough to draw the Tree View, but several rows may be needed for the data lineage, so the procedure returns every occurrence.
Syntax
VIEW_DEPENDENCIES (
input_view_database_name : text
, input_view_name : text
)
● input_view_database_name: name of the database that contains the view whose dependencies you want to obtain.
● input_view_name: name of the view whose dependencies you want to obtain.
The procedure returns one row for each dependency of each view. A dependency can be:
● direct: the dependency is a direct child of the view.
● indirect: the dependency is not a direct child of the view.
If both parameters are null, the procedure returns the dependencies of all views in all databases.
If input_view_database_name is null and input_view_name is not, the procedure returns the dependencies of all views
having this name, in all databases.
If input_view_name is null and input_view_database_name is not, the procedure returns the dependencies of all views
in that database (note that if a view has a dependency with a view of another database, it is also shown).
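The filtering rules and the direct/indirect classification can be modeled in Python as follows. This is a hypothetical model for illustration, not the procedure's actual output schema: the dependency graph is a dictionary keyed by `(database, view)`, `None` plays the role of a null parameter, only views are modeled (data sources are omitted), and one row is emitted per occurrence in the tree, so a shared dependency can appear more than once, as the note above describes. Filtering by database alone still reports a dependency on a view of another database.

```python
from typing import Dict, Iterator, List, Optional, Tuple

ViewId = Tuple[str, str]              # (database, view)
Row = Tuple[str, str, str, str, str]  # view db, view, dependency db, dependency, kind

def view_dependencies(graph: Dict[ViewId, List[ViewId]],
                      database: Optional[str] = None,
                      view: Optional[str] = None) -> List[Row]:
    """Model of the VIEW_DEPENDENCIES filters: None means 'any'."""
    def walk(root: ViewId, node: ViewId, depth: int) -> Iterator[Row]:
        for child in graph.get(node, []):
            # Depth 1 is a direct child; anything deeper is indirect.
            yield (*root, *child, "direct" if depth == 1 else "indirect")
            yield from walk(root, child, depth + 1)

    rows: List[Row] = []
    for root in sorted(graph):
        db, name = root
        if (database is None or db == database) and (view is None or name == view):
            rows.extend(walk(root, root, 1))
    return rows

graph = {
    ("sales", "customer_details"): [("sales", "customer"), ("shared", "address")],
    ("sales", "customer"): [("sales", "bv_customer")],
    ("shared", "address"): [("shared", "bv_address")],
}
for row in view_dependencies(graph, "sales", "customer_details"):
    print(row)
```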
Privileges
The information returned by the procedure changes depending on the type of user that executes the procedure:
● If the user does not have metadata privilege over a view, the name of the view or the subtree is not shown in
the tree.
○ The stored procedure returns a special row to indicate that the dependency is on a view without privileges (the names of the view and its database are NOT returned).
● The data sources are only shown if the user is an administrator or if the user has created them.
○ Same behavior in the stored procedure: the dependencies with the data sources are only returned if the
user is an administrator or if the user has created them.
In the following example, the user (who is not an administrator) does not have metadata permission over address, and
has not created the data sources of the base views us_customer and customer_demographics.
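The rules above can be sketched with a small Python model of the example just described. Everything here is hypothetical: the `Node` tree, the grant set and the user names are illustrative; the placeholder label "No Privileges" is borrowed from the COLUMN_DEPENDENCIES description below and is only an assumption for VIEW_DEPENDENCIES; and the model assumes administrators see every view. A view without METADATA privilege produces a placeholder row and its whole subtree is hidden; a data source appears only for administrators or its creator.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Row = Tuple[Optional[str], Optional[str], str]

@dataclass
class Node:
    kind: str                 # "view" or "datasource"
    database: str
    name: str
    owner: str = ""           # creator of the element (used for data sources)
    children: List["Node"] = field(default_factory=list)

def visible_rows(node: Node, user: str, is_admin: bool,
                 metadata_grants: Set[Tuple[str, str]]) -> List[Row]:
    """Walk the dependency tree below `node`, applying the privilege rules."""
    rows: List[Row] = []
    for child in node.children:
        if child.kind == "datasource":
            # Data sources: visible only to administrators or to their creator.
            if is_admin or child.owner == user:
                rows.append((child.database, child.name, "datasource"))
            continue
        if not is_admin and (child.database, child.name) not in metadata_grants:
            # Placeholder row: no view or database name, and the subtree is hidden.
            rows.append((None, None, "No Privileges"))
            continue
        rows.append((child.database, child.name, "view"))
        rows.extend(visible_rows(child, user, is_admin, metadata_grants))
    return rows

tree = Node("view", "db", "customer_details", children=[
    Node("view", "db", "us_customer", children=[Node("datasource", "db", "ds_us", "dba")]),
    Node("view", "db", "customer_demographics",
         children=[Node("datasource", "db", "ds_demographics", "dba")]),
    Node("view", "db", "address", children=[Node("view", "db", "bv_address")]),
])
grants = {("db", "us_customer"), ("db", "customer_demographics"), ("db", "bv_address")}
for row in visible_rows(tree, "analyst", False, grants):
    print(row)
```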
COLUMN_DEPENDENCIES
Description
The stored procedure COLUMN_DEPENDENCIES returns a list containing all the dependencies of a given field of a
derived view. This allows you to programmatically return the same information that is displayed in the “Data Lineage”
dialog.
This stored procedure works recursively, from the given view and field down to the data sources.
Syntax
COLUMN_DEPENDENCIES (
input_view_database_name : text
, input_view_name : text
, input_column_name : text
)
Privileges work as explained for VIEW_DEPENDENCIES, but in this case column privileges must also be taken into account.
The procedure returns information about the dependencies of the fields of the views on which the user has the
METADATA privilege granted. If the user does not have METADATA privilege granted over the dependency, the value of
the field “dependency_type” will be “No Privileges”.
In the previous example, if the user does not have column privileges over the column first_name of the view
customer_details, then the result will be:
GET_JDBC_DATASOURCE_TABLES
Syntax
GET_JDBC_DATASOURCE_TABLES(
input_datasource_name : text,
input_catalog_name : text,
input_schema_name : text,
input_table_name : text,
input_type : text
)
● input_datasource_name: name of the data source for which you want to get the list of tables.
● input_catalog_name (optional): name of the catalog for which you want to get the list of tables. If the data
source does not support catalogs, set to null. If null and the data source does support catalogs, the procedure
will return all the matching tables across all catalogs.
©Denodo Technologies, Inc. 2020 - Confidential & Proprietary
● input_schema_name (optional): name of the schema for which you want to get the list of tables. If the data
source does not support schemas, set to null. If null and the data source does support schemas, the procedure
will return all the matching tables across all schemas.
● input_table_name (optional): name of the table.
● input_type (optional): type of element you want to find. If null, the procedure returns all types of elements. The possible values of this parameter depend on the underlying database. To find out which values you can pass, execute this procedure with this parameter set to null and check the values of the returned field “TYPE”.
The procedure returns one row for each table/view in the underlying database of the JDBC data source that matches the
search criteria:
GENERATE_VQL_TO_CREATE_JDBC_BASE_VIEW
Description
This procedure returns the VQL necessary to create a JDBC base view for a given table in a data source.
Syntax
GENERATE_VQL_TO_CREATE_JDBC_BASE_VIEW (
data_source_name : text
, catalog_name : text
, schema_name : text
, table_name : text
, base_view_name : text
, folder : text
, i18n : text
)
● i18n: i18n of the base view to be created. If null, the procedure will assign the i18n of the Virtual DataPort
database to which the data source belongs. We recommend setting this to null.
The procedure returns one row for each VQL statement necessary to create the desired base view:
CREATE_REMOTE_TABLE
Description
The stored procedure CREATE_REMOTE_TABLE is one of the components of the remote tables feature. This procedure does the following:
Syntax
CREATE_REMOTE_TABLE(
remote_table_name : text
, replace_remote_table_if_exist : boolean
, query : text
, datasource_database_name : text
, datasource_name : text
, datasource_catalog : text
, datasource_schema : text
, base_view_database_name : text
, base_view_name : text
, base_view_folder : text
, replace_base_view_if_exist : boolean
, options : text
)
● remote_table_name: name of the new table in the underlying database of the JDBC data source. It has to be a
valid identifier in the target database.
● replace_remote_table_if_exist (optional): if true and a table with the same name already exists in the database (in the same schema/catalog), the procedure drops the table and creates it again. The default value is false, so if the table already exists, the procedure fails.
● query: query executed in the Virtual DataPort server to obtain the data that will be inserted in the new table of
the database.
● datasource_database_name (optional): database of the JDBC data source in which the new table will be created. If null, the procedure looks for datasource_name in the current database.
● datasource_name: JDBC data source that points to the database in which the table will be created.
● datasource_catalog (optional): catalog of the database where the table will be created. If the database does not
support catalogs, set this to null.
● datasource_schema (optional): schema of the database where the table will be created. If the database does not
support schemas, set this to null.
● base_view_database_name (optional): Virtual DataPort database where the base view will be created. If null,
the base view is created in the current database.
● base_view_name (optional): name of the new base view. If null, the name will be the value of
remote_table_name.
● base_view_folder (optional): folder in which the base view will be created. If the folder does not exist, the
procedure will create it. If null, the base view will be created in the root folder.
● replace_base_view_if_exist (optional): if true and a view with the same name already exists in the Virtual DataPort database, the procedure drops the view and creates it again. The default value is false, so if the view already exists, the procedure fails.
● options (optional): options to modify the default value of some properties of the CREATE REMOTE TABLE
command. Format: 'option1=value1, option2=value2, ...'.
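The `options` argument is a single string of comma-separated `name=value` pairs. The Python helpers below are a hypothetical sketch for building and reading that string. The option names in the example are placeholders (the real option names are those of the CREATE REMOTE TABLE command), and the rule that names and values may not contain `,` or `=` is an assumption, since this document does not describe an escaping mechanism. Rendering booleans in lowercase is also an assumption.

```python
from typing import Dict, Mapping

def _as_text(value: object) -> str:
    # Assumption: booleans are written in lowercase (true/false).
    return str(value).lower() if isinstance(value, bool) else str(value)

def format_options(options: Mapping[str, object]) -> str:
    """Build an 'option1=value1, option2=value2' string."""
    parts = []
    for key, value in options.items():
        text = _as_text(value)
        # Assumption: ',' and '=' are pure separators and cannot be escaped.
        if any(sep in key or sep in text for sep in (",", "=")):
            raise ValueError(f"cannot encode {key!r}={text!r}")
        parts.append(f"{key}={text}")
    return ", ".join(parts)

def parse_options(text: str) -> Dict[str, str]:
    """Parse an 'option1=value1, option2=value2' string into a dict."""
    result: Dict[str, str] = {}
    for item in (part.strip() for part in text.split(",")):
        if not item:
            continue  # tolerate empty items, e.g. a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed option: {item!r}")
        result[key.strip()] = value.strip()
    return result

print(format_options({"option1": "value1", "option2": 10}))  # option1=value1, option2=10
```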
The procedure returns three rows with the status of each step. For example:
Denodo supports top-down modeling approaches. In top-down design, the user defines a set of “Interface Views” representing a “contract” with the consumer applications.
Denodo is able to import data models and schema definitions from XSD, WSDL and XML files and from third-party modeling tools such as CA ERwin, Embarcadero ER/Studio, IBM InfoSphere Data Architect (Rational) or SAP PowerDesigner. These models are imported into Denodo as interface views and associations.
The Denodo Model Bridge is a graphical tool that streamlines the process of importing logical data models created with
external/third-party modeling tools into Denodo.
Interface views define a structure (columns and data types) and are independent of the data sources. The
implementation views are developed bottom-up, based on the base views imported from your data sources.
This process results in a set of business entities (implemented as Denodo Interface Views) that are not linked or tied to any physical model. Denodo supports rich modeling features such as associations, which allow the designer to define specific relationships between the different business entities with different multiplicities (1:1, 1:many, many:many). These associations can later be traversed from external tools or used as the basis for the Denodo Linked Data Services.
Once the business entities are defined as Interface Views, the designer provides an implementation for them. Since the implementation view and the interface view are decoupled, it is easy to generate test data before the final implementation exists. To build the final implementation, the designer introspects the data sources, imports the required base views, and creates the needed data transformations and combinations until reaching the fully executable view.
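The decoupling between an interface view and its implementation resembles programming against an interface. The following Python analogy (not Denodo's API; all names are hypothetical) defines a contract of column names and types, then validates two interchangeable implementations against it: a stub that returns hand-written test rows, and a later implementation that combines two in-memory "sources". Consumers depend only on the contract, so swapping the implementation does not affect them.

```python
from typing import Dict, Iterable, List, Mapping

Row = Mapping[str, object]

# Hypothetical interface-view contract: column name -> expected Python type.
CUSTOMER_CONTRACT: Dict[str, type] = {"customer_id": int, "full_name": str}

def check_contract(contract: Dict[str, type], rows: Iterable[Row]) -> List[Row]:
    """Materialize rows, checking each has exactly the contract's columns and types."""
    checked: List[Row] = []
    for i, row in enumerate(rows):
        if set(row) != set(contract):
            raise ValueError(f"row {i}: columns {sorted(row)} != {sorted(contract)}")
        for column, expected in contract.items():
            if not isinstance(row[column], expected):
                raise TypeError(f"row {i}: {column} should be {expected.__name__}")
        checked.append(row)
    return checked

def stub_rows() -> Iterable[Row]:
    # Test data available before any real source is connected.
    yield {"customer_id": 1, "full_name": "Ada Lovelace"}
    yield {"customer_id": 2, "full_name": "Alan Turing"}

def combined_rows() -> Iterable[Row]:
    # Later implementation: join two in-memory "base views" on customer_id.
    first_names = {1: "Ada", 2: "Alan"}
    last_names = {1: "Lovelace", 2: "Turing"}
    for cid in sorted(first_names.keys() & last_names.keys()):
        yield {"customer_id": cid, "full_name": f"{first_names[cid]} {last_names[cid]}"}

for implementation in (stub_rows, combined_rows):
    print(implementation.__name__, check_contract(CUSTOMER_CONTRACT, implementation()))
```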
Denodo provides a Metadata API, so it is also possible to export Denodo metadata to third-party metadata tools. With the Denodo Metadata API, metadata in Denodo can be integrated with third-party solutions to provide end-to-end metadata management. For example, Denodo can access the enterprise DW and, after applying some transformations and combinations (with data from the same DW or from any other source), publish the final data that business users and applications need. In this scenario, Denodo provides the lineage of the data from the final data back to the source (the DW in this case), but Denodo cannot show the lineage of the data from the DW back to the original operational systems, because that metadata belongs to the ETL jobs that feed the data into the DW.
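The gap described above can be illustrated by stitching two lineage graphs together. In the hypothetical Python sketch below, Denodo's lineage stops at a DW table, while an ETL tool's lineage continues from that table to the operational systems. Merging the graphs, a simplified view of what an end-to-end metadata management tool can do with the exported metadata, makes the original sources reachable. The sketch assumes both graphs use the same identifier for the shared DW table and that lineage is acyclic; in practice, matching assets across tools is the hard part.

```python
from typing import Dict, List

Lineage = Dict[str, List[str]]  # node -> its direct upstream nodes

def stitch_lineage(*graphs: Lineage) -> Lineage:
    """Merge lineage graphs that share node identifiers."""
    merged: Lineage = {}
    for graph in graphs:
        for node, upstream in graph.items():
            targets = merged.setdefault(node, [])
            for u in upstream:
                if u not in targets:
                    targets.append(u)
    return merged

def origins(graph: Lineage, node: str) -> List[str]:
    """Return the most-upstream nodes feeding `node` (assumes an acyclic graph)."""
    upstream = graph.get(node, [])
    if not upstream:
        return [node]
    result: List[str] = []
    for parent in upstream:
        for origin in origins(graph, parent):
            if origin not in result:
                result.append(origin)
    return result

denodo_lineage: Lineage = {"sales_report": ["dw.fact_sales", "saas.targets"]}
etl_lineage: Lineage = {"dw.fact_sales": ["erp.orders", "crm.accounts"]}

print(origins(denodo_lineage, "sales_report"))
# ['dw.fact_sales', 'saas.targets']
print(origins(stitch_lineage(denodo_lineage, etl_lineage), "sales_report"))
# ['erp.orders', 'crm.accounts', 'saas.targets']
```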
By using the Denodo Metadata API, Denodo can be integrated with third-party Metadata Management solutions such as
Collibra, IBM Information Governance Catalog, Informatica Metadata Manager, or Informatica Enterprise Data Catalog.
As an example, the following is an overview of how Denodo and IBM IGC integrate with each other.
The Denodo Governance Bridge tool allows the synchronization of Denodo metadata and data lineage into IBM®
InfoSphere® Information Governance Catalog (IGC) in order to support the inclusion of Denodo virtual databases in
enterprise-wide Data Governance initiatives.
To achieve this, the Denodo Governance Bridge tool extends IGC by registering new types of assets from the Denodo catalog, such as databases, data sources, base views, derived views, interface views, columns, associations, stored procedures, parameters and folders.
Once synchronized, these Denodo assets can be governed using Glossary Terms, Governance Rules, Data Stewards,
Custom Properties and Collections.
You can find more information on integration with Informatica Enterprise Data Catalog (EDC) here