Data Hub Fundamentals
The SAP Commerce Data Hub is designed to remove obstacles
related to the loading and manipulation of large amounts of data. It
acts as the staging platform for the core data that is essential to
operations in your business. It also protects your current investments
in back end master data management systems.
Introducing new, front-end commerce applications into an environment with existing business data
management systems creates challenges for data integration. Traditionally, the introduction of these
new applications means lengthy, and costly, custom integration projects. Data Hub is designed to
overcome barriers related to the import and manipulation of large amounts of data. It acts as the
staging platform for data from multiple sources, and protects existing investments in back-end master
data management systems such as SAP ERP.
Data Hub is designed with SAP Commerce-SAP integration in mind, but is customizable for any data
integration scenario. Its vendor-independent workflow is fully extensible. Customizations can easily
be introduced that alter the data transformation workflow for specific integration needs. The ease of
customization greatly reduces project implementation costs and time-to-market wherever such data
integration is required.
Data Hub works primarily asynchronously, which means the data transformation process is
independent of external systems. Removing this reliance on the availability or responsiveness of
back-end systems allows for faster response times to the customer. Customer data is transmitted to
back-end systems outside the customer transaction workflow, and the customer experience is
improved.
With the right extensions, Data Hub brings together fragmented data from diverse systems into a
single, authoritative master view. It provides a full history of the data transformation workflow, and
performs error correction. It even retries data publication in unstable environments without any data
loss.
Data Hub offers all of the principal features of good Master Data Management (MDM) - collection,
aggregation, correlation, consolidation, and quality assurance. It allows for the synchronization of all
critical business data to a single reference system - an authoritative source of master data. These
features make Data Hub an ideal data staging platform for a unified MDM strategy.
The SAP Commerce Data Hub provides a vendor independent, asynchronous workflow that is fully
extensible to support the loading, processing, and publication of any data structure.
1. Raw items come directly from the source system. The data they contain may undergo some
preprocessing before entering the Data Hub.
2. Canonical items have been processed by Data Hub and no longer resemble their raw origins.
They are organized, consistent, and ready for use by a target system.
3. Target items are transition items. The process of creating target items reconciles the
canonical items to target system requirements.
Context
Data Hub is a Java web application that uses a relational database. A minimal installation requires
certain prerequisites. Unless otherwise stated, the following instructions are valid only for the
third-party software versions stated here.
Solution Book
The Solution Book files are useful as examples or samples of what
can be done with Data Hub.
You can find the Solution Book files in the Data Hub Suite ZIP file. After you install Data Hub, you
can find the files in hybris/bin/ext-integration/datahub/solution-book.
1. HSQLDB is the default database of SAP Commerce and, unless otherwise specified, is the
default database used by Data Hub. No configuration is necessary, and you do not need to
start it. It just works.
Context
By default, Data Hub enables basic authentication on its REST API. Configure your installation with
access credentials for the two basic security roles, then test your installation. After you complete
these steps, you can use this environment for the Tutorial: Setting Up and Running Hello World.
Summary
You have just created and loaded three custom Data Hub extensions. The extensions consist of a data
model for each of the three data types, Raw, Canonical, and Target. Data Hub uses the raw and
canonical models to structure and transform data internally during the load and composition phases.
It then uses the target model to transform data for specific target systems, and to connect to those
systems. Such data models are the minimum requirement for any custom Data Hub extension. Your
new extension is used for the tutorials that follow.
Composing Data
Once the raw data is loaded, compose it into canonical items. The
composition phase has two processes: grouping and composition.
Context
Results
The POST request you sent in step one triggered the composition of all items in the GLOBAL data
pool. The request caused the raw item to be transformed into a canonical item, according to your
canonical data model. This canonical item is now ready for transformation into a target item for one
or more target systems. In the next tutorial you publish this item to a file using the file adapter.
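The request from step one might look like the following curl sketch; the host, port, credentials, and exact path are placeholders and assumptions rather than values taken from this course:
curl -u admin:password -X POST http://localhost:8080/datahub-webapp/v1/pools/GLOBAL/compositions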
Results
Congratulations! You've just successfully published your first set of target data. When sent in a
POST request to Data Hub, the contents of datafile2 triggered a publication of all data in the
specified pool. The resulting target item was picked up by the output adapter which then wrote the
data to the output file. There were no complex data transformations in your data model, only one-to-
one mappings. Therefore the output was the same as the input. The only thing that has changed is the
column headers
Source System
The SAP Commerce Data Hub takes data from any source system. However, the SAP Commerce
Data Hub treats each data fragment as its own raw item during the load process. So, it is good to
understand the nature and complexity of your data before loading data into the SAP Commerce Data
Hub.
Load
During the load phase, data is converted into raw items ready for processing. Loading means
resolving the data into raw item fragments as key-value pairs ready for the next step of the process,
composition.
When the data is loaded, it goes through a data feed. Raw items enter the data feed as fragments, and
not as a single, monolithic, data block. The data feed routes the raw items to a data pool. Pools are
containers for isolating items during the processing lifecycle.
Compose
The composition phase converts raw items to canonical items using two subprocesses: grouping and
composition. Both of these subprocesses are influenced by the use of default data handlers. Handlers
are custom extensions to the SAP Commerce Data Hub and are used to enhance its functionality.
Grouping handlers pull the raw imported items into logically ordered groups, while composition
handlers apply the composition rules by which the canonical items are composed.
Canonical items represent a master data type view that is independent of the structure of both the
source and target systems. It is during this phase that the power of the SAP Commerce Data Hub as a
data staging platform is seen. The canonical view provides a reference model or template that may be
reused regardless of source or target system.
It is also during the composition phase that data can be consolidated from multiple sources, the
integrity of data checked, and any quality issues remedied. Imported data is open to inspection at any
phase in the SAP Commerce Data Hub workflow. The inspections allow for complete transparency
of the data processing and error remediation before publishing to target systems. Once data is fully
composed, you can publish it to one or more target systems.
Publish
The publication phase transforms canonical items into target items ready for export to the target
system. Once the data has been processed into target items, outbound extensions or target adapters
then provide the means for delivering the data to target systems.
Target System
SAP Commerce Data Hub includes a target adapter that integrates with SAP Commerce. This target
adapter is called the SAP Commerce Data Hub Adapter, and it provides connectivity between SAP
Commerce Data Hub and SAP Commerce. If you are using a different target system, you create your
own custom target adapter.
The typical use case for Data Hub is to deploy it together with SAP Commerce. SAP
Commerce provides the Backoffice Administration Cockpit, a framework for building web-based
administration tools. The Backoffice Data Hub Cockpit is one of the provided Backoffice components
shipped with SAP Commerce.
Backoffice Data Hub Cockpit is an easy-to-use graphical user interface for performing key Data
Hub tasks in a simple and intuitive way. With Backoffice Data Hub Cockpit, you can manage the
entire Data Hub workflow, including the following:
Initiate load, compose, and publication actions for small data sets
When the Backoffice Data Hub Cockpit extension is installed, you can access it by simply logging
into the SAP Commerce Backoffice Administration Cockpit. Once logged in, choose the Data
Hub perspective from the perspectives menu to access the Backoffice Data Hub Cockpit.
The Backoffice Data Hub Cockpit consists of several main areas, as shown in the following figure.
These main areas are as follows:
1. The perspective menu - Using the options provided here, you can switch between the Data
Hub perspective and other perspectives as required.
2. The current Data Hub instance - Here you can choose which Data Hub instance you want to
use for this session. Just click the down arrow and select.
3. The function menu - Here you choose one of the available functions of Backoffice Data Hub
Cockpit:
o Dashboard - See an overview of status and item counts for raw, canonical, and target
items. You can filter this count by pool, to view only those items in a selected pool.
o Quick Upload - This tool is designed to accommodate the quick upload of data into
a Data Hub data feed. You can also initiate the composition of the data, and the
publishing of the data to a target system.
o Errors & Failures - This tool makes it simple to review the errors that may occur
during each step of the Quick Upload process. The Errors & Failures menu option
can be useful even for large production runs due to its advanced, field level searching
capability.
o Manage Pools - Here you can create named pools for data import and composition.
The GLOBAL pool is provided by default.
o Manage Feeds - Here you can create new feeds and assign them to pools. The
DEFAULT_FEED is provided by default, and is mapped to the GLOBAL pool.
4. The control pane - This pane contains the screens associated with the different menu options;
allowing you to perform tasks associated with these options. When you select each item in
the function menu, the contents of this pane change to reflect your choice.
Data Handling
Data Hub is all about data. Data is loaded from a source system,
which can be any kind of source. Once the data is in the Data Hub, it
is organized and converted into a canonical format. While the data is
in the Data Hub, there are opportunities to modify or filter it. When
preparing the data for a target system, further manipulation can occur.
Then the target adapter ensures that the data is compatible with the
target system.
All adapters and handlers are really just extensions. They have other names to designate their
function. There are several different types of extensions for processing data including the following:
source adapters
target adapters
grouping handlers
composition handlers
Source Adapters
The source system adapter receives data from a source system and feeds it to the Data Hub in a
standardized way.
Some kinds of data, such as IDocs, require additional preparation to be efficiently processed. The
SAP Data Hub Extensions help handle IDocs.
Target Adapters
After data is processed by the Data Hub, it is sent to a target system through the target system adapter
as part of the publication phase.
Grouping Handlers
Grouping handlers are responsible for determining what canonical items are going to be created
given an input set of raw items. The Data Hub ships with two default grouping handlers. One handler
groups by type and the other by primary key. Custom handlers can be added by the user to perform
more sophisticated data handling operations such as filtering.
Composition Handlers
The Data Hub ships with three default composition handlers that create localized values, collections,
and simple strings. Custom handlers can be added by the user to perform more sophisticated data
handling operations. Composition handlers put the grouped raw item data into canonical form.
Shaping the raw items includes the following:
Data Hub Extensions
Data Hub includes an engine for data management and
transformation, and a framework for building data integrations. With
a Data Hub extension you can construct these data integrations and
influence the functionality of the Data Hub engine.
Extensions are compiled as .jar files and placed on the Data Hub class path. They are loaded when
the Data Hub starts before it performs any data handling functions. Extensions are not loaded
haphazardly. Some extensions, like grouping or composition handlers, have explicit Spring
properties defined to control when the handler runs in relation to other handlers. Other extensions
have dependencies upon each other, so they only load after their defined predecessors. You control
loading order by defining the dependencies in the extension XML. Data Hub automatically resolves
these relationships during the extension loading process.
1. The only required element in an extension is the XML file. It defines the data structure.
2. Optionally, you can write custom Java code for your extension when you want it to do more
than an XML file can accomplish. Custom composition handlers are a good example. You
may want to do more to the data during composition than the default handlers do. The simple
solution is to create your own composition handler and place it
in /opt/datahub/extensions.
3. Optionally, you can use a Spring XML file. If you create a custom composition or grouping
handler, you must create a Spring XML file, because that is where you set the processing
order property. The name template for a Data Hub Spring file is
your-extension-name/datahub-extension-spring.xml. A minimal sketch of such a file follows this list.
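The following sketch assumes a hypothetical handler class, com.example.datahub.MyCompositionHandler; only the general Spring bean structure and the order property are taken from the description above:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Registers a custom composition handler; the class name is hypothetical. -->
    <bean id="myCompositionHandler" class="com.example.datahub.MyCompositionHandler">
        <!-- Controls where this handler runs relative to the other handlers. -->
        <property name="order" value="10"/>
    </bean>

</beans>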
Summary: Concepts
In this topic area, you reviewed a more detailed data flow diagram for
the Data Hub. You were introduced to some of the basic data handling
structures, and learned some fundamental information about
extensions.
The following areas are the expected Learning Outcomes for this topic area:
You have some familiarity with the Backoffice Data Hub Cockpit, and can relate the features
there to the data flow concepts you have learned
You have been exposed to some of the Data Hub data manipulation tools and you understand
their basic purpose
You understand the concept of an extension. You have seen what is used to create them, and
how Data Hub incorporates them.
In this topic area, you use the extension.xml file from the Hello World tutorial to manipulate the
data during the publication phase. Data manipulation is achieved using various transformation
expressions that you learn about here.
The extension.xml file uses an easy-to-understand XML dialect that allows you to define a data
model for your project. This data model describes the internal structure of raw items, canonical
items, and target items within Data Hub. It also allows you to specify the necessary transformations
the data must undergo on its journey from raw to canonical, and then to target items suitable for
publishing to your target systems.
A raw item represents the structure of the unprocessed data entering the Data Hub. Here is a
simplified sketch of a raw item definition; the element names are illustrative rather than the exact schema:
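<rawItems>
    <item>
        <!-- The type name RawProduct is illustrative. -->
        <type>RawProduct</type>
        <description>Unprocessed product data as it arrives from the source system</description>
    </item>
</rawItems>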
The canonical data model is independent of both source and target systems. You therefore should not
need to modify it, even when changing the source or target.
The target item XML file differs only slightly in structure to the raw and canonical extension XML
files. Data Hub can publish to more than one target, and each target can have a unique schema.
You can therefore have multiple target data models. For the purposes of this chapter, there is only
one target system defined in our file.
The <exportCode> provides the name of the target item attribute into which the canonical
item data is to be placed. Here, once again, you can simply provide a name,
but exportCode can also include an ImpEx expression.
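As an illustrative sketch, a target attribute mapping might look like the following; exportCode is described above, while the surrounding element names and the ImpEx-style modifier are assumptions:
<attribute>
    <!-- Maps the canonical attribute "identifier" to the target attribute "code". -->
    <canonicalAttribute>identifier</canonicalAttribute>
    <exportCode>code[unique=true]</exportCode>
</attribute>
<attribute>
    <canonicalAttribute>name</canonicalAttribute>
    <exportCode>name</exportCode>
</attribute>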
The following areas are the expected Learning Outcomes for this topic area:
Build Essential Data Hub Knowledge
Once you've completed Start Your Journey, you're ready to dive in a
little deeper. Here is where you build a solid foundation of the
essential Data Hub concepts and processes.
Careful planning is required to undertake a successful data integration project using SAP Commerce
Data Hub. Understanding what components are required for the needs of your project is the first step
in this planning. Certain components are mandatory, and for some SAP Commerce provides an out-
of-the-box solution. These out-of-the-box components are shipped together with the SAP
Commerce software package. Other components are purely optional, their inclusion being dictated
solely by the needs of your project.
The following figure provides a complete end-to-end overview of a Data Hub integration. Use this
overview and subsequent explanations to gain some insight into which components may be relevant
to you. If a component is optional, it is indicated in the description.
Source System Adapter - The source system adapter provides an integration between the source
system and Data Hub. The source system adapter is an essential part of your data integration solution
because it is required to import raw, unprocessed data from the source system to Data Hub. The
REST endpoint used to import data corresponds to the Data Hub integration channel and data feed. A
more complex adapter may include event responses and data preprocessing capabilities. If your
source system is SAP Commerce or SAP ERP, then you can use one of the provided, out-of-the-box
adapters for this purpose.
Inbound Channel - Data Hub provides a single inbound channel that is a Spring Integration channel.
Spring Integration is the preferred method for importing data into Data Hub. The CSV inbound
adapter provides a convenient method for data already in CSV format. It sends the data along to the
Spring Integration channel.
Data Feeds and Pooling Strategy - Do you need more than one feed or pool for either concurrent
processing or data segregation? Data Hub provides a single feed and a global pool by default. If you
need more than one feed or pool, you can use either a named pool or new pool per input pooling
strategy. Using a strategy other than the default feed and global pool is optional.
Data Model - The data model defines how raw, canonical, and target data items are structured, and
includes metadata that assists with composition and publication. Typically, you create separate XML
files for RawItem, CanonicalItem, and TargetItem definitions. The target item XML also includes
target system definitions. The data model is an essential component of your Data Hub integration
project.
Data Handlers - Data handlers are classes that are used to execute data transformations, including
grouping, especially during the composition phase of the Data Hub workflow. Data Hub is shipped
with a set of default grouping and composition handlers. You may develop your own dedicated
handlers in addition to the provided defaults. Handlers can run in any order and you can insert as
many handlers as you need. Publication grouping handlers are optional, but provide powerful data
transformation possibilities during the publication phase.
Target System Adapter - The target system adapter is a means to deliver data to target systems and
gather results. The target system adapter is the normal endpoint for the Data Hub publication phase.
It does the final packaging of target items for delivery, then forwards them to the target system. If
your target system is SAP Commerce, SAP ERP, or SAP C4C, then you can use one of the
provided out-of-the-box adapters.
In addition to the out-of-the-box solution described above, you may also find examples of solutions
for specific test and real-world scenarios in the Solution Book.
Data Hub – SAP ERP
Integration Use Case
Data Hub plays a vital role in simplifying the data integration between
the SAP Commerce platform and SAP ERP systems.
As a part of the SAP ecosystem, SAP Commerce is the natural choice for companies already heavily
invested in SAP back-end systems. This includes companies who now also wish to expand their
footprint into digital commerce channels. Extending SAP ERP to interchange data with SAP
Commerce ordinarily creates challenges for data management. Leveraging the Data Hub and the out-
of-the-box SAP Commerce-SAP integration, companies have the solution to simplify, extend, and
develop these vital data integrations.
Companies have invested considerable resources and time into implementing their SAP ERP systems
and customizing them to suit their particular business needs. It is natural, then, to want to build upon
this system and its data that forms the basis of critical business processes. Using the Data Hub,
transactional, customer, pricing, and product data can be bi-directionally integrated between SAP
ERP and SAP Commerce.
The SAP ERP system then handles this data according to its pre-defined business processes. It sends
an order confirmation back to SAP Commerce through Data Hub in a similar process. Later it may
update the customer view in SAP Commerce periodically with the order status or delivery
notifications.
The SAP ERP system also serves as an authoritative master data source for pricing information. The
pricing information includes campaigns and discounts, product information, and inventory, as well as
anticipated future stock. Data Hub can propagate this product and pricing data from SAP
ERP to SAP Commerce as it evolves, ensuring customers always have the most up-to-date view. In
addition, the event-driven architecture of Data Hub ensures that the propagation happens in near-real
time.
Data Hub is shipped with out-of-the-box SAP integration extensions. These extensions provide a
path to move data to and from SAP ERP and Data Hub using the native SAP IDOC format.
Support for SAP integration includes the extensions to receive inbound data and send outbound data,
and also a complete set of data model extensions. These extensions provide for the mapping of raw
data fragments to canonical items, and then to target data for SAP systems. Data models are also
provided for data flowing from SAP ERP to SAP Commerce. Components marked in color in the
following graphic represent current out-of-the-box supplied integration extensions.
The Integration Workflow for Data Flowing from SAP Commerce to SAP ERP
With data flow in both directions, it is possible to deploy the SAP Commerce solution as the
responsive, transactional front-end. SAP ERP remains the authoritative master data source. Some
customization may be necessary to adapt this workflow to specific environments. However, the SAP
Commerce software package includes out-of-the-box adapters and handlers for this scenario. The
complete package greatly simplifies the effort involved by reducing both time-to-market and total
project costs.
Data Hub is able to integrate and consolidate various types of data between SAP ERP and commerce
front-end systems. Types of data that may be integrated include the following:
Customer Data
With asynchronous integration, customer data is replicated between the SAP Commerce solution
and SAP ERP. When a customer submits an order, there is the option to create a profile in SAP
Commerce. The unique customer data, along with preferences and transaction history, is stored
in SAP Commerce. It is also replicated to the SAP ERP with an asynchronous call along with the
order data. The replication enables a responsive customer experience while ensuring all master data
in the SAP ERP is up-to-date. Existing customer data in the SAP ERP can similarly be replicated
to SAP Commerce using either a bulk data transfer or a change event trigger.
Order Data
Submission of a customer order from SAP Commerce to SAP back-end systems is a critical part of
the SAP Commerce-SAP integration solution. Asynchronous order updates may be triggered by a
periodic cron job or an event. The ability to pause between the submission of the order and update to
SAP has the added benefit of allowing a window of time for updates to the order. This window could
be used as a "buyer's remorse" period, for example.
Product Data
In most cases, considerable time has been invested in specifying, organizing, updating, and
classifying product data in the SAP ERP. Such product data may be imported into SAP Commerce.
Once there, it is available in the SAP Commerce Product Information Management (PIM) solution
through the SAP Commerce Product Cockpit. It is also available to customers during the search and
transaction process. This import can be done in bulk, either manually or triggered by an automated
process. It can also be updated at the time of a customer transaction as described previously. PIM
plays a vital role in the overall MDM strategy.
Pricing Data
SAP Commerce provides extensive support for B2C pricing scenarios. In these scenarios, a single
price for a product applies to all customers, and discounts or campaign strategies are applied across the
board. In such a scenario, basic pricing data, including discounts and campaigns, can be stored
in SAP Commerce. The information is then updated to the SAP ERP on a per-transaction basis with
the customer order.
In a B2B scenario, pricing is determined on a per-customer basis depending on various complex
factors. It therefore may be the SAP ERP that is best equipped to be the source of the master pricing
data. In such a scenario, you may opt for synchronous integration between SAP ERP and SAP
Commerce. The choice between synchronous and asynchronous integration also depends on how often these prices are updated. An
asynchronous integration solution always means faster response times, decreased load on the SAP
ERP back end, and a better customer experience.
Inventory Data
Your SAP ERP system is ordinarily the master data store for inventory, including availability,
shipping locations, back orders, and number of items in stock. The business processes already in
place there are not replicated in SAP Commerce. How and where to calculate inventory data is a
decision based on factors such as performance and customer satisfaction. The solution may be
implemented either in the SAP ERP or the SAP Commerce solution. It is then presented to the
customer through the SAP Commerce product details page, at checkout, or through the order history.
Order Status
After submitting an order, both B2C and B2B customers want to track the status of their order. The
order status data stored and updated in SAP ERP is potentially vast as notifications are issued at
every stage of the order fulfillment process. How and whether each of these statuses is mapped to order status
updates in SAP Commerce is a matter of choice in your implementation. Out-of-the-box, Data
Hub provides support for data mapping using one of several pre-built extensions. As with other
aspects of the SAP ERP integration scenario, these extensions may be customized to suit your
business processes and customers' needs.
Getting data from SAP Commerce to SAP ERP and back again involves more than simply inbound
and outbound transport adapters. Data Hub ships complete with a dedicated suite of extensions for a
typical SAP Commerce-SAP integration use case. Each of the extensions has a specific role to play
in the preparation and manipulation of data on its path towards and through Data Hub.
The data flow models in the following diagram outline this path, and the role of the various
extensions in more detail. Extensions indicated with blue titles are standard SAP Commerce or Data
Hub extensions, while the yellow titled ones are for SAP integration use cases. All the extensions
included in these data flow models can be shipped together with Data Hub. They can be used out-of-
the-box as the foundation for a SAP Commerce-SAP data integration. Having out-of-the-box
solutions greatly simplifies and reduces the effort involved in your integration project.
SAP Cloud for Customer (C4C) provides a complete solution for targeted customer support, sales,
and marketing that goes well beyond traditional Customer Relationship Management
(CRM). C4C allows sales and customer service teams to provide a relevant customer experience at
every stage of the customer journey. However, in a commerce environment, it is the front-end web
shop that is the first point of contact for customers. It is also the primary repository for customer
data, which is vital to enabling this targeted customer experience. With Data Hub, customer data
created in the SAP Commerce solution can easily be transferred to SAP C4C for use during sales,
marketing, and support activities.
Customers create or update their information in the SAP Commerce solution, either within or outside
the context of a transaction. The data change is detected by the SAP Commerce y2ysync extension.
The relevant customer data, including billing and shipping addresses and contact details, are then
written to an ImpEx file to be collected by Data Hub. Data Hub processes this data through its
standard workflow and delivers target data items suitable for C4C over a SOAP interface.
The y2ysync Commerce extension is able to detect the delta of any customer data changes in SAP
Commerce. It is possible to send only that delta to Data Hub. However, SAP C4C requires the
complete set of data for a single customer to be able to update the customer entry. Therefore
the C4C integration model includes all customer and address data related to any change. The
following types of data are integrated:
Customer Data
Customer Data includes names, addresses, IDs, payment addresses, and shipping addresses. During
the data transformation process, Data Hub adds new fields required by C4C that are not present in
the SAP Commerce platform. Fields might include items such as gender or form of address, which are
not necessary for an online commerce platform but are useful in customer service. It additionally splits
or joins fields, such as address or name fields, where a single field maps to multiple fields in the
target system. It also works in the opposite way where there is only one field in the target system for
data in multiple fields in the source.
Address Data
Address data includes customer contact details, and also such items as shipping and billing addresses.
In SAP Commerce, billing and shipping addresses are closely tied to the customer persona. In C4C,
such details are distinct. The data mapping for C4C integration splits SAP Commerce address data,
transferring it from the CustomerItem to the AddressItem and splitting or combining fields as
necessary.
The following data flow model outlines this path, and the role of the various extensions in more
detail. Extensions indicated with blue titles are standard SAP Commerce or Data Hub extensions.
Dedicated C4C use case customer data integration extensions are shown with yellow titles. All the
extensions included in this data flow model are shipped together with Data Hub. They can be used
out-of-the-box as the foundation for a SAP Commerce-C4C integration. Being out-of-the-box greatly
simplifies and reduces the effort involved in your C4C integration project.
Basic Aspects of Load,
Compose, and Publish
What happens to data as it passes from the source system
through Data Hub to the target system? This more detailed overview
of the Data Hub workflow shows how Data Hub handles data end-to-
end.
This topic area discusses the three main phases the Data Hub passes through when handling data—
the load phase, the composition phase, and the publication phase. These phases are tightly
intertwined, and there are well defined pathways that the data takes from one phase to the next. With
a sound grasp of how these phases handle data, you have a deeper understanding of how Data
Hub actually works.
When you are finished with this topic area, you should understand how Data Hub loads raw data
fragments. You have a more detailed view of the composition of data into canonical items using both
grouping and composition handlers. Finally, you should be aware of some of the key aspects of
publishing data to target systems.
1. Data is loaded into the Data Hub through an inbound extension (for example, the CSV
inbound extension). The extension converts the data into a standardized format and passes it
to the exposed Spring Integration inbound channel. At this point, the Data Hub converts the
data from the inbound channel into raw items and makes them available for further
processing.
The raw item begins the Composition phase by being copied. The copy of the raw item now goes
through Grouping. During Grouping, the raw items are arranged by canonicalItemType or
by primaryKey. If custom Grouping handlers exist, the raw items may also be arranged in other
ways.
Multiple values combine into a set. Sets occur when a given attribute on a type is a collection
of values as opposed to a single value. Sets often occur in one-to-many relationships.
An attribute may also have a single value that cannot be localized and cannot be a member of a set.
The following transformation rules apply to these types of attributes:
Transformation is used when the raw attribute name does not match the canonical attribute
name. To implement this logic, populate the <expression></expression> tag in the
XML file with the name of the raw attribute that supplies the desired value.
Use a SpEL transformation if a basic reference attribute does not suffice. The SpEL
expression is specified in the <expression spel="true"></expression> tag. A
SpEL expression can be used to construct a value made up of 0 or more reference attributes,
as well as any other valid SpEL expression.
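As an illustrative sketch of both rules, the transformation section of a canonical attribute might look like the following; the attribute and raw type names are hypothetical, and only the expression element and its spel attribute are taken from the description above:
<attribute>
    <name>description</name>
    <transformations>
        <!-- Simple reference: copy the value of the raw attribute named shortText -->
        <transformation>
            <rawSource>RawProduct</rawSource>
            <expression>shortText</expression>
        </transformation>
    </transformations>
</attribute>
<attribute>
    <name>displayName</name>
    <transformations>
        <!-- SpEL expression: build the value from more than one raw attribute -->
        <transformation>
            <rawSource>RawProduct</rawSource>
            <expression spel="true">name + ' (' + itemnum + ')'</expression>
        </transformation>
    </transformations>
</attribute>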
You can add custom Composition handlers that further manipulate the Grouped raw
fragments. After all of the handlers are done, the Grouped raw item has been composed into a
canonical item.
The canonical item is published to a target system. During publication, the publication phase
uses the target.xml file to transform the canonical data into target compatible output.
Then the target system adapter accepts the target item from the publication process and
passes it to the target system.
Load
Data Hub provides several methods for loading data. The primary and
recommended method is through the Spring Integration Channel, but
there is also a service for loading source data in CSV format.
Data is loaded into Data Hub as key-value pairs. With CSV data, which is already a key-value
mapping, this is straightforward. With other data input formats, some processing occurs to extract the
key-value pairs.
Each new batch, file, or stream is considered a new data loading action. All data in the provided
batch, file, or stream, must be completely processed before the data loading action completes. As
each key-value pair is loaded, it is recognized as a valid raw fragment and assigned a PENDING
status. Only raw items with the PENDING status, and a completed data loading action, are eligible
for composition.
The goal of the CSV Web Service extension is to simplify the load process of raw items into the Data
Hub. It is possible to make REST calls to the CSV web service to load the data. The REST call
resource contains the name of the raw item where you want to load data. The request body contains
the attribute names and data that you want to load.
CSV Header
The first line in the CSV body is called the header. The header of the CSV corresponds to the
attribute names of the raw item where you are trying to load data. The attribute names should be
comma separated with no spaces. For example, if you want to load data into a raw item that has the
raw attributes: identifier, name, description, and unit, you would place the following header in the
request body:
identifier,name,description,unit
The order of the attributes must match the order of the data.
CSV Body
The following rows in the CSV body include the data you are loading. Each row corresponds to
exactly one raw item. Following the previous example, if you have the raw attributes identifier,
name, description, and unit, and you want to load three raw items, the corresponding CSV body
includes the header plus one row of values for each item, for example (the descriptions and the third
item shown here are illustrative):
identifier,name,description,unit
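0,pants,wear them to cover your legs,pieces
1,shorts,kinda like pants but shorter,pieces
2,socks,worn on the feet,pairs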
In this example, you create three raw items. Each value in a column corresponds to the value of the
attribute in the same column. So, for item 0, the value for identifier is 0, the value for name is pants,
and so on.
A blank value for an attribute implies that this attribute should be ignored. In this case, the value of
the attribute does not change. If this attribute value has never been set, it remains the default, and, if
it has been set before, the value is not updated. This is in contrast to an empty string, which causes
the existing attribute value to be replaced. In the following example, the value for description of item
0 is not set, and will therefore be ignored.
identifier,name,description,unit
0,pants,,pieces
1,shorts,kinda like pants but shorter,pieces
The REST call resource contains the feed name, and the name of the raw item where you want to
load data. For example, if you want to load data into a raw item called RawProduct, you call the
following resource:
POST -
http://{host:port}/datahub-webapp/v1/data-feeds/{feed_name}/items/RawProduct
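For example, a curl call for this resource might look like the following sketch; the host, port, credentials, feed name, and text/csv content type are placeholders and assumptions:
curl -u admin:password -X POST \
     -H "Content-Type: text/csv" \
     --data-binary @products.csv \
     http://localhost:8080/datahub-webapp/v1/data-feeds/DEFAULT_FEED/items/RawProduct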
Localized Attributes
If you want to add values for attributes that are localized, the item should span multiple lines - one
line per locale. You must include the isoCode attribute when loading localized attributes. You must
also include the attributes that are defined as a primary key in each line. Let's assume, for example,
that identifier is the only attribute that is defined as a primary key in the metadata. You then must
include it in every line. Including attribute values for non-primary key attributes is optional. For
example, if you want to load item 0 with the name and description in English, German, and French,
you load CSV content like the following (the English and French values shown here are illustrative):
identifier,isoCode,name,description,unit
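0,en,pants,wear them to cover your legs,pieces
0,de,Hosen,"tragen Sie es, um Ihre Beine zu bedecken",
0,fr,pantalon,portez-le pour couvrir vos jambes,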
Note: In the snippet above, the German version uses quotation marks to escape the comma
in tragen Sie es, um Ihre Beine zu bedecken. Also, note that the pieces parameter is implicit in
the German and French CSV content.
Collection Attributes
Loading values for attributes that are defined as collections is similar to loading localized attribute
values. However, there is no need for the isoCode attribute to be present. For the following example,
you add the attribute category, which is defined as a collection in the Canonical model. You also
assume that identifier is the only primary key attribute. If you want to load several categories for
your item 0, you must include 1 category per line.
identifier,category,name,description,unit
0,wearables,,,
0,accessories,,,
You can "empty" attribute values in SAP Commerce after they have been populated with values.
Basic Attributes - In the case where you want to reset a value of an attribute back to its
default, you must use a special string reserved for the command to clear a field. By default,
the string <empty> instructs Data Hub to empty an attribute value. For example, if you want
to clear out the attribute unit for item 0, you use the following CSV:
identifier,unit
0,<empty>
Localized Attributes - If you want to empty an attribute value for a localized attribute, you
must specify the isoCode of the language you are trying to clear. For example, if you would
like to clear the English name for item 0:
identifier,isoCode,name
0,en,<empty>
Collection Attributes - To empty a collection attribute, such as category, use the same special
string:
identifier,category
0,<empty>
You can configure the empty property csv.empty.value to be a different value. The default
value is <empty>, but you can easily configure it to be a different string in
the local.properties file.
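For example, the following local.properties entry (the replacement string shown is illustrative) changes the clearing token:
csv.empty.value=<clear>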
Return Codes
If all attributes in the CSV header are defined for the raw item where you are trying to load
data, a 200 OK code is returned.
If at least one of the attributes in the CSV header is a valid attribute of the raw item, the
data for the attributes that are defined correctly for that raw item is loaded and
persisted. However, the data that corresponds to the attributes that are not defined for that
raw item is ignored, and a 200 OK code is returned.
If none of the attributes in the CSV header exist for that raw item, a 400 Bad Request code
is returned from Data Hub. The message is No valid attribute names exist in csv header
for type 'MyRawType'.
If the raw item type you are trying to load data into does not exist, or is not a subtype of raw
item, a 400 Bad Request code is returned. The message is InvalidRawType type is not a
subtype of RawItem.
If the feed name you are trying to use does not exist, a 400 Bad Request code is returned.
The message is Invalid feed name specified - InvalidFeedName.
The SAP Commerce Test
Adapter Extension
The SAP Commerce Test Adapter enables you to quickly set up Data
Hub instances for testing. It intercepts Data Hub publications that
would ordinarily be sent to a HybrisCore target. You can then
complete the full load, composition, and publication cycle without the
need to install and configure SAP Commerce.
The SAP Commerce Test Adapter is available as an optional extension for Data Hub. To install it,
simply place the extension in the /opt/datahub/extensions/ directory. No further configuration is
necessary. The Test Adapter automatically intercepts any publication being sent to
a HybrisCore target.
Data Hub Test Adapter provides the following benefits:
When used in performance testing, it removes the extra latency and processing time of
publishing to SAP Commerce, thus providing more accurate results.
It can optionally write the generated ImpEx to a file on the file system.
1. You are going to utilize Data Hub for extraction and preparation of data for your SAP
Commerce installation.
To install Data Hub Adapter, you first include it as an extension in your SAP Commerce build. Then
configure the endpoints and authorization credentials on both Commerce Platform and Data Hub.
Summary: Default Data
Hub Adapters
The following areas are the expected Learning Outcomes for this topic area:
You have an understanding of the difference between an input channel and an adapter.
You can use the CSV Web Service for basic data load using a REST call.
You are aware of the available out-of-the-box adapters shipped with Data Hub.
Optionally, you can set up and use the Data Hub Adapter with a local SAP
Commerce installation.
In the Next Topic Area...
You receive:
A more advanced overview of the various methods used to transform data in Data Hub
Transforming Data
The Data Hub provides many default tools to simplify data
transformation. In addition, you can create custom tools that make
your options endless.
In this topic area, you build upon what you have already learned about transforming data using XML
and handlers. By the end of this topic area you should be able to create your own Grouping and
Composition handlers. You should also be able to transform data using the extension XML.
You can use simple expressions, SpEL expressions, and/or custom SpEL expressions provided
by SAP Commerce to modify, alter, or disable XML attribute definitions.
Using a Simple Expression
It is essential that the custom extension containing the overriding or disabling attribute definition loads after the
original extension. To ensure the proper load order, define a dependency in the custom extension on
the original extension.
While you can describe some transformations with the extension XML, a handler provides
opportunities for implementing advanced logic using Java. Handlers are most important during the
composition phase, but may also be used during publication. A handler is typically a class file
included in your extension project that implements the relevant handler interface. It extends the
default abstract handler classes, or adds new process logic to the data chain.
There are three different types of handler, each executing a vital part of the data integration process:
grouping handlers, composition handlers, and publication grouping handlers. Data Hub is shipped
with a set of default grouping and composition handlers that provide logic essential to its workflow.
You can also add your own handlers to perform custom transformations.
Grouping Handlers
Grouping handlers group raw item fragments according to specified attributes. These may be
attributes such as canonicalItemType or primaryKey. For example, a grouping handler that
groups by canonicalItemType may place all raw item fragments related
to canonicalProduct in one group, and all item fragments related to canonicalVariant in
another. A grouping handler that groups by primaryKey would look for all raw item fragments
with the same primary key and group these such that the result represents a single row in the
database. Grouping handlers work on copies of the original raw items, which remain in the database
with their status unchanged at this stage. This grouping behavior, the first stage of the composition
process, delivers unified raw data items from the data pool ready for transformation into canonical
items.
By redefining the order of execution of the default grouping handlers, you may influence the final
result. You can also introduce new, custom grouping handlers of your own. The execution of
grouping handlers always precedes the execution of composition handlers.
Composition Handlers
Composition handlers put the grouped raw item data into canonical form. This may include
populating the canonical data fields, handling empty data fields, merging data from several fields into
one, and creating new canonical item primary keys. Composition handlers are executed after
grouping handlers in the composition phase, and the resulting canonical items are persisted in the
database. Upon successful execution of the composition handlers, the raw items, which remain in the
database, are marked with the status PROCESSED, while the resulting canonical items are given the
status SUCCESS. As with grouping handlers, you may introduce your own custom composition
handlers.
Default Handlers
Data Hub is shipped with default grouping and composition handlers. These are defined as abstract
classes, which may be extended by your own custom handlers by way of the available handler
interfaces in the SDK. These default handlers represent a simple use case involving the set of test
data shipped with Data Hub, and provide essential grouping and composition processes that form a
good foundation for many data integration scenarios. You may extend the default handlers, but it is
not recommended you override or exclude them unless you are certain this is necessary for your
project.
Custom Handlers
Custom handlers introduce new grouping or composition logic into the process chain alongside the
default handlers. They are implemented as part of your custom extension and loaded with that
extension during Data Hub startup. Because the default handlers are implemented as abstract classes,
you may easily extend or modify them with your own custom handler logic. Custom handlers need to
be registered in your Spring application context. The order property in the Spring context determines
the order in which the custom handler executes in the process chain.
The following areas are the expected Learning Outcomes for this topic area:
You learned the different techniques for using extension XML to manipulate data
You receive:
You learn how you can use the Backoffice Data Hub Cockpit to perform many of the
fundamental functions of Data Hub
You learn how to view and analyze errors using the Backoffice Data Hub Cockpit
With the Backoffice Data Hub Cockpit installed and configured, you can do the following:
View item counts and statuses for each of the three phases of the Data Hub workflow
Context
Backoffice applications are created as part of the Backoffice Framework. This enables application
designers to create or modify widgets in order to customize Backoffice without writing code.
Described below is the process for installing and initializing the Backoffice Data Hub Cockpit.
Procedure
Results
You can now access Backoffice Data Hub Cockpit by going
to http://<hybris_host>:9001/backoffice/, and logging in with the Data Hub Admin
Group role. Select the Data Hub perspective icon from the drop down perspectives menu.
Tutorial: Create a New Data
Hub Instance
You can define new Data Hub instances directly in the Backoffice
Data Hub Cockpit.
Prerequisites
For this tutorial you must have both Data Hub and SAP Commerce installed and running. Backoffice
Administration Cockpit must include the datahubbackoffice extension.
Procedure
1. Connect to Backoffice Administration Cockpit in your browser at the appropriate URL.
http://localhost:9001/backoffice/
2. In the right pane, enter datahubinstance (one word) in the search box.
a. Click the Search button.
b. Click the DataHubInstanceModel called DataHub Instance.
Results
Your new Data Hub instance should now appear in the drop-down instances menu. If there is an
issue resolving the instance you just created, it appears as grayed out with a red x in the instance list.
A common cause is an incorrect URL.
Procedure
Tutorial: Perform a Quick
Compose
You can trigger a composition of previously loaded data using
the Data Hub Backoffice Quick Upload page.
Tutorial: Perform a Quick
Publish
It is easy to trigger a publication of previously composed data using
the Data Hub Backoffice Quick Upload page. Quick publish allows you
to define one or more target systems for publication.
Prerequisites
Ensure that you have canonical items in Data Hub that have a SUCCESS state, awaiting publication.
You can verify this by issuing a curl command against the Data Hub REST API.
Context
Occasionally you will encounter errors when using the quick upload, compose, and publish options
in the Data Hub Backoffice. Data Hub returns these errors to Backoffice where the details are stored
for review.
Procedure
Tutorial: Create a Feed and a
Pool
You can use the Backoffice Data Hub Cockpit to create new feeds and
pools, and to define pooling strategies. This tutorial walks you through
the process.
Context
The DEFAULT_FEED and GLOBAL pool, along with the global pooling strategy, are present by
default in Data Hub. Create additional feeds and pooling strategies for custom data management
requirements.
Procedure
Master Your Data Hub Project
Master Your Data Hub Project gives you the advanced knowledge to
prepare Data Hub for a production environment. Complete the earlier
modules before proceeding with this one.
Upon completion of this module you have all of the tools necessary for any Data Hub integration
project. To proceed with the first topic of this topic area, click the following related link.
Installing Data Hub
There are several things you must do to complete your installation
of Data Hub and prepare it for a production environment.
This topic area discusses the final steps you take for your Data Hub installation to progress to a
production ready installation. There is also information about version upgrade and running under
Oracle WebLogic.
At the end of this topic area, your Tomcat installation is going to be properly tuned. A relational
database is going to be installed and integrated with Data Hub. Encryption is going to be enabled.
You will have determined what, if any, database cleanup strategy to employ.
Data Hub Pre-Requisites
You must have the following knowledge to take full advantage of Data Hub capabilities.
Program in Java
Tuning Tomcat
Tomcat can be tuned to use memory more efficiently, which improves
performance and helps Data Hub move more data.
Create your Tomcat startup file. A Linux example is sketched below; on Windows, the same options
go into setenv.bat instead. Note that the CATALINA_OPTS settings described below are just an
example and should not be used as is in a production environment. You need to determine your own
best configuration.
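A minimal Linux setenv.sh sketch, with placeholder heap and garbage-collection settings, might look like this:
#!/bin/sh
# Placeholder JVM settings for the Data Hub Tomcat instance; tune these for your environment.
export CATALINA_OPTS="-Xms2048m -Xmx4096m -XX:+UseG1GC -Dfile.encoding=UTF-8"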
Optional: If you need a more secure environment for your Data Hub installation, you can add SSL
authentication.
Choosing a Database
Data Hub requires a dedicated database for staging data, and saving
item metadata and statuses related to load, composition, and
publication actions. Data Hub can be configured to use several
common relational databases. Flexibility of this type provides you the
opportunity to select the ideal database solution for your data
integration project.
Overview
Data Hub is a data integration and staging platform that works primarily asynchronously. Raw item
data is first loaded from source systems. It is then composed into canonical items. Finally, it is
published to one or more target systems in a form suitable for those systems. Each of these stages
occurs independently. Because load, composition, and publication events may be triggered at any
time, Data Hub must store the raw, canonical, and target items themselves at every stage of the
workflow.
In addition, each item includes metadata that defines both its structure, and its relationship to the next
state in the data transformation workflow, that is, from raw to canonical, or from canonical to
target. Additional metadata is also required to describe each target system.
Thirdly, each item is marked with a status that indicates its progress in the Data Hub workflow,
that is, the outcome of any load, composition, or publication event. Statuses are also
recorded for target system publication events.
Data Hub must persist all of these data types - data items, metadata, and statuses. Together, they not only
enable the function of Data Hub, but also constitute a complete history of all data transformations - a
sort of golden record history of Data Hub events for auditing purposes.
Persisting this data requires a dedicated database.
Performance
Data Hub employs highly concurrent processing for maximum efficiency and throughput. It uses
Hibernate for non-performance-critical transactions and has its own implementation of a JDBC
repository for performance-critical transactions. These Data Hub features provide a level of
persistence abstraction that is compatible with a range of common relational databases. Some of the
databases may have their own performance limitations, depending on configuration. The choice of
database does not affect Data Hub performance in any significant way.
Please refer to the related links section about the individual database topics. Consult with a DBA for
the right choice of database. The DBA can help you create a performance-related configuration
tailored for the needs of your data integration project.
Data Retention
Any data from completed publications remains in the database, forming a complete auditing record
of your data transformation history, as described previously. You may wish to keep the auditing
record indefinitely, but over time it can affect Data Hub performance. Previous auditing records can
be cleaned up, either manually or automatically, using the provided Data Hub clean up extension.
You may also wish to develop your own extension to perform the clean-up according to your
requirements. See Activating Data Hub Database Cleanup.
Database Schema
During initialization, Data Hub creates its own schema and initializes this schema with the metadata
loaded from its extensions. By default, the datahub.autoInitMode property is set to update to prevent
data loss. To refresh the database at any time, drop and create the database manually. Then
restart Data Hub to regenerate the schema.
Supported Databases
By default, Data Hub is configured to use the HSQL database. However, HSQL is not a supported
database for production deployments.
Note: Data Hub is case sensitive, so your chosen database must also be case sensitive. Of the databases
supported by Data Hub, only MySQL is not natively case sensitive. Instructions for configuring it to
be case sensitive are included.
For production, Data Hub supports several relational databases. They include:
MySQL
Oracle
SAP HANA DB
MSSQL
To avoid potential issues in certain cases, ensure that your database supports case-sensitive queries.
More information is provided in the individual database topics.
Using MySQL
MySQL is a popular, open-source relational database system. Data Hub can be easily configured to use MySQL.
Note
By default, MySQL performs case-insensitive queries, which may be an issue in some cases. Case sensitivity is set using the collate parameter. To enable case-sensitive queries in Data Hub, create the schema with a case-sensitive collation, as shown in the example below.
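A minimal sketch of creating such a schema, assuming the default database name of integration used elsewhere in this documentation; verify the character set and collation with your DBA:
-- utf8_bin is a binary (case-sensitive) collation; any case-sensitive collation works
CREATE DATABASE integration CHARACTER SET utf8 COLLATE utf8_bin;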
Using Oracle
Oracle is an enterprise database management system (DBMS)
produced by the Oracle Corporation. Data Hub can be easily
configured to use an Oracle database.
Context
Complete the following steps to configure your Data Hub installation to use an Oracle database.
Restriction
When using Oracle SE, Data Hub and SAP Commerce cannot share the same Oracle SE instance.
Procedure
Using SAP HANA
SAP HANA is an in-memory database platform from SAP. Data Hub can be easily configured to use an SAP HANA database.
Context
Complete the following steps to configure your Data Hub installation to use an SAP HANA database.
Procedure
Using MSSQL
MSSQL is a relational database management system (DBMS) produced by Microsoft. Data Hub can be easily configured to use MSSQL.
Context
Complete the following steps to configure your Data Hub installation to use an MSSQL database.
Note
MSSQL is case sensitive by default.
Procedure
Create your database.
The default Data Hub installation relies on a database instance named integration, with an administrative user named hybris and the password hybris. You can change the database instance name as well as the username and password; reflect any changes in the database connection information located in the local.properties file. The database should be created by a user with sufficient privileges to grant the rights required for the Data Hub database, including full schema privileges on the database instance.
Auto-initialization of the Data Hub database schema is possible during the Data Hub startup cycle. To specify how and whether this auto-initialization occurs, add the property datahub.autoInitMode to your local.properties file, as shown in the example below.
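A minimal local.properties sketch; update is the default value described earlier, while the alternative value shown is an assumption to be checked against your Data Hub release:
# local.properties
# Preserve existing data across restarts (default behavior)
datahub.autoInitMode=update
# Alternatively, recreate the schema on every startup (development only; value name is an assumption)
# datahub.autoInitMode=create-drop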
Securing Your Data Hub Application
There are several ways in which you can secure your Data
Hub application using simple configuration. These steps ensure basic
end-to-end security of REST endpoints and data attributes.
Context
Data Hub provides a default Spring security profile. You must provide authentication credentials for
the roles defined in this profile.
Procedure
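For example, credentials for the default roles can be supplied as properties. The property names below follow the datahub.security.basic.* convention but should be treated as assumptions and verified against your release:
# local.properties - credentials for the default security profile (property names are assumptions)
datahub.security.basic.admin.user=admin
datahub.security.basic.admin.password=<choose-a-strong-password>
datahub.security.basic.read_only.user=datahub_readonly
datahub.security.basic.read_only.password=<choose-a-strong-password>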
Context
If you are using Data Hub with SAP Commerce, provide connection credentials for the Data Hub
Adapter so it can connect to Data Hub.
Procedure
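For example, on the SAP Commerce side the adapter's outbound credentials are typically supplied as properties; the property names and URL below are assumptions to be checked against the Data Hub Adapter documentation for your release:
# SAP Commerce local.properties (property names are assumptions)
datahubadapter.datahuboutbound.user=admin
datahubadapter.datahuboutbound.password=<same password configured in Data Hub>
datahubadapter.datahuboutbound.url=https://localhost:8443/datahub-webapp/v1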
Context
If you are using Data Hub with SAP Commerce and the Data Hub Adapter, configure a dedicated
OAuth client for Data Hub. This configuration is done in the Backoffice Administration Cockpit.
Procedure
Set Up Encryption
Context
Data Hub comes with some built-in encryption capabilities for attributes that you wish to keep secure
in the data store. Use this service for such items as passwords and other sensitive data. This is a
mandatory step, as target system passwords are encrypted by default.
Procedure
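A sketch of the key configuration, assuming the /opt/datahub/config folder created during installation; the property name is an assumption to verify against your release:
# local.properties (property name is an assumption)
datahub.encryption.key.path=/opt/datahub/config/datahub.encryption.key.txt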
Context
Once you have configured encryption and stored your key, you can specify which attributes you wish
to secure.
Procedure
Activating Data Hub Database
Cleanup
Over time, a Data Hub database accumulates database records that are used for auditing, but as these records accumulate, they can also affect performance. The following document describes these records and which database tables are affected.
When a Data Hub instance has been running for a long time, too many audit records accumulate in the Data Hub database. These records do affect performance. If not needed in the active Data Hub database, the historical auditing database records can be migrated to an archive database before elimination. They can also be eliminated without affecting the current state of Data Hub. These operations can be performed even while Data Hub processes are running against the database. The only consequence is that removing the records also removes the audit history for those records - the history of how and when they were imported, composed, and published.
Out of the box, the datahub-cleanup extension executes a set of default deletion behaviors. The
default behavior deletes the following:
All archived canonical items and their associated publication errors and status after a
composition
All target items after they are finished being used during a publication
Data Hub is shipped with these property values set to false. To enable the default behavior, set the corresponding properties to true in the deployed local.properties file, as shown in the example below.
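The property names below are hypothetical placeholders for illustration; use the names documented for the datahub-cleanup extension in your release:
# local.properties (hypothetical property names - check the datahub-cleanup extension documentation)
datahub.cleanup.archived.canonical.items=true
datahub.cleanup.published.target.items=true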
Defining the Cleanup Batch Size
If not explicitly specified, the default batch size is 1000 - the Oracle maximum. You can set the batch size to any positive integer, but it cannot be set to a negative value.
all canonical items that have previously failed to publish but have not reached their max retry
limit
All of these canonical items in the pool are processed with the publication. The publication is
comprised of one or more publication actions. If there is a maximum publication action size, then the
publication action is limited to that number of items. Whatever the number of publication actions
needed to publish all items, Data Hub creates them and queues them. One set of input data has the
possibility of being split across several publication actions, because items are not pre-assigned to
specific publication actions.
The cleanup extension is required for any Data Hub installation, because it has a powerful, positive impact on performance. However, it can have negative impacts if it is misconfigured. The two timeout properties described below are critical for a proper configuration. The properties must be in your local.properties file at Data Hub startup. If you activate the cleanup extension without specifying a value for these properties, they each default to 12 hours.
Excluding Certain Types from Cleanup
It may be useful to exclude certain canonical item types from deletion by the cleanup extension.
Data Hub CLASSPATH Configuration Files and Recipes
Data Hub relies on configuration files and property files. Your recipe can create and properly place these files.
A typical Data Hub deployment requires some resources that are placed on the application classpath. All resources can be configured inside a resources element within the Data Hub configuration clause of your recipe.
Configuring Data Hub Binaries with a Recipe
Data Hub requires some extensions to be deployed with it. SAP Commerce already contains some Data Hub extensions, which can be used in the recipes. The extensions can be found in the <PLATFORM_HOME>/../ext-integration/datahub/extensions directory. If you develop a new Data Hub extension, and it exists somewhere on your local file system, it may be included in the recipe as well.
Customizing Data Hub Deployment
You can use a recipe to customize a Data Hub deployment.
Context
Data Hub is a Java web application, and it utilizes a relational database. It requires certain
prerequisites for a minimal installation. Unless otherwise stated, the following instructions
are not valid for third party software versions other than those stated here.
Procedure
1. Install the Java Development Kit 1.8.
Using your browser, go to http://docs.oracle.com/javase/8/docs/technotes/guides/install/install_overview.html and install the Java JDK for your operating system. 64-bit Java is supported.
2. Install Apache Tomcat 7.x.
Follow the instructions in the Apache Tomcat documentation, https://tomcat.apache.org/tomcat-7.0-doc/setup.html .
Tip
Step 3 through Step 5 are critical to a successful installation of Data Hub. You are going to use these folders often as you work through the documentation and use Data Hub.
3. Create an /opt/datahub folder in the root directory.
4. In the datahub folder, create the following new folders:
a. config - for example, to be used for the files local.properties and datahub.encryption.key.txt
b. extensions
5. Create a Tomcat webapp context XML file called datahub-webapp.xml. datahub-webapp is going to be the name of the web service.
Remember
All of the Data Hub documentation is based on the following datahub-webapp.xml file.
a. Open <TOMCAT_HOME>/conf/Catalina/context.xml in your favorite
text editor. If it does not exist, go to
the <TOMCAT_HOME>/conf/Catalina/localhost folder, open a new file,
and move on to the next step.
b. If the file is already customized, edit it to include the elements shown below. If it is not customized, replace the contents of the file with the example below.
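A sketch of such a context file, assuming the folders created earlier in this procedure; the docBase path, WAR file name, and the datahub.config.location environment entry are assumptions that you should adapt to your installation:
<?xml version="1.0" encoding="UTF-8"?>
<!-- datahub-webapp.xml (illustrative) - points Tomcat at the Data Hub WAR and its configuration folder -->
<Context docBase="/opt/hybris/bin/ext-integration/datahub/web-app/datahub-webapp-<version>.war">
    <!-- Location of local.properties; the entry name datahub.config.location is an assumption -->
    <Environment name="datahub.config.location"
                 value="/opt/datahub/config/local.properties"
                 type="java.lang.String"
                 override="false"/>
</Context>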
6. Edit and save the context file.
a. Note
Edit the <docBase> parameter to reflect the path to the SAP Commerce installation and the current Data Hub version. You can find this version by following the path mentioned for the <docBase> to the WAR file. The version number on the end of the docBase attribute is just for identification; it serves no other purpose and has no other impact.
b. Save the file with the new name datahub-webapp.xml into the <TOMCAT_HOME>/conf/Catalina/localhost folder.
7. Install the cURL command-line tool.
Note
The Solution Book files and recipes are optimized for versions 6.7 and above, and are provided as
examples only. It is assumed and recommended that you regularly patch the Data Hub application.
Deploying to a Docker Image
You can choose to package and deploy Data Hub as a Docker image.
Customizing Data Hub
Out of the box, Data Hub allows you to perform the tutorials in this documentation. Customizing Data Hub functionality requires custom tools and extensions. With customizations, Data Hub can perform a great variety of data handling operations.
Anatomy of an Extension
A Sample Extension XML File
The extension XML file is used to define the data structure. As mentioned previously, Data Hub uses three main data structures: raw, canonical, and target. Each one has its own XML file. An abbreviated canonical extension XML file is shown below, followed by a description of its key tags.
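An illustrative sketch only - the element nesting, namespace, and the type and attribute names are assumptions; check them against the XSD shipped with your Data Hub version:
<extension name="example-canonical" version="1.0"
           xmlns="http://www.hybris.com/schema/onpremise/datahub/metadata">
    <canonicalItems>
        <item>
            <type>CanonicalProduct</type>
            <description>Example canonical product item (hypothetical)</description>
            <attributes>
                <attribute>
                    <name>productId</name>
                    <model>
                        <primaryKey>true</primaryKey>
                    </model>
                </attribute>
                <attribute>
                    <name>name</name>
                    <model>
                        <localizable>true</localizable>
                    </model>
                </attribute>
            </attributes>
        </item>
    </canonicalItems>
</extension>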
The extension tag is the root element for the extension definition. The attribute name is
mandatory, and must contain your extension name. Other extensions refer to your extension
by this name, so be thoughtful about the name you choose. The name should be descriptive
enough that other developers, teams, or companies know what business domain your
extension addresses. It should also be unique in a way that other teams or companies do not
pick an identical name for their extension. If the version attribute is used, it should help
you identify the version of the extension you are deploying.
While Java source is optional in an extension, it is common. You can do some data manipulation within the XML file, but for intricate work or faster bulk work, Java source code is more powerful.
The XML that defines your data structures is the minimum requirement for a Data Hub extension. Package your extension into a JAR file and place it on the Data Hub classpath. Follow the steps here to create and load an extension JAR file for your canonical data structure.
Procedure
If you develop multiple Data Hub extensions, you can define multiple build profiles with additional dependencies declared. With these, you can easily deploy each of the extensions separately, or certain combinations of the extensions, simply by adding the profile name to the build command. A sketch of such a profile follows.
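Assuming a Maven-based build (the profile id, group, and artifact names below are hypothetical), a profile could look like this:
<!-- Fragment of a webapp pom.xml (illustrative) -->
<profiles>
    <profile>
        <id>product-extension</id>
        <dependencies>
            <dependency>
                <groupId>com.example.datahub</groupId>
                <artifactId>example-canonical-extension</artifactId>
                <version>1.0.0</version>
            </dependency>
        </dependencies>
    </profile>
</profiles>
The extension is then included in the build with, for example, mvn clean package -P product-extension.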
Illustrating a Dependent Extension
Your extension may declare dependencies on other Data Hub extensions. Ensure that all dependencies can be successfully resolved.
To discover and resolve dependencies, Data Hub loads all of the extension XML files, reads all of the
declared dependencies, and checks whether they have already been loaded or not. If the declared,
required extension is found in Data Hub but is not loaded yet, Data Hub loads it before loading your
extension. If the required extension is not found, your extension is not loaded, and a corresponding
warning or error message is reported in the log files. The failure of your extension to load means any
other extension dependent on your extension is not loaded either.
Besides missing a required extension, it is possible that two or more extensions create a circular
dependency. For example, your extension depends on another extension, the other extension depends
on yet another extension, and that one depends back on your extension. Such circular dependencies
are identified by Data Hub during extension loading, and Data Hub excludes any extension in the
circular dependency chain from loading.
Dynamically Loading an Extension
Data Hub primarily relies on extensions that are loaded at startup. However, it is possible to dynamically load an extension while Data Hub is running. There are some limitations with this process, but using dynamic extensions is better than shutting Data Hub down.
You can load an extension either statically or dynamically. Extensions loaded statically are loaded during Data Hub startup and initialization. Extensions loaded dynamically are loaded during runtime using a REST call, without the need to shut down and restart Data Hub.
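A sketch of such a call, assuming basic authentication and a hypothetical /extensions endpoint; the actual path and payload handling must be checked against the Data Hub REST API documentation for your release:
# Illustrative only - endpoint path and payload format are assumptions
curl -u admin:<password> -X POST \
     -H "Content-Type: application/xml" \
     --data-binary @example-canonical-extension.xml \
     https://localhost:8443/datahub-webapp/v1/extensions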
Security
Various aspects must be considered when planning a secure Data
Hub deployment. Alongside application-specific features, you should
also pay attention to the network, database, web container, and
monitoring.
In this topic area you get an overview of all aspects of security related to your Data Hub installation
and operation. You learn the importance of setting up a secure network and container,
configuring Data Hub to use basic authentication for all REST endpoints, and using SSL for all
communication to and from Data Hub. In addition, you learn how to encrypt data stored in the Data
Hub database.
Infrastructure Security
Security is an important topic. Data Hub provides several security features of its own and is compatible with others.
Network Security
To prevent denial-of-service and similar attacks, install Data Hub within a DMZ so that the network ports used by Data Hub are not exposed to the internet. The DMZ ensures that only known clients can connect to Data Hub resources.
Container Security
Configure your application server to use SSL, thus encrypting all requests and responses and further securing your data. Apache Tomcat, for example, can be configured to force all Data Hub connections over SSL/TLS (HTTPS). Have an experienced system administrator with knowledge of generating self-signed certificates configure SSL. More information and a tutorial are available on the Apache Tomcat website in the topic SSL/TLS.
Monitoring Security
Secure monitoring communications between a network administration center and Data Hub to prevent man-in-the-middle attacks. Data Hub and its JVM can be monitored using the out-of-the-box monitoring tool JConsole. There are also other monitoring solutions that are compatible with Java Management Extensions (JMX).
Application Security
Data Hub implements basic authentication configured with Spring Security to minimally secure its RESTful API endpoints. Authentication is enabled by default, and Data Hub returns an HTTP status code 401 Unauthorized when secured REST endpoints are accessed without a valid authentication header.
Basic authentication requires credentials (username and password) to be included in all requests to Data Hub REST endpoints. Only the status and version endpoints are not secured. These can be used to validate your Data Hub installation without the need for authentication.
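For example, assuming a local installation at the default context path (the host, port, and the endpoint names other than status are assumptions):
# The status endpoint is unsecured and needs no credentials
curl http://localhost:8080/datahub-webapp/v1/status
# Secured endpoints require basic authentication (replace the credentials with your own)
curl -u admin:<password> http://localhost:8080/datahub-webapp/v1/pools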
By creating a separate profile, you ensure that no conflicting filters are defined, and that you properly
override the default security configuration.
A custom Spring Security profile enables you to expand and customize the available user roles, or to
use other Spring Security features such as Spring LDAP, thus providing a single-sign-on experience
using an external LDAP server and circumventing the need to define user credentials in a properties
file.
Using SSL
When using basic authentication, the username and password is encoded using base64 and
transmitted in the Authentication header of REST requests. This request can be intercepted and
decoded to reveal the username and password, and it is therefore strongly recommended that you
configure your Tomcat installation to use SSL for all such requests, and to refuse non-encrypted
connections. To read about enabling SSL with Tomcat, see SSL/TLS
Configuring Data Hub Adapter for OAuth
Context
Data Hub ships with a default Spring security profile that includes definitions for an admin user, and
a read-only user.
If you want to define an additional role to gain more granular control over Data Hub access, you can
do so in a custom security profile. With a custom security profile, you can, for example:
restrict a user to only using the GET and POST methods, but not PUT or DELETE
Both of these examples are included in this tutorial. Many more possibilities exist for your own
custom security profile.
Procedure
The username must be unique. Do not define a username for the load user that is already defined for another role.
You now have a new security profile that you can use in place of the existing default profile. It can include any number of new user roles, allowing fine control over access to the various Data Hub REST endpoints and available methods.
Data Security
Data Hub data is stored in your chosen RDBMS. When configuring it for use with Data Hub, create a dedicated user for Data Hub with only the permissions required.
The two aspects of database security you should consider at a minimum are the user and permissions,
and attribute encryption.
Attribute Encryption
Data stored in the Data Hub database can include target system passwords and other sensitive data from source systems. Data Hub provides support for the encryption of sensitive data in the database. It also masks this type of data when it is returned in the body of RESTful responses or recorded in log files. Always enable encryption and attribute masking to protect sensitive data. By default, the Data Hub application password is always masked.
Set Up Encryption
Context
Data Hub comes with some built-in encryption capabilities for attributes that you wish to keep secure
in the data store. Use this service for such items as passwords and other sensitive data. This is a
mandatory step, as target system passwords are encrypted by default.
Procedure
Adapter Security
Basic authentication is implemented in both directions between Data Hub and SAP Commerce. The credentials configured in Data Hub must also be configured in the Data Hub Adapter in order for them to communicate.
Data Hub Adapter is an SAP Commerce extension that links SAP Commerce to Data Hub. Data
Hub and SAP Commerce interact in a client/server architecture where Data Hub is the client,
and SAP Commerce is the server. Authentication is implemented on both client and server side.
Inbound Connections
For inbound connections (client to server), set up a user in SAP Commerce with a username and password. Then provide these credentials in the target-item.xml file of your Data Hub extension, in the target system definition.
Outbound Connections
For outbound connections (server to client), Data Hub employs HTTP basic authentication. This is defined in the Spring Security configuration specifically for Data Hub. Provide these credentials to SAP Commerce in the SAP Commerce local.properties file.
Context
A default OAuth configuration exists that uses a client called eic. However, you should define your
own OAuth client in any production system.
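A dedicated client can be created, for example, with an ImpEx script like the following sketch; the client id, secret, and authorities are placeholders that you should adapt to your security requirements:
# Illustrative ImpEx - adjust the values before use
INSERT_UPDATE OAuthClientDetails;clientId[unique=true];resourceIds;scope;authorizedGrantTypes;authorities;clientSecret
;datahub-client;hybris;basic;client_credentials;ROLE_CLIENT;<choose-a-strong-secret>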
Examples of the topics covered include the concept of actions, and a series of topics addressing items with specific issues in each of these three areas. The three primary topics - load, compose, and publish - comprise the action points for Data Hub, and there is a linear relationship between them. Once you complete this topic area, you should have a full understanding of these three main processes.
Load
Load is the most straightforward of the three main Data
Hub processes. As such, there are not many modifications you can
make to the load process. However, there are a few things you can do
to make it more efficient with specific kinds of data and with specific
load actions.
The Data Hub Spring Integration mechanism enables the loading of raw (fragmented) data as key/value pairs. Each fragment is represented as a Map<String, String>, where the key represents the name of the raw attribute and the value is the data corresponding to that key. In the following snippet from a rawExtension.xml file, the raw attribute <name>city</name> has city as the key. When the source data is prepared for loading, the key city is matched with the correct source data: a city name.
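A sketch of such a snippet; only the city attribute is taken from the text above, and the item type name and surrounding element nesting are assumptions to check against your raw extension XSD:
<!-- rawExtension.xml fragment (illustrative) -->
<rawItems>
    <item>
        <type>RawCustomerAddress</type>
        <attributes>
            <attribute>
                <name>city</name>
            </attribute>
            <attribute>
                <name>postalCode</name>
            </attribute>
        </attributes>
    </item>
</rawItems>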
If there is no value (for example, the value is null), the attribute is ignored. Any string, including the
empty string, is supported as an attribute value. When an attribute value is set to an empty string, this
value means that the attribute has been cleared. There is a difference between empty strings and null
values in raw items. Null is treated like an ‘ignore’. An empty string, however, is treated as an
explicit value, so the empty string overrides any previous value.
CSV
Before the CSV data begins to load, Data Hub creates a data loading action, assigns an ID to it, and
gives it a status of IN_PROGRESS. Because it is CSV data, all of it must load before the data
loading action is complete. Once it is complete, the data loading status changes to COMPLETE. All
of the raw item statuses are set to PENDING. The raw items are ready to be composed.
Spring
Since the custom code loads data into the Spring channel, each piece of data is given its own data
loading action. Each data loading action is assigned an ID and given a status of IN_PROGRESS.
After the piece of data is loaded, the data loading status changes to COMPLETE. The raw item status
is set to PENDING, and the raw item is ready to be composed. If the data fails to load, the action is
set to a FAILURE status.
In both cases, when the composition is complete, the data loading action status is set to
PROCESSED.
Compose
This topic covers the fundamentals of composition. Additionally, there is information about composing IDocs, which may improve the performance of Data Hub.
Canonical Metadata
At the end of Data Composition, the canonical data set is complete and the data pool is populated with the new canonical items. The metadata defining this process consists of:
which item types can be created from the imported item type
how the composed item attributes are constructed from the imported raw items
Note
Don't use reserved identifiers such as "id", "status", or "type" when naming attributes of Raw or Canonical models. If a Composition or Publication expression uses one of these attributes, the actual value read is not the one provided by the user on the Raw or Canonical model, but the underlying technical attribute.
IntegrationKey
The IntegrationKey is a critical element in the Data Hub data structure. It is created automatically by Data Hub and is composed by concatenating the primaryKeys you have identified in your canonical extension XML. The method used to create it is defined by the IntegrationKeyGenerationStrategy, which can be found in the SDK. A sketch of the strategy is shown below.
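For illustration only - the method name and parameter type below are assumptions; consult the Data Hub SDK for the actual interface:
// Illustrative sketch of the strategy contract (names are assumptions)
public interface IntegrationKeyGenerationStrategy
{
    /**
     * Builds the integration key for a canonical item, typically by
     * concatenating the values of its primary-key attributes.
     */
    String generateIntegrationKey(CanonicalItem item);
}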
There are some instances where you might want to override the default creation method. The
simplest way to do this is to override this bean with your own implementation in an extension.
Each attribute can be defined as a single string attribute, as a set, or as a localized value.
Composition Actions
Composition actions are the construct within which raw items are
moved through the composition phase.
The composition phase starts when you initiate it using an event
(InitiateCompositionEvent) or a POST request. The Data Hub then creates a composition
action and opens the composition queue. It then composes raw items from the specified pool. The
composition phase runs asynchronously, and the POST request returns immediately indicating the
composition action is IN_PROGRESS. If you are using events, Data Hub fires several events that
help track composition activity.
Data Hub deals with composition actions in one of two ways.
1. All raw items are composed. Data Hub assigns the appropriate status to the composition
action, and it is done.
2. Composition continues until the datahub.max.composition.action.size is met (the default size is 100,000). At that point, Data Hub assigns the appropriate status to the composition action, and it is done. If further composition is needed, a new composition action is triggered.
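The maximum action size can be tuned in local.properties; the default shown matches the value stated above:
# local.properties - maximum number of items handled per composition action
datahub.max.composition.action.size=100000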
1. Current canonical items with a SUCCESS status are updated when new raw items matching
their data pool and integration key are composed.
2. For each item to be updated, a new merged canonical item is created. Its values are populated
by merging the existing canonical item and the just composed raw item. The new, merged
canonical item has a status of SUCCESS.
3. The status of the old matching canonical item and the just composed raw item are set to
ARCHIVED.
Transforming Data (Advanced)
Transforming data is the primary task of Data Hub. Data Hub provides
several tools for complex data transformations, in addition to simple
data item mapping in your data model XML files.
All common SpEL expressions are supported by Data Hub. You can use SpEL to add more powerful
transformations to your data model XML. Data Hub also provides the custom SpEL resolve()
function, which allows the use of a lookup table for resolving data values. For even more complex
transformations in the publication phase, consider using a Publication Grouping Handler.
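For example, a target attribute mapping might use a SpEL expression such as the following sketch; the transformationExpression element and the attribute names shown are assumptions to verify against the target extension XSD for your release:
<!-- target-item XML fragment (illustrative) -->
<attribute>
    <name>name</name>
    <!-- Concatenates two canonical attributes into a single target value -->
    <transformationExpression>firstName + ' ' + lastName</transformationExpression>
</attribute>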
ImpEx
SAP Commerce is shipped with a text-based import and export functionality called ImpEx. The ImpEx engine allows creating, updating, removing, and exporting platform items such as customer, product, or order data to and from comma-separated value (CSV) data files, both during runtime and during the initialization or update process.
The ImpEx key features are:
Import and export of SAP Commerce data from and into CSV files.
Create initial data for a project, or migrate existing data from one SAP Commerce instance
into another (during an upgrade, for example).
Facilitate data transfer tasks, for example, for CronJobs, or for synchronization with third-
party systems (such as an LDAP system, or SAP R/3).
An SAP Commerce extension may provide functionality that is licensed through different SAP Commerce modules. Make sure to limit your implementation to the features defined in your contract license. In case of doubt, please contact your sales representative.
Function Overview
The ImpEx engine matches data to be imported to the SAP Commerce type system. ImpEx allows the data to be split across two individual CSV files: one for the mapping to the type system and the other for the actual data. That way, swapping input files is easy. Importing items via XML-based files is not supported. For details, refer to the ImpEx documentation. There are three main fields of use for ImpEx:
During Development:
Tip
ImpEx import is based on the ServiceLayer. This means that actions like INSERT, UPDATE, and REMOVE are performed using ModelService, and thus the ServiceLayer infrastructure, such as interceptors and validators, is triggered.
The Data Hub is a service-oriented, standards-based data integration solution, designed to help lower implementation time and costs, while allowing you to maintain control over ongoing data integration and maintenance. The Data Hub is designed to easily import and export data between SAP Commerce and external data storage systems (including ERP, PLM, CRM, and so on). Taking advantage of features like event publication and subscription, data fragments can easily be consolidated and grouped, and users can create categories and assign data accordingly. The Data Hub reduces the effort for systems integration and initial data migration through a fully integrated toolset and pre-built data mappings, including SAP pre-built extension integrations for key types.
In December 2008, SAP conducted an evaluation of existing ETL tools available at that time as
shown below:
During the evaluation, SAP found that Talend Open Studio was quite powerful and easy to use and
might be a good choice for smaller fields of use. For large-scale applications, the Oracle Data
Integrator might be a good choice, especially if an Oracle support contract is already available.
ImpEx stands for import and export. As the name suggests, ImpEx in hybris is used for importing data from a CSV/ImpEx file into the hybris system and exporting data from the hybris system to a CSV file. ... ImpEx import always means data flowing into the hybris system.
A hot folder is a pre-designated common folder on the server. Any CSV data placed in the folder invokes the import process called ImpEx, and the result of the import is instantly loaded into the SAP Hybris system using a pre-defined data translation mode.
ImpEx API
The ImpEx API allows you to interact with the ImpEx extension. You
can use the API to import and export data, and extend or adapt it to
your needs
Import API
You can trigger an import using the import API in a number of ways. These include using the
back end management interfaces, as well as triggering it programmatically
Export API
You can trigger an export using the export API in a number of ways. These include using the
back end management interfaces, as well as triggering it programmatically.
Validation Modes
The validation mode controls validation checks on ImpEx. By default, strict mode is enabled
meaning all checks are run.
Customization
You can extend your import or export process with custom logic. Customization allows you to address requirements that cannot be achieved completely with the ImpEx extension.
Scripting
You can use Beanshell, Groovy, or JavaScript as scripting languages within ImpEx. In addition, ImpEx has special control markers that determine simple actions such as beforeEach, afterEach, and if.
User Rights
The ImpEx extension also allows modifying access rights for users and user groups.
Translator
A translator class is a converter between ImpEx-related CSV files and values of attributes of SAP Commerce items.
There are four basic ways of triggering an import of data for the ImpEx extension:
1. Hybris Management Console using the ImpEx Import Wizard. For more details, see Using
ImpEx with Hybris Management Console or SAP Commerce Administration Console.
2. Hybris Management Console creating an Import CronJob. For more details, see Using ImpEx
with Hybris Management Console or SAP Commerce Administration Console.
3. Using the ImpEx extension page. For more details, see Using ImpEx with Hybris Management Console or SAP Commerce Administration Console.
4. Programmatically, using the import API (for example, the Importer class), as described in the following procedure.
Procedure
Instantiate the class.
When instantiating the Importer class, the CSV stream is given using a CSVReader or an ImpExImportReader. If you only want to specify the input stream, use a CSVReader; the Importer then instantiates a corresponding ImpExImportReader. Using an ImpExImportReader directly is only needed if special settings are required at instantiation time (settings after instantiation can be made using the getReader() method of the Importer instance).
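A minimal sketch, assuming the Jalo-layer Importer and CSVReader classes; package paths and method names should be verified against the API documentation for your SAP Commerce version:
// Illustrative programmatic ImpEx import (class locations and signatures are assumptions)
import de.hybris.platform.impex.jalo.imp.Importer;
import de.hybris.platform.util.CSVReader;
import java.io.FileInputStream;
import java.io.InputStream;

public class ProgrammaticImpExImport
{
    public void importFile(final String path) throws Exception
    {
        try (final InputStream in = new FileInputStream(path))
        {
            final CSVReader reader = new CSVReader(in, "UTF-8");
            final Importer importer = new Importer(reader);
            importer.importAll(); // processes every value line in the stream
        }
    }
}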
Using ImpExImportCronJob
When using ImpExImportCronjob, you have the advantage of
persistent logging, as well as persistent result and settings holding.
Export API
You can trigger an export using the export API in a number of ways.
These include using the back end management interfaces, as well as
triggering it programmatically.
You can trigger an export of data for the ImpEx extension in the following ways:
When using the ImpExExportCronJob, you have the advantage of persistent logging, as well as persistent result and settings holding. You can create a cron job using the createDefaultImpExExportCronJob method of the ImpExManager class. Possible settings are provided in the API of the cron job class.
Import Strict - Mode for import where all checks relevant for import are enabled. This is the
preferred one for an import.
Import Relaxed - Mode for import where several checks are disabled. Use this mode for modifying data in ways not allowed by the data model, such as writing non-writable attributes. Be aware that this mode only disables the checks performed by ImpEx; if there is any business logic that prevents the modification, the import fails anyway.
Export Strict (Re)Import - Mode for export where all checks relevant for a re-import of the
exported data are enabled. This is the preferred mode for export if you want to re-import the
data as in migration case.
Export Relaxed (Re)Import - Mode for export where several checks relevant for a re-import of the exported data are disabled.
Export Only - Mode for export where the exported data is not designated for re-import. No checks are enabled, so you can, for example, write a column twice, which cannot be re-imported in that way. This is the preferred export mode for an export without re-import capabilities.
Customization
You can extend your import or export process with custom logic. Customization allows you to address requirements that cannot be achieved completely with the ImpEx extension.
Writing A Custom Cell Decorator
Using a cell decorator, you can intercept the interpretation of a specific cell of a value line between parsing and translating. The cell value is parsed, then the cell decorator is called, which can manipulate the parsed string, and then the translation of the string starts.
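A sketch of such a decorator, assuming the CSVCellDecorator interface commonly referenced in SAP Commerce documentation; the package path and method signature are assumptions:
// Illustrative cell decorator (interface location and signature are assumptions)
import de.hybris.platform.util.CSVCellDecorator;
import java.util.Map;

public class UpperCaseDecorator implements CSVCellDecorator
{
    @Override
    public String decorate(final int position, final Map<Integer, String> impexLine)
    {
        // The parsed cell value for this column position
        final String parsed = impexLine.get(Integer.valueOf(position));
        // Manipulate the parsed string before it is handed to the translator
        return parsed == null ? null : parsed.toUpperCase();
    }
}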
Scripting
You can use Beanshell, Groovy, or JavaScript as scripting languages within ImpEx. In addition, ImpEx has special control markers that determine simple actions such as beforeEach, afterEach, and if.
With the scripting engine support in SAP Commerce, you can set the value of the flag impex.legacy.scripting to false to benefit from new scripting features in ImpEx. You can then use not only Beanshell, but also Groovy and JavaScript.
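For example, in your configuration (the property name is taken from the text above):
# local.properties / project.properties
impex.legacy.scripting=false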
Standard Imports
By default, a number of standard imports are always provided in ImpEx scripting, so you do not need to declare them yourself. Here is a list of those imports:
User Rights
The ImpEx extension also allows modifying access rights for users
and user groups.
Translator
A translator class is a converter between ImpEx-related CSV files and
values of attributes of SAP Commerce items
A translator is one of the two ways SAP Commerce offers for using business logic when importing
or exporting items.
On import, a translator converts an entry of a value line into a value of an SAP Commerce item. It
writes the value from the CSV file into SAP Commerce.
On export, a translator converts a value of an attribute of a SAP Commerce item into a value line. It
writes the value from SAP Commerce into a CSV file.
ImpEx Syntax
SAP Commerce ships with an integrated text-based import/export
extension called ImpEx, which allows creating, updating, removing,
and exporting platform items such as customer, product, or order data
to and from Comma-Separated Values (CSV) data files - both during
run-time and during the SAP Commerce initialization or update
process.
CSV Files
The ImpEx extension uses Comma-Separated Values (CSV) files as the data format for
import and export. As there is no formal specification, there is a wide variety of ways in
which you can interpret CSV files.
You can use syntax highlighting rules in UltraEdit to get a more colored view of ImpEx files.
Import
For importing data to the platform via the Hybris Management
Console (HMC), you have to create and configure a CronJob of type
ImpExImportCronJob.
The configuration of such a CronJob and the import result attributes are described in the next paragraph. To make the configuration of such a CronJob easier, the HMC provides a wizard of type ImpExImportWizard. For more information, see Import Wizard.
Another possibility for importing data, which has nothing to do with the HMC but is also a more graphical alternative, is the use of ImpEx Web. This alternative is intended for development only and can only be used by administrators. For more information
Export
For exporting data from Platform to CSV files via the Hybris Management Console (HMC), you have to create and configure a CronJob of type ImpExExportCronJob.
You first need an export script that you can generate using the Script Generator. The configuration of such an export CronJob and the export result attributes are described in the next section. To make the configuration of such a CronJob easier, the HMC provides a wizard of type ImpExExportWizard. This wizard is described in the Export Wizard section below.
Another possibility for exporting data, which has nothing to do with the HMC but is also a more graphical alternative, is the use of ImpEx Web. This alternative is intended for development only and can only be used by administrators.
ImpEx Media
An ImpEx media generally represents a CSV/ImpEx file or a ZIP archive containing CSV/ImpEx files. ImpEx media are used only by the ImpEx extension for import and export processes.
Executing Import
from Administration Console
To import data in the distributed mode using Administration Console,
use the same Administration Console page that the classical ImpEx
uses.
Context
Choose a file that includes data you want to import and start importing.
Procedure
1. Log into Administration Console.
2. Hover the cursor over Console to roll down a menu bar.
3. In the menu bar, click ImpEx Import.
The ImpEx Import page displays.
4. Switch to the Import script tab.
5. Click Choose file and load your file.
6. Tick the Distributed mode option.
7. Click Import.
You should receive a message about the import status.
Executing Import
from Backoffice
To import data in the distributed mode, use the standard ImpEx import
wizard.
Context
Choose a file that includes data you want to import and start importing.
Procedure
1. Log into Backoffice.
2. Open ImpEx import wizard.
a. Click System.
b. Click Tools.
c. Click Import.
ImpEx import wizard opens.
3. Choose the data file to upload.
4.
a. Click upload and choose your file.
b. Click create.
5. Click Next to switch to the Advanced Configuration tab.
6. Select Distributed Mode.
7. Click Start.
The view switches to ImpEx Import Results.
8. Click Done.
Executing Import on Selected
Node Groups
Distributed ImpEx uses TaskEngine internally, which was designed to
work well in a cluster environment. This enables you to choose which
node group to execute import on.
Context
For backward compatibility, an instance of ImpExImportCronJobModel is available as a result
of data import execution. This CronJob contains all the logs from a given execution, as well as its
status.Caution
To look up logs from a particular Distributed ImpEx import execution, follow the procedure.
Procedure
1. Log into Backoffice.
2. Look up a CronJob from a given import execution:
a. Click System.
b. Click Background Processes.
c. Click CronJobs.
You can see a list of CronJob instances from particular import executions.
3. Click the CronJob instance you're interested in.
The CronJob's editor area opens. Here you can find the status of the CronJob execution, as
well as a list of items containing the logs:
Reduce import complexity and empower data managers to define import mappings using an
intuitive graphical user interface.
Keep your content accurate by importing the most current product information from your
suppliers and business partners.
Aggregate all product information scattered across various systems and departments.
The Import Cockpit enables the user to import data into the SAP Commerce platform using a CSV source file without the need to specify an ImpEx import script. Using the Import Cockpit, you can avoid the extensive integration effort consumed by the creation of import interfaces.
A flexible and high-performing Import Cockpit integrates with multiple source systems and supports
the rapid and seamless migration of vast product data. All data imports are managed from a central
location. Via import mappings you can load high volume data from various external sources as a flat
CSV file into the SAP Commerce application. The Import Cockpit allows you to manage, configure,
run, cancel and monitor import jobs. Also you can attach files and define import parameters.
The Import Cockpit also facilitates a selective data import. You can define a number of data lines to
be skipped. It is also possible to define constant values for attributes in all imported objects instead
of using field values of the CSV source data file. In addition, it is not necessary to map all source
data columns to attributes, so that certain columns can be omitted during import.
Moreover, the Import Cockpit provides the user with additional information about the current job status, the results, and the history of imports. This information provides a basis for analyzing and enhancing the support service.
Mapping Tab
In the Mapping tab of a particular import job, you can define or edit mapping data columns. The data columns can be dragged to the mapping area in order to assign a column to the internal SAP Commerce data structures. The mapping area shows a list of mapping entries that provide a correlation between the SAP Commerce type attribute and the source column.
Benefits
Reduce import complexity and empower data managers with an intuitive, graphical UI.
Keep your content accurate by importing the most current product information from your
suppliers and business partners.
Data mapping via drag and drop - allows cost-efficient supplier self-service models without IT involvement (4.5.0).
Mapping validation - ensures data consistency and reduces the follow-up cost of bad data (4.5.0).
Progress tracking - monitors the progress of an import job, including the last started job in the current session (4.5.0).
Browser-based, no client installation needed - easy roll-out to a large and external user base.
Import Cockpit Interface
SAP Commerce Import Cockpit enables you to import data into SAP Commerce using a CSV source file without the need to specify an ImpEx import script. You can perform the import operations in the user-friendly interface of the SAP Commerce Import Cockpit, which consists of the following areas:
Navigation area on the left side for previewing of the import jobs history.
Browser area in the center for browsing import jobs and mappings.
Editor area on the right side for editing the details of the import jobs.
Navigation Area
The navigation area consists of the following UI (user interface) components:
Menu
History box
Info box
You can expand or collapse all boxes using the triangle button on the upper-right side of a box. You can rearrange most of the boxes in the navigation area using drag-and-drop operations. The Info box cannot be moved.
Menu
Use the Menu for the following:
Choosing the language in which screen texts, catalog names, product descriptions, and the
like appear
Logging out
History Box
The History box displays a list of up to 20 modification steps. Every entry represents a modification you have made in the course of the current Import Cockpit session. The list displays the earliest modification at the top, while the bottom entry indicates the latest modification.
Click an entry to undo the represented modification plus any others that were done chronologically
after the modification step.
Undone modification steps are displayed with gray text. Click an undone entry in the list to redo the modification of that entry and all modifications prior to that entry. In this context, redo means undo for undo; that is, a change you have undone is redone.
You also can click the Undo and Redo buttons or use keyboard shortcuts Ctrl+Z for undoing
and Ctrl+Y for redoing.
As the modifications you make are written to the database outright, and the undo/redo history is kept with the Import Cockpit session, you are not able to directly review modifications other Import Cockpit users have made. For a modification history of SAP Commerce items, see SavedValues - Keeping Track of Attribute Value Modification.
Info Box
The Infobox displays how many workflow-based tasks a user currently has assigned. It also
displays the number of comments.
If you have any comments, clicking on the number of comments brings up the comments
screen. You can review, edit, add attachments, delete, and reply to existing comments here.
If you have tasks assigned, clicking on the number brings up the task screen. You can
review your tasks and select an outcome for the tasks here.
Tabs:
o Welcome tab
o Job tab
o Mapping tab
Caption
The caption bar consists of the following UI components:
Content Browser
Use the content browser to enter a search string, narrowing the number of displayed entries.
Advanced Search
To access the advanced search dialog, click the Advanced Search button in the search input field.
Use the Clear button to delete everything you have entered in the input fields.
Use the Edit button to add search criteria. They appear as additional input fields.
Select the Sticky check box to make the advanced search options visible all the time. If you want to hide them, clear the Sticky check box.
Click the Search button to perform the search.
Welcome Tab
The Welcome tab is the default tab of the Import Cockpit. Here you can find essential information for getting started. You can also create a new import job or go to the jobs you created. It is also possible to go to the wiki documentation on the Import Cockpit.
Job Tab
The Job tab is composed of two parts:
Main area, displaying the list view of defined import jobs with information on their status and with action buttons.
Context area, with tabs showing detailed information on the selected job.
Mapping Tab
The Mapping tab is composed of two parts
Main area, displaying three sections and a toolbar, used to create and edit mappings.
Editor Area
By double-clicking on a job in the browser area, you open the job in the editor area. Here, you can
edit job data
The editor area consists of the header at the top plus a number of sections.
Header
The header displays the job's name. You can also browse through the jobs displayed in the browser area using arrows.
Sections
The sections of the editor area display a number of job attributes you can maintain. You can
configure the list of attributes to be displayed. Use the TAB key to jump to the next attribute field,
and SHIFT + TAB to jump to the previous attribute field. Via the drag-and-drop operation, you can
move sections and the attribute fields within the editor area. You can re-order sections and the
attribute fields, even across sections. In addition, you can show hidden attribute fields and hide
displayed attribute fields.
The editor area of the Import Cockpit consists of the following sections:
Basic
Source File
Timetable
Logging
Basic
In the Properties section you can change the job name and upload the new source file. Here you
can also change the default Ignore error mode to Fail or Pause.
Source File
In the Source File section, you can set the properties of the source file. Pay special attention to the Separator Character field and to the Has Header Line radio button.
Timetable
It is possible to define an import job as a cron job. You can configure the trigger and other cron job
details in the Timetable section.
Logging
In the Logging section you can define the logging settings. By default the Log level database for
import jobs is set to ERROR, and Log level file is set to INFO.
Welcome Tab
The Welcome tab of the SAP Commerce Import Cockpit displays the essential information for starting work with the SAP Commerce Import Cockpit.
The Welcome tab is the default tab of the SAP Commerce Import Cockpit. Here you can find essential information for getting started.
Job Tab
In the Job tab of the SAP Import Cockpit, you can browse the import jobs.
Main area, displaying the list view of defined import jobs with the information on their status
and with action buttons.
Context area, with tabs showing detailed information on the selected job.
Main Area
In the main area of the Job tab you can see the import jobs in a list view.
To start the job, click the Start Import Job button. Note that if this button is inactive, you should first create a mapping for this job.
Figure: The main area of the Job tab displaying the status of the Test Job
The context area of the Job tab contains the following tabs:
Trace Log tab, showing errors that occurred during the job run time.
Source Data (Table) tab, displaying the source file in table format.
Output Impex tab, providing the output ImpEx script generated from the source file.
Log History tab, containing the logs from the job run time.
Mapping Tab
In the Mapping tab of the SAP Import Cockpit you can create and edit the mapping for your import
job.
Main area, displaying three sections and a toolbar, used to create and edit mappings.
Main Area
In the main area of the Mapping tab you can see three sections:
Source section, containing the names of the imported attributes from the source file.
Mapping section, used for matching the source attributes with the SAP Commerce attributes.
SAP Commerce attributes section, containing the attributes of the selected target type.
Drag the attributes from the Source column and drop them into the Mapping column. Next, select the equivalent from the SAP Commerce attributes column and drop it into the same row.
In the upper part of the main area you can find the toolbar, containing buttons for uploading ZIP files,
validating, and saving the mapping.
Figure: The main area of the Job tab displaying the mapping of the Test Job with the Product type.
Context Area
The context area of the Job tab consists of the following tabs:
Figure: The context area of the Job tab with the displayed mapping errors.
You can also use the Hybris Management Console (HMC) to perform the tasks that you can perform in the SAP Import Cockpit. For more information, see Using ImpEx with Hybris Management Console or SAP Commerce Administration Console.
Procedure
1. Open an Internet browser.
2. Enter the URL of the SAP Commerce Cockpit in the browser's address bar. The default URL
is http://localhost:9001/mcc. If you are not already logged in, the SAP Commerce Cockpit
Login appears.
3.
Data Language: Choose the language in which the screen text should appear.
User Group: View to which user group your user account is assigned.
A navigation area on the left side for previewing of the import jobs history.
A browser area in the center for browsing import jobs and mappings.
An editor area on the right side for editing the details of the import jobs.
1. You can select an already uploaded file (Select an Existing Reference) or create a new upload (Create a New Item). Select Create a New Item.
6. Browse your file folders to locate the file that you want to upload.
7. Select the file that you want to upload and click the Done button.
8. Select the job you want to use. If no jobs are defined, you must create one. Enter a name in
the Job field.
Click the Done button.
Context
The job attributes are displayed in the editor area. All changes are saved automatically;
you do not need to save them manually. For more information on the editor area, see
Context
You cannot start an import job without using a mapping. If you do not have a mapping specified,
the Start import job button is inactive and you cannot import the desired file. The Start Import
Job button is only activated once you have defined a mapping.
To create a mapping:
Procedure
1. On the Welcome tab, click the View Your Jobs button.
In the Overview Import Jobs page, all your defined jobs are displayed.
2. In the Actions column of the job for which you want to create a new mapping, click the Edit
Mapping button .
5. Click the Done button.
9. Click the Save button to save the mapping. You can later reuse this mapping for other jobs.
In the example below you can see the created Test Job with the active Start import job button
:
Context
You need to have your import job mappings created as described in the Creating Mappings
documentation.
Procedure
1. On the Welcome tab, click the View your jobs button.
2. Click the Edit mapping button . Load the mapping that you want to edit.
Context
You can start your job directly after creating a mapping. To do so:
Procedure
You can also start your job from the Jobs tab by clicking the Start import job button
.
3. You can also check the history of all executed jobs since you have logged in in
the Execution history box in the navigation area of the Import Cockpit:
importcockpit Extension
The importcockpit extension provides the SAP Commerce Import Cockpit Module, which
enables you to import data into SAP Commerce using a CSV source file without specifying an ImpEx
import script. You simply define the type of data to be imported and the target attributes each source
data column corresponds to.
To ensure proper mapping and avoid omitting mandatory attributes, a set of mapping validation rules is applied.
The importcockpit extension offers a preconfigured user interface, also known as a perspective. This perspective is widely customizable. Below, you can find more information about this topic.
Customization Options
The Import Cockpit consists of a single perspective that can be configured to offer multiple role-based sets of user interface elements. For example, a principal group can be restricted from creating jobs or mappings and only work with existing ones. The principal group can be further restricted regarding the attributes the group is allowed to see, modify, or map. This level of customization is a result of the Cockpit framework on which the importcockpit extension is built. The Cockpit framework offers several customization options described in cockpit Extension. You can distinguish different levels of customization: easy, medium, and expert.
Easy Customization
You can also use the easy customization options described in cockpit Extension, section Easy Customization, to configure most elements of the importcockpit extension. Further options not fully covered in cockpit Extension are discussed below.
Customization Options - Documentation:
Configure the job tab. - Configuring the Job Tab of the SAP Import Cockpit
Learn about the customization options of the mapping tab. - Configuring the Mapping Tab of the SAP Import Cockpit
Change the console message descriptions. - Configuring the Console Message Descriptions
Learn how to display and hide attributes in the target section. - Configuring Target Section of the Mapping Tab
Learn how to configure the main area. - Configuring the Main Area of the Mapping Tab
Medium Customization
The medium customization needs very little implementation, because it can be done by using an
existing importcockpit extension as a template to be modified. For details, see the cockpit
Extension, section Medium Customization.
You need a valid ZK Studio Enterprise Edition license for the medium customization.
Expert Customization
You can also construct a new cockpit extension, based on the importcockpit extension or using
the yCockpit template. For details see the cockpit Extension, section Expert Customization.
Validation Types
There are five types of validation:
INSERT Mode Validators 1 to 4
Note that there is no way to configure the rules or strategies for validation.
Below is a list of the relational rules of the ImpExImportCronJob model depicted in the figure above.
An ImportCockpitCronJob can have:
Only one mappingValid flag. Although this is a mandatory attribute, if it is not set at creation
time, the default value of false is assigned.
Only one nextExecutionTime object, which has to be of the type java.util.Date. This is an
optional attribute and can be added after creation.
A simplified sketch of these two attributes follows the list.
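The following is a minimal sketch, written for this document only, of how these two attributes behave;
it is not the model class that the Platform generates:

public class ImportCockpitCronJobSketch
{
    // Mandatory flag; if it is not set at creation time, false is assigned.
    private boolean mappingValid = false;

    // Optional attribute of type java.util.Date; can be added after creation.
    private java.util.Date nextExecutionTime;

    public boolean isMappingValid() { return mappingValid; }

    public void setMappingValid(final boolean mappingValid) { this.mappingValid = mappingValid; }

    public java.util.Date getNextExecutionTime() { return nextExecutionTime; }

    public void setNextExecutionTime(final java.util.Date nextExecutionTime)
    {
        this.nextExecutionTime = nextExecutionTime;
    }
}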
ImportCockpitJob
The ImportCockpitJob type extends ImpExImportJob.
ImpExImportCockpitMedia
The ImpExImportCockpitMedia type extends ImpExMedia.
ImportCockpitMapping
The ImportCockpitMapping type extends Media.
ImportCockpitInputMedia
The ImportCockpitInputMedia type extends ImpExMedia.
hasHeaderLine: A boolean flag that indicates whether the source input media CSV file contains any
header column names. A short outline of these type relationships is sketched below.
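The inheritance relationships listed above can be summarized in a short Java outline. The class
bodies are placeholders written for this document; they are not the generated model classes, and
the supertypes' own inheritance is not shown:

// Outline of the importcockpit item types described above (placeholders only).
class ImpExImportJob { }
class ImpExMedia { }
class Media { }

class ImportCockpitJob extends ImpExImportJob { }
class ImpExImportCockpitMedia extends ImpExMedia { }
class ImportCockpitMapping extends Media { }

class ImportCockpitInputMedia extends ImpExMedia
{
    // True when the source CSV file contains a header line with column names.
    boolean hasHeaderLine;
}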
Configuring the Job Tab of
the SAP Import Cockpit
This document covers the default configuration shipped out of the box as well as other custom
configurations available to the importcockpit extension.
The following documents outline the general configuration options for the cockpit browser area,
including the List View, Advanced Search, and the base views:
Figure: Advanced Search and Sort Order options for the Job tab of the SAP Import Cockpit.
On the toolbar, a user has the option to choose the number of ImportCockpitCronJob items to display
in the list view. The values currently shown in the drop-down box are the default values.
Figure: The list view in the main area of the Job tab of the SAP Import Cockpit.
The default layout of the list view shows only a subset of the total properties that can be shown in
the list.
Main area, displaying three sections and a toolbar, used to create and edit mappings.
Customization Options | Documentation
The target section | Configuring the Target Section of the Mapping Tab
The mapping section | Configuring the Main Area of the Mapping Tab
Tip
Automatic Storing of UI Configuration
The Cockpit framework allows you to automatically store your configured user interface (UI)
without using XML. For more information, see Storing UI Configuration.
Context
You cannot start an import job without using a mapping. If you do not have a mapping specified,
the Start import job button is inactive and you cannot import the desired file. The button only
becomes active once you have defined a mapping.
To create a mapping:
Running Import Jobs
Use the SAP Import Cockpit to run your jobs.
Context
You can start your job directly after creating a mapping. To do so:
Procedure
1. Click the Run job now button in the top right corner.
You can also check the history of all executed jobs since you logged in, in the Execution
history box in the navigation area of the Import Cockpit:
The section below outlines how the mapping is created and provides background information:
Activity Number | Explanation
2 | Attaching a source CSV file, that is, the data that will be imported.
This section looks at the call to start the media generation processes:
Activity Number | Explanation
9 | A call is made to generate the import ImpEx script. It involves several sequential calls that result
in the ImpEx file used later in the process to import the generated CSV data. See Import
Impex Header Generation in the SAP Commerce Import Cockpit for more details.
10 | A call is made to generate the import data files. It involves several sequential calls that result
in one or more CSV files, depending on the related SAP Commerce item type that the data
represents. For example, if you are importing Product data with prices, two CSV files are
generated: one for the product and the other for the price information. See Import Data File
Generation in the Import Cockpit for more details.
Activity Number | Explanation
11 | An Importer, making use of our cron job with related media and an ImpExImportReader, is created.
12 | The doImport method is called, which extracts the media from our job and imports the
generated CSV files using the generated ImpEx file. An illustrative sketch of this sequence follows
the table.
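Purely as orientation, the sequence of activities 9 to 12 can be sketched as follows. All class and
method names below, except doImport, are invented for this example and do not correspond to
actual importcockpit classes:

import java.util.List;

public class ImportFlowSketch
{
    public void runImport(final Object importCockpitCronJob)
    {
        // Activity 9: generate the ImpEx header script from the mapping.
        final String impexHeader = generateImpExHeader(importCockpitCronJob);

        // Activity 10: generate one or more CSV data files, for example one
        // file for products and a second one for price information.
        final List<String> csvDataFiles = generateDataFiles(importCockpitCronJob);

        // Activity 11: create an importer from the cron job and its related media.
        final Object importer = createImporter(importCockpitCronJob, impexHeader, csvDataFiles);

        // Activity 12: doImport extracts the media from the job and imports the
        // generated CSV files using the generated ImpEx script.
        doImport(importer);
    }

    private String generateImpExHeader(final Object job) { return ""; }

    private List<String> generateDataFiles(final Object job) { return List.of(); }

    private Object createImporter(final Object job, final String header, final List<String> files)
    {
        return new Object();
    }

    private void doImport(final Object importer) { }
}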
ImpExMediaGenerationService
    o ImportCockpitMappingService
    o HeaderGeneratorOperation
    o FileGeneratorOperation
ImpExTransformationService
    o ImportCockpitMediaService
    o DataGeneratorOperation
    o FileGeneratorOperation
Context
Once the import job and a mapping object for that job are created, classification attribute mapping
becomes similar to normal attribute mapping. There is, however, an exception: the classification
attributes are not listed in the target attribute list on the right side. Instead, they need to be selected
in a dialog box that pops up. For more information, see the following:
Synchronization
Between SAP
Commerce Installations
The SAP Commerce-to-SAP Commerce synchronization consists of transferring data from a source
system to a target system, for example from one SAP Commerce installation to another, using
the Data Hub in between. SAP Commerce-to-SAP Commerce synchronization is possible thanks to
the y2ysync framework.
The y2ysync data flow architecture ensures high performance and supports scalability of SAP
Commerce.
y2ysync Framework
The y2ysync framework shipped with Commerce Platform allows you to develop your own data
synchronization solutions.
Configuration and
Synchronization
y2ysync synchronization heavily relies on the y2ysync and the Data Hub extensions. For that
reason, it requires proper configuration, both on the Commerce Platform side and the Data Hub side.
Item Description
Transformations the data model should undergo during the composition phase
You can generate your Data Hub configuration using the Data Hub Configuration Generator. It
generates an XML file based on Y2YStreamConfigurationContainer. Use the
same Data Hub Configuration Generator to upload your configuration to the running Data Hub.
Synchronization
The source system detects items that you created, changed, or deleted since the last
synchronization. It creates files, called medias, that contain information about the changes. Finally, it
sends a request to Data Hub to start executing its part of the synchronization process.
Data Hub acts as a middleman between the source and the target systems. Data Hub imports
data from the source system, composes it, and publishes it to the target system. Depending on the
configuration, you can trigger these steps automatically or manually. For more information,
see Target System Definition and Auto Passthrough.
You can make Data Hub import the y2ysync media files again if the previous attempt failed. To
do so, click the Resend to datahub action button available in Backoffice.
the source Platform url
Caution: When using the automatic passthrough mode, before the first synchronization make
sure that you have set your target system on
the Y2YStreamConfigurationContainer object. The first synchronization request
creates the necessary feed and pool, and they must have a specific publication strategy set with the
target system names. Therefore, if the target system is invalid or empty, automatic passthrough
does not take place.
Change Consumption
Modes
The process of marking items as synchronized (consumed) is called change consumption. There
are two change consumption modes that you can use to synchronize data: the synchronous mode
and the asynchronous mode. The asynchronous mode may improve the overall
performance of the synchronization process.
Data Hub uses REST to notify the source Platform that it has completed downloading all medias with
changed items created in the change detection process. The endpoint responsible for receiving this
notification call is exposed
by de.hybris.y2ysync.controller.ConsumeChangesController at
the URL /changes/.
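As a rough illustration of such a notification call, the sketch below posts to the /changes/ endpoint
with Java's built-in HTTP client. The host name and the trailing identifier are invented for this
example; this document only states that the controller is reachable at /changes/:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConsumeChangesNotificationSketch
{
    public static void main(final String[] args) throws Exception
    {
        final HttpClient client = HttpClient.newHttpClient();

        // Hypothetical notification: Data Hub tells the source Platform that all
        // change medias have been downloaded. Host and identifier are made up.
        final HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://source-platform.example.com/changes/12345"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        final HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Notification response status: " + response.statusCode());
    }
}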
In the synchronous mode, Data Hub does not proceed with data composition or publication until all
items are consumed. The notification call to the change consumption endpoint is blocking, and the
response is rendered only after all items are consumed. Such behavior may not be desired, as it
could slow down the whole synchronization process.
The asynchronous mode uses the task engine and returns the acknowledgment immediately. It
allows Data Hub to resume processing data without delay. As a result, the composition and the
publication in Data Hub take place simultaneously with change consumption in the source Platform.
The asynchronous processing logic is defined by
the consumeY2YChangesTaskRunner Spring bean.
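The difference between the two modes can be illustrated with a small conceptual sketch. The
handler below is invented for this document and does not reflect the actual controller or task runner
implementation:

import java.util.concurrent.CompletableFuture;

public class ChangeConsumptionSketch
{
    // Synchronous mode: the call blocks until all items are consumed, so the
    // response is rendered only afterwards and Data Hub waits before it
    // continues with composition and publication.
    public String handleNotificationSynchronously()
    {
        consumeAllChangedItems();
        return "OK";
    }

    // Asynchronous mode: the acknowledgment is returned immediately and the
    // consumption continues in the background, so composition and publication
    // in Data Hub can run simultaneously with change consumption.
    public String handleNotificationAsynchronously()
    {
        CompletableFuture.runAsync(this::consumeAllChangedItems);
        return "OK";
    }

    private void consumeAllChangedItems()
    {
        // Placeholder for marking changed items as consumed, in batches.
    }
}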
The change consumption process works in batches and by default uses multiple threads. You can
configure it with the following properties: