SAP File Processing For SAP HANA en
5 HTTP Destinations
5.1 Creating HTTP Destinations
5.2 Deleting HTTP Destinations
8 Security
8.1 Roles
This guide is a detailed description of SAP File Processing including the concepts, programming APIs and
operation. System architects will be able to integrate SAP File Processing in business processes and system
landscapes. Administrators will learn how to set up and operate File Processing and developers will find
detailed information on programming APIs.
SAP File Processing is a component of SAP HANA that extracts structured information from unstructured files.
The rich set of HTTP APIs enables application programmers to integrate SAP File Processing features in client
applications.
Related Information
SAP HANA Developer Guide for XS Advanced Model (SAP Web IDE)
SAP File Processing exposes an HTTP REST API that is consumed by an application. HTTP is the only channel
to access the API.
SAP File Processing has an API component that communicates with the application. The API uses the SAP HANA
database to store file processing information. The worker component processes the files, stores the results in
the SAP HANA database, and loads the file content from HTTP file servers.
Components of SAP File Processing can be scaled up and down to respond to varying demand. The API
component fulfills HTTP requests. Additional API instances can be started to serve a growing number of API calls.
One API component is started by default (the master application).
To increase the file processing speed, increase the number of worker components. Three workers start by
default. The number of workers corresponds to the number of files that are processed in parallel.
SAP File Processing is able to process large volumes of files. The files are processed asynchronously. A
client application sends URLs to SAP File Processing for processing. The API confirms the request. SAP File
Processing creates an SAP File Processing task for every file URL. The task describes the processing status of
the file and provides either the results or, if processing fails, the error messages.
The client application harvests the results of the processed tasks and uses the harvested data for its own
purposes.
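The submission step described above can be sketched in Python. This is a minimal illustration, not part of the product: the helper name `build_task_request` and the `base_url` value are hypothetical, authentication is omitted, and the request is only built, not sent. The path and the body shape (one entry per file URL) follow the POST v4/jobs/{jobId}/tasks operation used later in this guide.

```python
import json
from urllib.request import Request


def build_task_request(base_url: str, job_id: str, file_urls: list) -> Request:
    """Build (but do not send) the request that submits file URLs to a job.

    One task is expected to be created per submitted file URL.
    """
    # The body is a JSON array with one object per file URL.
    body = json.dumps([{"url": u} for u in file_urls]).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/api/v4/jobs/{job_id}/tasks",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical usage; the server name and file URL are placeholders.
request = build_task_request(
    "https://server:51064",
    "J001",
    ["https://someserver/files/CustomerContract_99826.pdf"],
)
```

Because processing is asynchronous, sending such a request only confirms that the tasks were accepted. The client has to read the task resources later to harvest results.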
Main Terminology
Term          Description
File          Emails, office files, PDFs, and many more file types can be processed. A list of supported file types is provided as a service operation (GET api/v4/mimetypes).
Job           The job describes how the files have to be processed. The job is used to separate the processing from the client's perspective.
Task          The task describes the current processing state of the file. A task has exactly one file.
API / Master  The API or the master application is the HTTP server interface of SAP File Processing.
Prerequisites
For more information, refer to the SAP HANA Server Installation and Update Guide.
Procedure
1. Use the SAP HANA Database Lifecycle Manager hdblcm to deploy SAP File Processing. The app will be
deployed in the SAP space.
Refer to the SAP HANA Lifecycle Management chapter in the SAP HANA Administration Guide.
Note
The data of SAP File Processing is separated by deployed instance. Deploy a separate instance of SAP File
Processing for every tenant you are using. Example: If SAP File Processing is connected to an SAP Business
Suite system, every client requires a separate instance of SAP File Processing.
2. Verify that SAP File Processing is deployed successfully and is up and running:
Sample Code
xs a
Four apps are listed. The master, web, and worker apps should be started. The db app is already stopped,
because this app only sets up the database. The SAP File Processing application endpoint URL is the
fileprocessor-web URL (https://server:51056 in this example).
SAP File Processing is now installed successfully. However, a user cannot access the application yet.
3. Determine the application name.
The application name is required when the role collection is edited. To determine the application name,
enter the command xs env fileprocessor-web.
Related Information
SAP HANA Developer Guide for XS Advanced Model (SAP Web IDE)
Administration Information Map
SAP HANA Installation and Update Overview
Procedure
In the output, you find the xsa-admin URL under Registered service URLs. Use this URL to access
the XS Advanced Admin web UI as XSA admin user.
2. Log on to the XS Advanced admin web UI with your XSA admin user.
3. Open the entry Application Role Builder and perform the following steps to add a new role collection.
1. On the left, select the entry for Role Collection and select + to add and create a new role collection.
2. Select the role collection, which you just created.
3. Select the Roles tab and add an Application Role.
4. Select Fileprocessor Application (use the application name determined by the command xs env
fileprocessor-web).
Context
After installation and setup of SAP File Processing, you should verify the installation.
Procedure
Note
You have to change the initial password of the user when you log on for the first time. Otherwise, you will
receive a "403 not authorized" response.
To change the initial password, open the endpoint URL, which redirects you to change the initial
password.
Note
You can use a browser plug-in for a convenient display of the JSON format.
SAP File Processing provides an API to developers. The application smoke test ensures that the basic API is
working as expected.
Prerequisites
Context
SAP File Processing comes with a toolset (Swagger and Swagger UI) that allows you to issue API calls (HTTP REST)
against the SAP File Processing API interactively.
Procedure
Related Information
Prerequisites
We recommend using a browser with a JSON viewer plug-in, as the service response is in JSON.
Procedure
Use the endpoint URL + /api/v4 to access the API root. The endpoint URL is the URL of the SAP File
Processing web application.
2. Start the swagger UI.
SAP File Processing comes with a REST API test environment, the Swagger UI. The Swagger UI is used to
issue API calls against SAP File Processing from the browser. Use the Swagger UI URL from API version v4.
Example: https://myserver:5000/swagger-ui.html
3. Expand the service operations of the onPremise section.
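The two URLs used in the steps above can be derived from the endpoint URL. The following Python lines are a minimal sketch. The function names are hypothetical, and the Swagger UI path is taken from the example URL in this guide.

```python
def api_root(endpoint_url: str) -> str:
    """Return the API root: the endpoint URL followed by /api/v4."""
    return endpoint_url.rstrip("/") + "/api/v4"


def swagger_ui_url(endpoint_url: str) -> str:
    """Return the Swagger UI page, assuming it is served as /swagger-ui.html."""
    return endpoint_url.rstrip("/") + "/swagger-ui.html"


# Placeholder server name, as in the example above.
root = api_root("https://myserver:5000")
ui = swagger_ui_url("https://myserver:5000")
```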
The application checks are minimal tests that check database access, job creation and successful file
processing.
Context
Use the operation GET v4/jobs to determine whether the SAP HANA database is available.
Procedure
Click the service operation GET v4/jobs, and then choose the Try it out button.
Results
You get the response code 200 and the following response body:
Sample Code
{
  "links": {
    "first": "https://server:51064/api/v4/jobs?top=10&skip=0",
    "status": "https://server:51064/api/v4/jobs/status"
  },
  "totalJobCount": 0,
  "jobs": []
}
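A client-side script could run the same check automatically. The following Python function is a minimal sketch under stated assumptions: the function name is hypothetical, and it only inspects a status code and response body that the caller has already obtained. It accepts any job count, because a system that is already in use will not report zero jobs.

```python
import json


def database_check_passed(status_code: int, body: str) -> bool:
    """Return True if a GET v4/jobs response looks like the expected result."""
    if status_code != 200:
        return False
    try:
        document = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body (for example an HTML error page) fails the check.
        return False
    return (
        isinstance(document, dict)
        and isinstance(document.get("jobs"), list)
        and "totalJobCount" in document
    )


sample_body = '{"links": {}, "totalJobCount": 0, "jobs": []}'
check_ok = database_check_passed(200, sample_body)
```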
Procedure
Sample Code
{
"id": "J001",
The expected result: response code 201 and the following response body:
Sample Code
{
  "links": {
    "job": "https://server:51064/api/v4/jobs/J001"
  }
}
2. To verify the new job (jobId = J001) use the operation GET v4/jobs/{jobid}.
Sample Code
{
  "links": {
    "tasks": "https://server:51064/api/v4/jobs/J001/tasks",
    "status": "https://server:51064/api/v4/jobs/J001/status",
    "jobs": "https://server:51064/api/v4/jobs"
  },
  "id": "J001",
  "description": "My first file processing job",
  "isPaused": false,
  "active": true,
  "maxWorkers": 1,
  "changedAt": "2016-09-19T16:00:45.443Z",
  "changedBy": "BORSTENSON",
  "createdTs": "2016-09-19T16:00:45.000Z",
  "taskList": "J001",
  "rank": 0,
  "processTemplate": {
    "id": "defaultOnPremise",
    "description": "The standard process template for file processing with extraction core",
    "steps": [
      {
        "id": "FileLoader",
        "description": "Loads the file for processing using HTTP",
        "nextStep": "BinaryToText"
      },
      {
        "id": "BinaryToText",
        "description": "Extracts the plaintext and text analysis from the file"
      }
    ]
  }
}
Next Steps
The job was created and the file processing can be tested.
Prerequisites
At least one job has been created to test the file processing.
Context
Note
The maximum file size for processing is 500 megabytes (MB).
Procedure
1. Open a new browser window and use the following URL to display the current status of the job:
api/v4/jobs/J001/status.
2. Choose Start Auto Refresh.
3. Switch to the Swagger UI browser window.
4. Use the operation POST v4/jobs/{jobId}/tasks to create a new task (jobId = J001).
5. Provide the following body:
Sample Code
[
  {
    "url": "https://someserver/files/CustomerContract_99826.pdf"
  }
]
If this URL is not available in your network, you can use another resource on the internet. If the URL
cannot be accessed, the task reports this in an error message.
Expected result: response code 207 and the following response body (with a different task ID):
Sample Code
{
"results": [
{
"id": 3042400378264,
"status": 201,
"links": {
"task":
The status of the task changes from Running to Success.
6. Verify the file processing.
Sample Code
{
  "id": 3042400378264,
  "status": "BinaryToTextSuccess",
  "statusCategory": "success",
  "links": {
    "self": "https://server:51064/api/v4/jobs/J001/tasks/3042400378264",
    "plainText": "https://server:51064/api/v4/jobs/J001/tasks/3042400378264/plaintext",
    "thumbnail": "https://server:51064/api/v4/jobs/J001/tasks/3042400378264/thumbnail",
    "icon": "https://server:51064/api/v4/jobs/J001/tasks/3042400378264/icon",
    "textAnalysis": "https://server:51064/api/v4/jobs/J001/tasks/3042400378264/textanalysis",
    "SAPUniView": "https://server:51064/api/v4/jobs/J001/tasks/3042400378264/suv",
    "tasks": "https://server:51064/api/v4/jobs/J001/tasks",
    "file": "http://fileloader-data.mo.sap.corp:8080/docs/File%20Search/Technology/OMG/formal-08-04-08.pdf"
  },
  "error": {
    "code": "TEXT ANALYSIS: OK"
  },
  "mimeType": "application/pdf",
  "fileMD5": "f7ea8e682e0cfc031b23ab16ff14a315",
  "language": "en",
  "sizeFile": 1114572,
  "sizePlainText": 198334,
  "startTimeStamp": "2016-09-19T16:16:52.902Z",
  "attributes": {
    "language": "en",
    "contentType": "application/pdf",
    "extension": "pdf"
  },
  "subTasksCount": 0
}
7. Use the operation GET v4/jobs/{jobId}/tasks/{taskId}/plaintext to get the plaintext of the file:
Expected result: response code 200 and the plain text of the file.
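The checks in the steps above can also be scripted. The following Python functions are a minimal sketch with hypothetical names. They operate on response bodies that have already been parsed from JSON. The sketch assumes that a per-entry status other than 201 in the 207 response means that no task was created for that entry, and that a task is finished successfully when its statusCategory is "success", as in the sample above. Other category values are not listed in this guide.

```python
def split_task_results(response_body: dict) -> tuple:
    """Split the entries of a 207 response into created and not-created tasks."""
    created, not_created = [], []
    for entry in response_body.get("results", []):
        # Assumption: 201 marks a created task; anything else is a failure.
        if entry.get("status") == 201:
            created.append(entry)
        else:
            not_created.append(entry)
    return created, not_created


def is_task_successful(task: dict) -> bool:
    """Return True if the task document reports the success category."""
    return task.get("statusCategory") == "success"


# Hypothetical response with one created task and one rejected entry.
example = {"results": [{"id": 3042400378264, "status": 201}, {"id": 0, "status": 400}]}
created, not_created = split_task_results(example)
```

A client would typically call `is_task_successful` on the task document before requesting the plaintext link.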
SAP File Processing uses GET requests to HTTP URLs to load the file content.
Depending on the network configuration of the server, HTTP servers may not be directly reachable. SAP File
Processing supports the configuration of HTTP destinations to enable the use of a proxy server and basic
authentication.
SAP File Processing uses a destination for every URL that starts with the destination URL.
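The prefix rule can be illustrated with a short Python sketch. The function name is hypothetical, and the tie-break is an assumption: this guide does not state which destination is used when several destination URLs match, so the sketch picks the longest matching prefix.

```python
from typing import Optional, Sequence


def find_destination(file_url: str, destination_urls: Sequence[str]) -> Optional[str]:
    """Return the destination URL that applies to file_url, or None.

    A destination applies if the file URL starts with the destination URL.
    Assumption: if several destinations match, the longest prefix is chosen.
    """
    matches = [d for d in destination_urls if file_url.startswith(d)]
    return max(matches, key=len) if matches else None


# Placeholder destination URLs for illustration only.
destinations = ["https://someserver/", "https://someserver/files/"]
chosen = find_destination(
    "https://someserver/files/CustomerContract_99826.pdf", destinations
)
```

A URL that matches no destination yields None, which corresponds to a plain GET request without proxy or authentication settings.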
Related Information
Procedure
Procedure
Context
SAP File Processing comes with a test environment for the API. As an administrator, you can invoke the SAP File
Processing API UI.
The UI is based on the Swagger UI. Refer to the Swagger documentation at http://swagger.io for
details.
Procedure
Click the documentation link in the SAP File Processing Administration UI or open the following HTML page
under the system URL: /swagger-ui.html.
The master application and the worker application can be scaled depending on your scenario.
Scale the master application to handle API load. One master application instance is deployed by default.
Scale the worker application to scale the processing speed of the files. Three workers are deployed by default.
There can be a maximum of 10 worker apps.
The system administrator can use the XSA command line client to scale the applications using the scale
command.
Sample Code
xs scale fileprocessor-master -i 2
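For scripted administration, the command line can be assembled and validated before it is run. The following Python sketch is an illustration only: the function name is hypothetical, and the worker app name fileprocessor-worker is an assumption, because this guide names only fileprocessor-master and fileprocessor-web explicitly. The command is built as a string and not executed.

```python
import shlex
from typing import Optional

MAX_WORKERS = 10  # maximum number of worker apps stated in this guide


def scale_command(app_name: str, instances: int,
                  max_instances: Optional[int] = None) -> str:
    """Build the xs scale command line for the given app and instance count."""
    if instances < 1:
        raise ValueError("at least one instance is required")
    if max_instances is not None and instances > max_instances:
        raise ValueError(f"at most {max_instances} instances are allowed")
    return shlex.join(["xs", "scale", app_name, "-i", str(instances)])


master_cmd = scale_command("fileprocessor-master", 2)
# Assumed app name for the worker application.
worker_cmd = scale_command("fileprocessor-worker", 5, max_instances=MAX_WORKERS)
```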
SAP File Processing uses UAA for authentication and an HDI container for data separation.
An XSA administrator creates role collections, users and role assignments in the XSA Administration.
Related Information
8.1 Roles
API      The API role should be assigned to the technical users that are used by the client application. This role provides the authorization to invoke all exposed HTTP service operations of the API.
Auditor  The Auditor role should be assigned to a user who needs read-only API access. This role provides read-only access to the exposed HTTP service operations of the API.
The following documentation lists all supported service operations and the used models (data structures).
In addition to the documented HTTP status codes, the API client should handle the following standard HTTP
codes:
• DELETE /v4/destinations/{destinationId}
Deletes a single HTTP destination.
• GET /v4/destinations/{destinationId}
Returns a single HTTP destination.
• GET /v4/destinations/{destinationId}/test
Issues a GET request against the destination URL to test the accessibility of the server, the authentication
data, and the proxy data.
• GET /v4/destinations
Get all defined HTTP destinations to access HTTP resources using basic authentication with proxy server
support.
• POST /v4/destinations
Creates a new HTTP destination.
• GET /v4/destinations/test
Tests whether a URL can be accessed by the system. The system may use a destination if one is defined for the
URL.
• GET /v4
The root service returns general meta information about the SAP File Processing REST service, including
version information of the component and the supported API version. Clients use the service to determine
general version dependencies.
For information about the capabilities available for your license and installation scenario, refer to the Feature
Scope Description for SAP HANA.