Data Flow Tools & Properties v1.6

The document outlines various tools and configurations for data input, output, unification, quality, transformation, and mathematical operations. Each tool category includes specific functionalities such as reading from files, databases, APIs, and cloud storage, as well as writing to these sources. Additionally, it details the input/output ports and validation processes associated with each tool, providing a comprehensive overview of the data processing capabilities available in version 1.6.

Uploaded by prashant.kumar

Tool Category : Input Version 1.

Sub-category
• Read from file
• Read from db
• Read from API
• Read from S3
• Read from ADLS
Tool : Read from File

Configuration Panel Preview shows the top 100 records

Configuration Preview Properties

File Lists the files available on file storage

File Name Auto-populated

File Format Auto-populated
“In case of Excel, a pop-up asks for the name of the sheet”

Delimiter Auto-populated
Not required for Excel

Ignore delimiter in Drop down (quotes; single quote; auto; none)
Not required for Excel

Code Page Auto-populated
Tool : Input/output port Tool : Validation

• File

Ports
-> No Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Number of records read
Input Disabled
Output Preview of records sent from the output port
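
A minimal sketch of how the Delimiter, Ignore delimiter in, and Code Page properties could drive a delimited-file reader with a 100-record preview. It assumes Python's standard csv module; the function name read_preview, its parameters, and the sample data are hypothetical, and the "auto" option is not modelled.

```python
import csv
import io

PREVIEW_LIMIT = 100  # the panel previews the top 100 records


def read_preview(raw, delimiter=",", quote_char='"', code_page="utf-8"):
    """Decode raw bytes and return (header, up to PREVIEW_LIMIT rows)."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=code_page, newline="")
    if quote_char is None:
        # "none": no quoting, so every delimiter splits the line
        reader = csv.reader(text, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    else:
        # "quotes" / "single quote": delimiters inside quotes are ignored
        reader = csv.reader(text, delimiter=delimiter, quotechar=quote_char)
    header = next(reader)
    rows = [row for _, row in zip(range(PREVIEW_LIMIT), reader)]
    return header, rows


# Hypothetical sample: a semicolon-delimited file with a quoted delimiter
sample = 'id;name\n1;"Smith; John"\n2;Lee\n'.encode("utf-8")
header, rows = read_preview(sample, delimiter=";")
```

With quoting enabled, the semicolon inside "Smith; John" stays part of the value; with quote_char=None the same line splits into three fields.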
Tool : Read from DB

Configuration Panel Preview shows the top 100 records

Configuration Preview Properties

Database Type Drop down (Oracle, MSSQL, PostgreSQL, MySQL, MariaDB, MongoDB, Neo4j, Cassandra)

Database Name Drop down (lists the aliases of the database names)

Table Name Drop down (lists the database tables to select from)
Tool : Input/output port Tool : Validation

• All

Ports
-> No Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Number of records read
Input Disabled
Output Preview of records sent from the output port
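
The table preview can be sketched as below. This is an illustration only: it uses Python's built-in sqlite3 as a stand-in for the listed database types, and preview_table, the customers table, and its rows are hypothetical names.

```python
import sqlite3


def preview_table(conn, table_name, limit=100):
    """Return (column names, up to `limit` rows) for the selected table."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the catalog before placing it in the statement.
    known = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table_name not in known:
        raise ValueError(f"unknown table: {table_name}")
    cursor = conn.execute(f'SELECT * FROM "{table_name}" LIMIT ?', (limit,))
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


# Hypothetical in-memory database standing in for a configured alias
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Raj")])
columns, rows = preview_table(conn, "customers")
```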
Tool : Read from API

Configuration Panel Preview to show top 100 records


Configuration Preview Properties

API Source Drop down (list of API aliases defined in the Data Source)
Tool : Input/output port Tool : Validation

• All

Ports
-> No Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Number of records read
Input Disabled
Output Preview of records sent from the output port
Tool : Read from S3

Configuration Panel Preview shows the top 100 records

Configuration Preview Properties

Name Drop down (list of S3 connections defined in the data source)

Browse File Browse file path button (lists the files available on file storage)

File Name Non-editable field; automatically takes the uploaded file name

File Format Drop down (1. excel 2. csv 3. tsv 4. psv 5. json 6. parquet 7. xml)
Automatically set from the file extension, but can be changed
“In case of Excel, a pop-up asks for the name of the sheet”

Delimiter Drop down ( ; | , : )
Automatically set based on the file format, but can be changed
Not required for Excel

Ignore delimiter in Drop down (quotes; single quote; auto; none)
Not required for Excel

Code Page Drop down (Unicode UTF-8, Unicode UTF-16, ANSI - Central Europe, ANSI - Latin 1, ANSI - Greek, ANSI - Arabic, Simplified Chinese)
Tool : Input/output port Tool : Validation

• S3 connection name
• Browse File
• File Name
• File format
• Delimiter

Ports
-> No Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Number of records read
Input Disabled
Output Preview of records sent from the output port
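
The "set automatically, but can be changed" behaviour of File Format and Delimiter can be sketched as a lookup on the file extension. The mapping tables, the function suggest_settings, and the tab default for tsv are assumptions made for illustration; the specification only lists the drop-down values.

```python
import os

# Assumed extension-to-format mapping
FORMAT_BY_EXTENSION = {
    ".xlsx": "excel", ".xls": "excel", ".csv": "csv", ".tsv": "tsv",
    ".psv": "psv", ".json": "json", ".parquet": "parquet", ".xml": "xml",
}
# Assumed default delimiter per delimited format
DELIMITER_BY_FORMAT = {"csv": ",", "tsv": "\t", "psv": "|"}


def suggest_settings(file_name):
    """Suggest initial panel values; the user may still override them."""
    extension = os.path.splitext(file_name)[1].lower()
    file_format = FORMAT_BY_EXTENSION.get(extension)
    return {
        "file_format": file_format,
        "delimiter": DELIMITER_BY_FORMAT.get(file_format),  # None if not needed
        "needs_sheet_name": file_format == "excel",  # triggers the sheet pop-up
    }


settings = suggest_settings("sales_2023.PSV")
```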
Tool : Read from ADLS

Configuration Panel Preview shows the top 100 records

Configuration Preview Properties

Name Drop down (list of ADLS connections defined in the data source)

Browse File Browse file path button (lists the files available on file storage)

File Name Non-editable field; automatically takes the uploaded file name

File Format Drop down (1. excel 2. csv 3. tsv 4. psv 5. json 6. parquet 7. xml)
Automatically set from the file extension, but can be changed
“In case of Excel, a pop-up asks for the name of the sheet”

Delimiter Drop down ( ; | , : )
Automatically set based on the file format, but can be changed
Not required for Excel

Ignore delimiter in Drop down (quotes; single quote; auto; none)
Not required for Excel

Code Page Drop down (Unicode UTF-8, Unicode UTF-16, ANSI - Central Europe, ANSI - Latin 1, ANSI - Greek, ANSI - Arabic, Simplified Chinese)
Tool : Input/output port Tool : Validation

• ADLS connection name
• Browse File
• File Name
• File format
• Delimiter

Ports
-> No Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Number of records read
Input Disabled
Output Preview of records sent from the output port
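
The Code Page drop down used by the S3 and ADLS readers could be mapped to decoder names as sketched below. The mapping is an assumption: it reads "ANSI" as the corresponding Windows code pages and "Simplified Chinese" as GBK, none of which the specification states.

```python
# Assumed mapping from drop-down labels to Python codec names
CODEC_BY_CODE_PAGE = {
    "Unicode UTF-8": "utf-8",
    "Unicode UTF-16": "utf-16",
    "ANSI - Central Europe": "cp1250",
    "ANSI - Latin 1": "cp1252",
    "ANSI - Greek": "cp1253",
    "ANSI - Arabic": "cp1256",
    "Simplified Chinese": "gbk",
}


def decode_with_code_page(raw, code_page):
    """Decode file bytes using the codec assumed for the selected label."""
    return raw.decode(CODEC_BY_CODE_PAGE[code_page])


# Round trip of a Central European character through the assumed codec
decoded = decode_with_code_page("Łódź".encode("cp1250"), "ANSI - Central Europe")
```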
Tool Category : Output

Sub-category
• Write to file
• Write to db
• Write to S3
• Write to ADLS
Tool : Write to File

Configuration Panel
Configuration Properties

Browse Path Browse file path button (allows saving the file to file storage and defining the format)

File Name Editable field; pre-filled with the name provided while saving the file

File format Editable drop down (1. excel 2. csv 3. tsv 4. psv 5. json 6. parquet 7. xml)
Tool : Input/output port Tool : Validation

• Browse File
• File Name
• File format
Ports
-> 1 Input port
-> no output Port

Display only after execution of workflow

Output/ Result

Message Number of records written
Input Preview of records read at the input port
Output Disabled
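
A minimal sketch of how the selected File format might choose a serializer and produce the record count for the result message. Only the formats covered by Python's standard library are shown; the function serialize and the delimiter table are hypothetical, and excel, parquet, and xml are left unimplemented here.

```python
import csv
import io
import json

DELIMITERS = {"csv": ",", "tsv": "\t", "psv": "|"}  # assumed defaults


def serialize(records, file_format):
    """Return (text, number of records written) for a list of dict records."""
    if file_format == "json":
        return json.dumps(records), len(records)
    if file_format in DELIMITERS:
        buffer = io.StringIO()
        fieldnames = list(records[0]) if records else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                                delimiter=DELIMITERS[file_format],
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue(), len(records)
    raise NotImplementedError(f"{file_format} needs a third-party library")


text, written = serialize([{"id": 1, "name": "Ana"}, {"id": 2, "name": "Raj"}], "psv")
```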
Tool : Write to DB

Configuration Panel
Configuration Properties

Database Type Drop down (Oracle, MSSQL, PostgreSQL, MySQL, MariaDB, MongoDB, Neo4j, Cassandra)

Database Name Drop down (list of the database names)

Table Name Drop down (lists the database tables to select from) + New Table

A pop-up window allows defining the name of the new table and mapping the fields.

Field names are taken automatically from the input connection.
Tool : Input/output port Tool : Validation

• Browse File
• File Name
• File format
Ports
-> 1 Input port
-> no output Port

Display only after execution of workflow

Output/ Result

Message Number of records written
Input Preview of records read at the input port
Output Disabled
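
The "New Table" path (name the table, map the input fields, write the records) can be sketched as follows. sqlite3 again stands in for the listed databases; write_new_table, the mapping format, and the sample names are hypothetical.

```python
import sqlite3


def write_new_table(conn, table_name, field_mapping, records):
    """Create the table and insert records.

    field_mapping maps input field names to target column names.
    Returns the number of records written.
    """
    columns = list(field_mapping.values())
    column_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table_name}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(rec[src] for src in field_mapping) for rec in records]
    conn.executemany(
        f'INSERT INTO "{table_name}" ({column_sql}) VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
written = write_new_table(
    conn, "customers_out",
    {"id": "customer_id", "name": "full_name"},  # input field -> new column
    [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Raj"}],
)
stored = conn.execute(
    "SELECT customer_id, full_name FROM customers_out ORDER BY customer_id").fetchall()
```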
Tool : Write to S3

Configuration Panel
Configuration Properties

S3 Connection Drop down (list of S3 connections defined in the data source)

Browse Path Browse file path button (allows saving the file to file storage and defining the format)

File Name Editable field; pre-filled with the name provided while saving the file

File format Editable drop down (1. excel 2. csv 3. tsv 4. psv 5. json 6. parquet 7. xml)
Tool : Input/output port Tool : Validation

• Browse File
• File Name
• File format
Ports
-> 1 Input port
-> no output Port

Display only after execution of workflow

Output/ Result

Message Number of records written
Input Preview of records read at the input port
Output Disabled
Tool : Write to ADLS

Configuration Panel
Configuration Properties

ADLS Connection Drop down (list of ADLS connections defined in the data source)

Browse Path Browse file path button (allows saving the file to file storage and defining the format)

File Name Editable field; pre-filled with the name provided while saving the file

File format Editable drop down (1. excel 2. csv 3. tsv 4. psv 5. json 6. parquet 7. xml)
Tool : Input/output port Tool : Validation

• Browse File
• File Name
• File format
Ports
-> 1 Input port
-> no output Port

Display only after execution of workflow

Output/ Result

Message Number of records written
Input Preview of records read at the input port
Output Disabled
Tool Category : Unify

Sub-category
• Join
• Set Operator
• Record ID
• Sort
• Filter rows
• Filter Attribute
Tool : Join

Configuration Panel
Configuration Properties

Join Type Drop down (Inner Join; Left Outer Join; Right Outer Join; Full Outer Join)

Port 1 Drop down displaying the list of 1st source attributes

Operator Displays the equal operator

Port 2 Drop down displaying the list of 2nd source attributes


Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 2 Input ports
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input 1 Preview of records read at input port 1
Input 2 Preview of records read at input port 2
Output Preview of records resulting from the join
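
The four join types on an equality condition between a Port 1 attribute and a Port 2 attribute can be sketched as below. This is an illustration, not the tool's implementation: join_records and the sample data are hypothetical, records are plain dicts, and field names are assumed not to collide between the two inputs.

```python
def join_records(left, right, left_key, right_key, join_type="inner"):
    """Equi-join two lists of dicts; join_type is inner, left, right, or full."""
    left_blank = dict.fromkeys(left[0]) if left else {}
    right_blank = dict.fromkeys(right[0]) if right else {}
    index = {}  # key value -> positions of matching right rows
    for pos, row in enumerate(right):
        index.setdefault(row[right_key], []).append(pos)
    matched_right = set()
    result = []
    for l_row in left:
        positions = index.get(l_row[left_key], [])
        for pos in positions:
            matched_right.add(pos)
            result.append({**l_row, **right[pos]})
        if not positions and join_type in ("left", "full"):
            result.append({**l_row, **right_blank})  # unmatched left row
    if join_type in ("right", "full"):
        for pos, r_row in enumerate(right):
            if pos not in matched_right:
                result.append({**left_blank, **r_row})  # unmatched right row
    return result


orders = [{"order_id": 1, "cust": "A"}, {"order_id": 2, "cust": "C"}]
customers = [{"cust_id": "A", "city": "Pune"}, {"cust_id": "B", "city": "Oslo"}]
inner = join_records(orders, customers, "cust", "cust_id", "inner")
full = join_records(orders, customers, "cust", "cust_id", "full")
```

Unmatched rows are padded with None for the other side's fields, which mirrors NULL padding in SQL outer joins.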
Tool : Set operator

Configuration Panel
Configuration Properties

Operator Type Displays the list of set operators (Union, Union All, Intersect, Minus)

Config Drop down (Auto Config by Name; Auto Config by Position; Manual Config)

No panel is required for auto config; for manual config, see the reference below
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 2 Input ports
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input 1 Preview of records read at input port 1
Input 2 Preview of records read at input port 2
Output Preview of results
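
A sketch of the four set operators with the two automatic configurations (by name and by position). The function set_operation and the sample rows are hypothetical; the sketch assumes a non-empty first input, hashable values, and that the output takes the first input's field names. Manual config is not modelled.

```python
def set_operation(first, second, operator_type, config="name"):
    """Apply Union, Union all, Intersect, or Minus to two lists of dicts."""
    fields = list(first[0])  # output uses the first input's field names
    a = [tuple(row[f] for f in fields) for row in first]
    if config == "name":
        b = [tuple(row[f] for f in fields) for row in second]
    else:  # "position": pair columns by order, ignoring their names
        b = [tuple(row.values()) for row in second]
    b_set = set(b)
    if operator_type == "Union all":
        combined = a + b  # keeps duplicates
    elif operator_type == "Union":
        combined = list(dict.fromkeys(a + b))  # distinct rows, order kept
    elif operator_type == "Intersect":
        combined = [t for t in dict.fromkeys(a) if t in b_set]
    elif operator_type == "Minus":
        combined = [t for t in dict.fromkeys(a) if t not in b_set]
    else:
        raise ValueError(f"unknown operator: {operator_type}")
    return [dict(zip(fields, t)) for t in combined]


first = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Raj"}, {"id": 2, "name": "Raj"}]
second = [{"name": "Raj", "id": 2}, {"name": "Kim", "id": 3}]
union = set_operation(first, second, "Union")
minus = set_operation(first, second, "Minus")
```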
Tool : Record ID

Configuration Panel
Configuration Properties

Record ID Field Editable text field with default value “Record ID”

Starting Value Numeric drop down starting from 0 to n; default “1”

Position Drop down displaying “First Column” and “Last Column”; default “First Column”
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
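
The three Record ID properties map directly onto a small function, sketched here with the documented defaults. The function name add_record_id and the dict-based record shape are assumptions; the sketch also assumes the input has no field already named like the Record ID field.

```python
def add_record_id(records, field_name="Record ID", start=1, position="first"):
    """Add a sequential ID as the first or last column of each record."""
    result = []
    for offset, row in enumerate(records):
        record_id = {field_name: start + offset}
        if position == "first":
            result.append({**record_id, **row})
        else:
            result.append({**row, **record_id})
    return result


numbered = add_record_id([{"city": "Pune"}, {"city": "Oslo"}])
```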
Tool : Sort

Configuration Panel
Configuration Properties

Fields List of source field names

Order Drop down displaying “Ascending” or “Descending” order

Allows changing the sequence of the fields and adding or removing (+/-) any field
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
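
A multi-field sort with a per-field order can be sketched with repeated stable sorts, applying the lowest-priority field first. The function sort_records and the (field, order) list format are hypothetical; the sketch assumes values within a field are mutually comparable.

```python
def sort_records(records, sort_spec):
    """sort_spec: list of (field, "Ascending" | "Descending") in priority order."""
    result = list(records)
    # Python's sort is stable, so sorting by the last field first and the
    # first field last yields the combined multi-field ordering.
    for field, order in reversed(sort_spec):
        result.sort(key=lambda row: row[field], reverse=(order == "Descending"))
    return result


staff = [
    {"dept": "A", "salary": 10},
    {"dept": "B", "salary": 30},
    {"dept": "A", "salary": 20},
]
ordered = sort_records(staff, [("dept", "Ascending"), ("salary", "Descending")])
```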
Tool : Basic Filter (rows)

Configuration Panel
Configuration Properties

Field Name <list of source fields>

Operator Is null, is not null, is empty, is not empty, =, <, >, <=, >=, contains, does not contain

Value <editable text>

And/or + (Add)
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
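
The operator list and the And/or chaining can be sketched as a table of predicates. The names OPERATORS and filter_rows, the condition tuple format, and the strict left-to-right evaluation of And/or (no precedence) are assumptions for illustration.

```python
import operator

OPERATORS = {
    "is null": lambda value, _: value is None,
    "is not null": lambda value, _: value is not None,
    "is empty": lambda value, _: value == "",
    "is not empty": lambda value, _: value != "",
    "=": operator.eq, "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
    "contains": lambda value, text: text in value,
    "does not contain": lambda value, text: text not in value,
}


def filter_rows(records, conditions):
    """conditions: list of (connector, field, operator, value).

    connector is "and" / "or"; it is ignored for the first condition.
    """
    def matches(row):
        result = True
        for i, (connector, field, op, value) in enumerate(conditions):
            outcome = OPERATORS[op](row[field], value)
            if i == 0:
                result = outcome
            elif connector == "and":
                result = result and outcome
            else:
                result = result or outcome
        return result
    return [row for row in records if matches(row)]


people = [{"name": "Ana", "age": 30}, {"name": "Raj", "age": 17}, {"name": "", "age": 45}]
named_adults = filter_rows(
    people, [(None, "age", ">=", 18), ("and", "name", "is not empty", None)])
```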
Tool : Filter Attribute

Configuration Panel
Configuration Properties

Metadata preview with “checkbox” Lists the metadata of the source connection (Field Name; Data Type; Size) with a checkbox per field, allowing the user to select and deselect the fields carried to the next tool
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
Tool Category : Quality

Sub-category
• Advance Match
• Dedupe
Tool : Advance Match

Configuration Panel
Configuration Properties

Match Algorithm Drop down (Phonetic; Soundex; Metaphone)

Match score Text box

Join Type Drop down (Inner Join; Left Outer Join; Right Outer Join; Full Outer Join)

Port 1 Drop down displaying the list of 1st source attributes

Operator Displays the equal operator

Port 2 Drop down displaying the list of 2nd source attributes


Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 2 Input ports
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
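
For the Soundex option, a sketch of the standard American Soundex code and an inner match on equal codes is shown below. The function names and sample data are hypothetical; how the Match score value is applied is not described in the specification, so it is not modelled here.

```python
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character American Soundex code, or "" for no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with the same code
        digit = CODES.get(letter, "")
        if digit and digit != previous:
            encoded += digit
        previous = digit  # vowels reset it, so a repeated code is kept
    return (encoded + "000")[:4]


def phonetic_matches(left, right, left_field, right_field):
    """Pair records from both inputs whose fields share a Soundex code."""
    return [(l, r) for l in left for r in right
            if soundex(l[left_field])
            and soundex(l[left_field]) == soundex(r[right_field])]


pairs = phonetic_matches([{"name": "Robert"}], [{"nm": "Rupert"}, {"nm": "Alice"}],
                         "name", "nm")
```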
Tool : Dedupe

Configuration Panel
Configuration Properties

Select Fields Drop down (multi-select); default “All”

Removes duplicate rows based on the field selection


Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
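
Duplicate removal on a field selection can be sketched as below. The function dedupe is hypothetical, and keeping the first occurrence of each duplicate is an assumption; the specification does not say which row survives.

```python
def dedupe(records, fields=None):
    """Drop rows whose selected fields repeat an earlier row.

    fields=None corresponds to the default "All".
    """
    seen = set()
    result = []
    for row in records:
        key = tuple(row[f] for f in (fields or row))
        if key not in seen:
            seen.add(key)
            result.append(row)  # first occurrence is kept (assumption)
    return result


visits = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Pune"}, {"id": 1, "city": "Pune"}]
unique_rows = dedupe(visits)
unique_cities = dedupe(visits, ["city"])
```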
Tool Category : Transform

Sub-category
• Filter
• Regex
• Replace
Tool : Replace

Configuration Panel
Configuration Properties

Select Field Drop down of source fields (multi-select)

Find Text field
Replace Text field
Option Single-selection option list
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
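
A sketch of find-and-replace across the selected fields. The function replace_in_fields is hypothetical and performs a plain substring replacement on string values only; the entries of the Option list are not enumerated in the specification, so they are not modelled.

```python
def replace_in_fields(records, fields, find, replace_with):
    """Replace `find` with `replace_with` in the selected string fields."""
    result = []
    for row in records:
        updated = {}
        for name, value in row.items():
            if name in fields and isinstance(value, str):
                updated[name] = value.replace(find, replace_with)
            else:
                updated[name] = value  # unselected or non-text fields pass through
        result.append(updated)
    return result


offices = [{"city": "New Delhi", "note": "Delhi office", "staff": 5}]
renamed = replace_in_fields(offices, ["city"], "Delhi", "Mumbai")
```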
Tool : Regex

Configuration Panel
Configuration Properties

Select Field Drop down of source fields (single selection)

Expression Text field, e.g. [^0-9*]
Output field name Text field
Match index Integer value
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
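
One way to read the Regex properties: find every match of the expression in the selected field and write the match at the given index to the output field. The sketch below uses Python's re module; the function regex_extract, the zero-based match index, and the None result for a missing match are assumptions.

```python
import re


def regex_extract(records, field, expression, output_field, match_index=0):
    """Add output_field holding the match_index-th match (zero-based) or None."""
    pattern = re.compile(expression)
    result = []
    for row in records:
        matches = [m.group(0) for m in pattern.finditer(str(row[field]))]
        value = matches[match_index] if 0 <= match_index < len(matches) else None
        result.append({**row, output_field: value})
    return result


# Hypothetical expression and data: pick the second run of digits
contacts = [{"phone": "Tel 022-5550 ext 17"}]
extracted = regex_extract(contacts, "phone", r"[0-9]+", "digits", match_index=1)
```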
Tool Category : Maths

Sub-category
• Addition
• Subtraction
• Division
• Multiplication
Tool : Addition

Configuration Panel
Configuration Properties

Addition with Drop down list (Field, Value)

Select field Shown if “Field” is selected in the drop down above
Lists all the fields received from the previous port/node
Select Value Shown if “Value” is selected in the drop down above
Allows entering the value
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
Tool : Subtraction

Configuration Panel
Configuration Properties

Subtraction with Drop down list (Field, Value)

Select field Shown if “Field” is selected in the drop down above
Lists all the fields received from the previous port/node
Select Value Shown if “Value” is selected in the drop down above
Allows entering the value
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
Tool : Multiplication

Configuration Panel
Configuration Properties

Multiplication with Drop down list (Field, Value)

Select field Shown if “Field” is selected in the drop down above
Lists all the fields received from the previous port/node
Select Value Shown if “Value” is selected in the drop down above
Allows entering the value
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
Tool : Division

Configuration Panel
Configuration Properties

Division with Drop down list (Field, Value)

Select field Shown if “Field” is selected in the drop down above
Lists all the fields received from the previous port/node
Select Value Shown if “Value” is selected in the drop down above
Allows entering the value
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
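
The four Maths tools share one configuration shape: the second operand is either another field or a fixed value. A minimal sketch follows; the specification does not say which field is the first operand, where the result is stored, or how division by zero is handled, so left_field, output_field, and the None result for a zero divisor are all assumptions.

```python
import operator

OPERATIONS = {
    "Addition": operator.add,
    "Subtraction": operator.sub,
    "Multiplication": operator.mul,
    "Division": operator.truediv,
}


def apply_math(records, tool, left_field, operand_type, operand, output_field="Result"):
    """operand_type "Field": operand names a field; "Value": operand is a number."""
    function = OPERATIONS[tool]
    result = []
    for row in records:
        right = row[operand] if operand_type == "Field" else operand
        try:
            value = function(row[left_field], right)
        except ZeroDivisionError:
            value = None  # assumed handling of a zero divisor
        result.append({**row, output_field: value})
    return result


lines = [{"price": 10, "qty": 4}, {"price": 9, "qty": 0}]
totals = apply_math(lines, "Multiplication", "price", "Field", "qty")
ratios = apply_math(lines, "Division", "price", "Field", "qty")
```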
Tool Category : Group Function

Sub-category
• AVG
• COUNT
• MAX
• MIN
• STDDEV
• SUM
• VARIANCE
• MEAN
Tool : Same for all other group functions

Configuration Panel
Configuration Properties

Select group fields Drop down list with multi-select option
Lists all the fields received from the previous port/node
Apply on Drop down listing all the fields received from the previous port/node
Reference
Tool : Input/output port Tool : Validation

• Apply on

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
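
Because every group function takes the same two properties, one sketch covers them all. It uses Python's statistics module; treating STDDEV and VARIANCE as sample statistics (which need at least two values per group), and the names group_apply and the output column label, are assumptions.

```python
import statistics

AGGREGATES = {
    "AVG": statistics.mean, "MEAN": statistics.mean,
    "COUNT": len, "MAX": max, "MIN": min, "SUM": sum,
    "STDDEV": statistics.stdev,       # sample standard deviation (assumption)
    "VARIANCE": statistics.variance,  # sample variance (assumption)
}


def group_apply(records, group_fields, apply_on, function):
    """Group by group_fields and aggregate the apply_on field."""
    groups = {}
    for row in records:
        key = tuple(row[f] for f in group_fields)
        groups.setdefault(key, []).append(row[apply_on])
    aggregate = AGGREGATES[function]
    label = f"{function}({apply_on})"  # hypothetical output column name
    return [{**dict(zip(group_fields, key)), label: aggregate(values)}
            for key, values in groups.items()]


payroll = [
    {"dept": "A", "salary": 10},
    {"dept": "A", "salary": 20},
    {"dept": "B", "salary": 30},
]
sums = group_apply(payroll, ["dept"], "salary", "SUM")
```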
Tool Category : Script

Sub-category
• Python
• PySpark
• SQL
Tools:

Sub-category
• Add attribute
• Condition
Tool : Add Attribute

Configuration Panel
Configuration Properties

Field Name Text box (max size 30)

Data Type Drop down listing all available data types (generic data types)
Set Default Value Radio button (Yes / No); default “No”
Value If “Set Default Value” is “Yes”, asks for the value (text box)
Reference
Tool : Input/output port Tool : Validation

• Apply on

Ports
-> 1 Input port
-> 1 output Port

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
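
The Add Attribute properties can be sketched as below. The function add_attribute is hypothetical; representing the generic data type as a Python type and leaving the new field as None when no default is set are assumptions.

```python
MAX_FIELD_NAME_SIZE = 30  # from the Field Name property


def add_attribute(records, field_name, data_type=str, set_default=False, value=None):
    """Append a new field to every record, optionally with a typed default."""
    if len(field_name) > MAX_FIELD_NAME_SIZE:
        raise ValueError("field name exceeds 30 characters")
    # The text-box value is converted to the selected data type
    default = data_type(value) if set_default else None
    return [{**row, field_name: default} for row in records]


with_country = add_attribute([{"id": 1}], "country", str, set_default=True, value="IN")
with_score = add_attribute([{"id": 1}], "score", int, set_default=True, value="5")
```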
Tool : Condition

Configuration Panel
Configuration Properties

Select Port Drop down (Port 1 & Port 2)

Port 1 Config
Field Name <list of source fields>
Operator Is null, is not null, is empty, is not empty, =, <, >, <=, >=, contains, does not contain
Value <editable text>
And/or + (Add)

Port 2 Config
Default Value (radio button, auto-selected)
Reference
Tool : Input/output port Tool : Validation

• ALL

Ports
-> 1 Input port
-> 2 output Ports

Display only after execution of workflow

Output/ Result

Message Success / failure message with the processing time
Input Preview of records read at the input port
Output Preview of results
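
One reading of the Condition tool is a router: records that satisfy the Port 1 condition leave through output port 1, and everything else takes the default path through output port 2. The sketch below follows that reading; the function route and the use of a Python callable for the condition are assumptions.

```python
def route(records, port_1_condition):
    """Split records into (port 1 output, port 2 output)."""
    port_1, port_2 = [], []
    for row in records:
        # Port 2 is the default path for rows that fail the Port 1 condition
        (port_1 if port_1_condition(row) else port_2).append(row)
    return port_1, port_2


accounts = [{"id": 1, "balance": 250}, {"id": 2, "balance": None}, {"id": 3, "balance": 40}]
high, rest = route(
    accounts, lambda row: row["balance"] is not None and row["balance"] >= 100)
```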
Tool Category : Analytical Function (Low Priority)

Sub-category
• FIRST_VALUE
• LAST_VALUE
• NTH_VALUE
• LEAD
• LAG
• RANK
• DENSE_RANK
• CUME_DIST
• PERCENT_RANK
• PERCENTILE_CONT
• PERCENTILE_DISC
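
No properties are given for the analytical functions. As an illustration of what four of them compute, the sketch below follows the usual SQL window-function definitions over a single ascending-ordered partition; the function names and list-based inputs are hypothetical.

```python
def rank_values(values, dense=False):
    """RANK / DENSE_RANK of each value in ascending order, aligned with input."""
    if dense:
        # DENSE_RANK: consecutive ranks, no gaps after ties
        position = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    else:
        # RANK: ties share the lowest position, later ranks skip ahead
        position = {}
        for i, v in enumerate(sorted(values)):
            position.setdefault(v, i + 1)
    return [position[v] for v in values]


def lag(values, offset=1, default=None):
    """LAG: the value `offset` rows earlier, or default when out of range."""
    return [values[i - offset] if i >= offset else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """LEAD: the value `offset` rows later, or default when out of range."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


scores = [10, 10, 20]
ranks = rank_values(scores)
dense_ranks = rank_values(scores, dense=True)
```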
