
Snaplogic Advanced Training

The document discusses Snaplogic's integration architecture and data plane concepts. It describes how Snaplogic pipelines can run on customer data centers or in the cloud, and how groundplex and cloudplex nodes are used. It also covers enhanced account encryption, modularizing pipelines, and reusing pipelines.


SNAPLOGIC ELASTIC INTEGRATION
<Instructor Name>, <Team Name>

ENTERPRISE CLOUD INTEGRATION


Snaplogic Architecture
Snaplogic Architecture
Control Plane
• Runs on the AWS cloud as a multi-tenant service
• Stores pipeline definitions and metadata information
• Handles triggered and scheduled pipeline executions
Data Plane
• Runs in the SnapLogic cloud or on customer data center
• Single tenant, dedicated instance for the customer
• Executes the pipelines
Client Apps
• Designer : Create and manage pipelines
• Manager : Manage assets
• Dashboard : Monitor pipelines and data plane
Snaps
• Public, Premium and Private
• Endpoint connectivity
Snaplogic Architecture (Contd …)

[Diagram: the SnapLogic Control Plane connects to each customer's clients and Snaplexes. Customer A has a Cloudplex and a Groundplex; Customer B has Dev and Prod Groundplexes; each Snaplex runs two nodes.]


Data Plane: Concepts
Cloudplexes
● Runs in the Snaplogic AWS cloud
● Managed by Snaplogic, no SSH access for customers
● Directly accessible from control plane
● Can access only cloud endpoints

Groundplexes
● Runs outside Snaplogic cloud, either in customer data
center or on cloud instances managed by customer
● Installed and managed by customer (usually with Snaplogic
support)
● Only outbound access available, not accessible directly from
control plane (assumed to be behind firewall)
● Can access on-premise service endpoints and cloud
endpoints
Data Plane

Components
Monitor
– Starts the JCC process
– Monitors JCC health, restarts if required

JCC (Java Component Container)


– Prepares and executes pipelines
– Suggests and validates pipelines

Groundplex
• Requires only Outbound access, no inbound access
• Creates Outbound Websocket connections to the Cloud Proxy
over HTTPS (wss://)
• The control plane uses websocket connections to send inbound
requests to Groundplex
• Makes regular HTTPS requests to control plane
• Can be configured to go through HTTP proxy
Install GroundPlex

Steps:
• Get the groundplex properties from Snaplogic Support
• Download the respective RPM from
http://doc.snaplogic.com/snaplex-downloads
• Extract the RPM to a folder (ex: c:/opt/snaplogic)
• Update keys.properties and global.properties
• Start JCC (ex: “./bin/jcc.bat start”)

Other commands:
./bin/jcc.bat stop (stop JCC)
./bin/jcc.bat status (JCC status)
Ultra Tasks
Features

▪Always on and consuming in nature
▪Requires one input and one output view
▪Low-latency processing
▪Load balancing
▪High availability
▪Reliability
Architecture
Creating an ultra task
Ultra tasks limitations

▪Processes one document at a time
▪Aggregate and Sort Snaps are not supported
▪Pipeline parameters cannot be changed
at runtime
▪Batching should be disabled or batch size
set to 1
SnapLogic Public API
SnapLogic Public API (Overview)

•Features
▪Provides access to pipeline runtime info and ability to
manage users/groups programmatically
•Pipeline Monitoring API
▪Ability to retrieve all pipeline runtimes for the organization
▪Query for a specific executed pipeline in the organization
▪Retrieve pipeline logs for the organization
•User and Groups API
▪Programmatically access user/group info. Any user can
access their own info, but only admins can modify it
SnapLogic Public API (Pipeline Monitoring)

•Pipeline Monitoring API – Org All Pipelines


▪Sample Query:
▪GET https://elastic.snaplogic.com/api/1/rest/public/runtime/orgname

•Pipeline Monitoring API – Org Specific Pipeline


▪Sample Query:
▪GET http://elastic.snaplogic.com/api/1/rest/public/runtime/<org>/<ruuid>?level=summary

•Pipeline Monitoring API – Log


▪Sample Query:
▪GET http://elastic.snaplogic.com/api/1/rest/public/log/orgname?log_type=slserver
SnapLogic Public API (User and Groups API)

•User API Supports Operations:


–Get, Post, Put, Delete (acts on users)
–Get (users in organization)
▪Sample Query:
▪GET https://elastic.snaplogic.com/api/1/rest/public/groups/organization/members
•Groups API Supports Operations
–Get, Put, Delete (groups)
–Patch (add/remove user in group)
–Get (retrieve organization’s groups)
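As a rough sketch (not an official client), these public API endpoints can be called with HTTP basic authentication. The helper below only builds the documented runtime-monitoring URL; the org name, run ID (ruuid), and credentials are placeholders:

```python
import base64
import json
import urllib.request

BASE = "https://elastic.snaplogic.com/api/1/rest/public"

def runtime_url(org, ruuid=None, level=None):
    """Build a Pipeline Monitoring URL: org-wide, or one run with a detail level."""
    url = f"{BASE}/runtime/{org}"
    if ruuid:
        url += f"/{ruuid}"
    if level:
        url += f"?level={level}"
    return url

def fetch_runtimes(org, user, password):
    """GET the org's pipeline runtimes with basic auth; returns the parsed JSON body."""
    req = urllib.request.Request(runtime_url(org))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For example, runtime_url("myorg", "<ruuid>", "summary") reproduces the org-specific sample query shown above.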
SnapLogic API Hands-On

•Copy the Shared/Menu.csv to your project folder


•Create a child pipeline which reads the Menu.csv file
and does the following:
–Saves the output as JSON file with the pipeline ruuid
as part of file-name.
•Create a child pipeline which scans the project folder for
output file with pipeline ruuid.
–Call REST get to retrieve executed pipeline
information
–Output information to new file
•Create a master pipeline which calls both child
pipelines as sub-pipelines.
Enhanced Account Encryption
Enhanced Account Encryption

▪Org-level subscription feature, only supported on
Google Chrome v37 and above
▪Encrypts account credentials with a public/private key pair
▪Only supported on Groundplexes, since the private key
stays with the customer
Enhanced Account Encryption

▪Verify the same private key has been added to all
Groundplexes
▪Open Manager as admin
▪Open Settings and click Configure Encryption
▪Select Enhanced Encryption
Enhanced Account Encryption

•After enabling Enhanced Encryption, the values in
encrypted fields will no longer be visible, but they can
still be updated
▪The fields that are encrypted depend on the Enhanced
Encryption sensitivity setting
▪Encrypted fields will look like the following
▪Reverting back to a lower level of encryption does not
automatically decrypt existing accounts
JSONPath
Using JSONPath
A JSONPath expression enables you
to specify the parts of a JSON
document that are to be operated
on by a Snap.

JSONPath Syntax
• $ — The root of the document.
• .(dot) — Select a field in the parent
object.
Example, $.parent.child
• ?(<expr>) — Filters records based on the
specified expression.
Example, $.children[?(value.age > 18)]
• [] — Child operator or array index.
Examples, $.parent.[<child with spaces>]
$.children.[1]
$.children.[-1]

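As an illustration only, here are plain-Python equivalents of the selections above against a small invented document (inside a Snap you would type the JSONPath expression itself, not Python):

```python
doc = {
    "parent": {"child": "value", "child with spaces": 1},
    "children": [
        {"name": "Ann", "age": 21},
        {"name": "Bob", "age": 12},
        {"name": "Cai", "age": 30},
    ],
}

root = doc                                    # $
child = doc["parent"]["child"]                # $.parent.child
spaced = doc["parent"]["child with spaces"]   # $.parent.[child with spaces]
adults = [c for c in doc["children"] if c["age"] > 18]   # $.children[?(value.age > 18)]
second = doc["children"][1]                   # $.children.[1]
last = doc["children"][-1]                    # $.children.[-1]
```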
JSONPath Extended Syntax
• .sort_asc(<expr>) — Sorts an array in
ascending order based on the specified
expression.
Example, $.children.sort_asc(value.age)
• .group_by(<expr>) — Groups values
based on the result of the specified
expression and returns an object with
fields for each expression result and a
list of the objects that matched the
value.
Example,
$.children.group_by(value.gender)

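These extended operators are SnapLogic-specific, but a rough Python equivalent (sample data invented) shows what they compute:

```python
children = [
    {"name": "Ann", "age": 21, "gender": "F"},
    {"name": "Bob", "age": 12, "gender": "M"},
    {"name": "Cai", "age": 17, "gender": "M"},
]

# $.children.sort_asc(value.age): sort ascending by the expression result
by_age = sorted(children, key=lambda c: c["age"])

# $.children.group_by(value.gender): an object with one field per distinct
# expression result, each holding the list of matching objects
grouped = {}
for c in children:
    grouped.setdefault(c["gender"], []).append(c)
```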
Using JSONPath
Link to a tool that can come in handy:
http://www.jsonquerytool.com/#/JSONPath

Sample pipeline demo

Modularizing Pipelines and Patterns
Modularizing Pipelines
• You can make Pipelines modular by
distributing Pipeline functionality
across multiple Pipelines.
• Modularizing Pipelines helps in load
balancing.
• There are four ways of modularizing
Pipelines:
– The Task Execute Snap
– The REST Get or REST Post Snaps
– The ForEach Snap
– Pipeline Execute Snap

The Task Execute Snap
• It enables you to execute a
triggered task.
• Features:
– Enables you to pass parameters
– Enables you to use batching
– Enables load balancing

Using the Task Execute Snap

The REST Get and REST Post Snaps
• REST Get enables you to execute
the HTTP GET method.
• REST Post enables you to execute
the HTTP POST method.
• Features:
– Enables you to pass parameters
– Enables load balancing
• REST Post also enables you to use
batching.

Using the REST Get Snap

The ForEach Snap
• It invokes a Pipeline for each of
the incoming documents
• Features:
– Enables you to pass parameters
– Enables batching
– Enables load balancing
– Supports two execution modes—SYNCHRONOUS and
FIRE_AND_FORGET
– Enables you to restrict maximum instances
– Enables automatic retries

Using the ForEach Snap

Reusing Pipelines
• You can make the Pipeline
development process more
efficient by reusing Pipelines.
• There are two ways of reusing
Pipelines:
– Creating Nested Pipelines
– Creating Patterns

Nested Pipelines
• You can add a Pipeline as a part of
another Pipeline. Such Pipelines
are known as Nested Pipelines.
• Advantages:
– Low overhead
– Pass data directly using input views
• Features:
– Guaranteed and Best effort delivery
modes
– Enables automatic retries

Creating Nested Pipelines (1 of 2)

Creating Nested Pipelines (2 of 2)

Viewing Child Pipelines

Patterns
• A pattern is a reusable Pipeline.
• To create a Pattern:
– Ensure you have access to the
Pattern project where you want to
create a pattern.
– Click the Add a Pipeline button.
– Add Snaps to it. (Do not configure
properties which change across
Pipelines)
– Save the Pipeline.

Using Patterns
• If you do not have full access to a
Pattern, you can create Pipelines from it
through a step-by-step wizard.
In most cases, you will need to supply
account information for the specified
platform and the data or files to use in
the process.
• If you have full access to a Pattern, the
Pattern opens in the editor; you can
create a copy of the Pattern and use the
copy. Any changes you make to a
Pattern are saved in the Pattern.

Example: Using Patterns without Full
Access (1 of 2)

Example: Using Patterns without Full
Access (2 of 2)

Best Practices
• Modularize your Pipelines.
• Ensure that Snaps are supported
for the Subscription Features.
• Organize Pipelines using Projects.

Knowledge Check – True Or False
The number of input and output
views in a child Pipeline can be
modified from the Views tab of the
“Child Pipeline Snap”.
• True
• False

Knowledge Check – True Or False
You must configure all Properties of
Snaps in a Pattern before you save it
as a Pattern project.
• True
• False

Change Data Capture
Change Data Capture

CDC is a pattern used to determine which data has changed
since the last read.
Sometimes it refers to traditional techniques (log scraping,
etc.).

With newer software such as SFDC, Workday, or other
SaaS applications, the traditional techniques have limited or
no use.

SnapLogic's approach is to enable query-based CDC,
which assumes the existence of a 'last updated'
timestamp field in the source data that we can
compare against.
CDC Samples

Synchronize Salesforce to SQL Server

• Maintains the last-read timestamp in a file on SLDB and reads it
first.
• Makes a copy of the timestamp and passes it into the SFDC
account read.
• Top path: the pipeline captures the current timestamp,
formats it, and writes it back to the timestamp file as the new most
recent CDC time.
• Bottom path: the timestamp is compared with a last_modified
field in the SFDC account object, which captures the last time
a given object changed.

Synchronize SQL Server to Salesforce

• The same steps as above, in the reverse direction.

These pipelines can be run either in a periodic polling mode, or can
be triggered based on some event in the source system.
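A minimal sketch of the query-based approach, assuming each source record carries an ISO-8601 last_modified timestamp and using a local JSON file to stand in for the SLDB timestamp file (file name and field names are hypothetical):

```python
import json
import pathlib

STATE = pathlib.Path("last_read.json")  # stands in for the SLDB timestamp file

def read_last_timestamp():
    """High-water mark from the previous run, or a distant-past default."""
    if STATE.exists():
        return json.loads(STATE.read_text())["last_read"]
    return "1970-01-01T00:00:00Z"

def changed_since(records, last_read):
    """Query-based CDC: keep records modified after the high-water mark.
    ISO-8601 UTC strings compare correctly as plain strings."""
    return [r for r in records if r["last_modified"] > last_read]

def save_timestamp(now):
    """Persist the new high-water mark for the next run."""
    STATE.write_text(json.dumps({"last_read": now}))
```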
Single Sign On
SSO in Snaplogic

• Supports single sign-on through SAML. If you use an Okta, Ping, or
OpenAM Identity Provider (IDP) to perform SAML, then SnapLogic
can be configured to authenticate users against your IDP.
• Configuring SSO requires the exchange of metadata between
SnapLogic and the IDP.
• Any new users still need to be added to SnapLogic to authorize
them to use SnapLogic.
• A user can belong to multiple organizations and still use SSO.
SSO only authenticates but does not control to which organizations
a user has membership.
SSO Administration
Big Data
SnapLogic in the Modern Data
Fabric
[Diagram: source systems (on-prem applications, relational databases, cloud applications, web logs, NoSQL databases, Internet of Things) are ingested into the Store & Process layer (data warehouses and data marts, data integration and transformation, big data and data lakes, HANA), which delivers to the Consume layer.]

Frictionless ingest
• Loose schema coupling - Schema on Read
• Avoid high price of processing data that’s not needed
• Store data in raw/original format
• Standard file types
• Text
• Structured text – CSV, JSON (container format: Avro)
• Binary (container format: Sequence)
• Hadoop file types (splittable and compression-agnostic)
• Sequence (file-based data structure)
• Avro (serialization format)
• RC/ORC/Parquet (columnar formats)

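To make "schema on read" concrete, a small hedged sketch: raw records are stored untouched (one JSON object per line), and a consumer projects only the fields it cares about at read time (data and field names invented):

```python
import json

# Raw events kept in their original format; no schema enforced on write.
raw_lines = [
    '{"id": 1, "temp_c": 21.5, "extra": "kept in the raw store"}',
    '{"id": 2, "temp_c": 19.0}',
]

def read_with_schema(lines, fields):
    """Apply a schema only at read time: project each record onto `fields`."""
    for line in lines:
        rec = json.loads(line)
        yield {f: rec.get(f) for f in fields}

rows = list(read_with_schema(raw_lines, ["id", "temp_c"]))
```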
SnapLogic Elastic Integration for Big
Data
Develop and Manage (runs in the cloud, stores only metadata):
Visual Designer, Manager, Dashboard

Connect (library of pre-built Snaps):
Parquet, ORC, Cassandra, Redshift, HBase, HDFS, …

Process (select the optimal processing framework based on data volume or latency):
Standard Mode Pipelines, SnapReduce Mode Pipelines (new), Spark Mode Pipelines (new)

Execute (respects data gravity, runs where data lives):
Snaplex (Cloud or Ground): Nodes + Coordinator; Hadooplex: Nodes + Coordinator + Translator
Hadoop Architecture – Elastic Scalability of Hadooplex

[Diagram: elastic scalability of a Hadooplex on YARN: the YARN Resource Manager starts an Application Master, which requests containers from Node Managers so Hadooplex nodes can scale out across the cluster.]

Legend: RM – YARN Resource Manager; NN – Hadoop Name Node; NM – YARN Node Manager; DN – Hadoop Data Node

Democratizing Data Integration for
Spark
• Create MapReduce or Spark
pipelines by choosing from a picklist
• Translates pipelines for Spark
processing without coding

✓ Saves time
✓ Makes Spark accessible to non-experts
✓ Takes the data engineering off the plate
of your data scientists

▪Thank You
▪For More Information visit
▪www.snaplogic.com

▪Follow Us on
▪Twitter: snaplogic Facebook: Snaplogic
