GSR Azure High Level Architecture
The steps below explain the architecture shown above. (Server names and user names are marked in red.)
1. Data is extracted from two CooperSurgical sources, the Navision and DataFlo ERP systems.
2. Azure Data Factory is the ETL tool used to extract the data from Navision and DataFlo ERP and load it into the Azure cloud.
3. For Navision, Azure Data Factory uses the SQL Server ODBC connector to ingest data from the Navision ERP into the Azure Data Lake (storage account). The read-only account coopersurgical1\svc_gsrreadonly is used to connect to Navision and load its data into the Azure Data Lake via the Data Factory ETL tool.
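As a quick connectivity illustration (not part of the documented setup), the sketch below shows how the SQL Server ODBC connectivity used for the Navision source could be verified with Python and pyodbc from the integration runtime server. The driver version, server name, database name, and authentication mode are assumptions; substitute the values actually configured in the Data Factory linked service.

    # Hedged connectivity check for the Navision SQL Server source via ODBC.
    # Only the read-only account comes from step 3; everything else is a
    # placeholder. If the account is a Windows domain account, the linked
    # service may use Windows authentication (Trusted_Connection=yes) instead
    # of UID/PWD.
    import pyodbc

    conn_str = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=<navision-sql-server>;"      # Navision database server (not named in this document)
        "DATABASE=<navision-database>;"      # assumed database name
        "UID=coopersurgical1\\svc_gsrreadonly;"
        "PWD=<password>;"
    )

    conn = pyodbc.connect(conn_str, timeout=30)
    print(conn.cursor().execute("SELECT @@VERSION;").fetchone()[0])
    conn.close()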
6. For DataFlo ERP, the DataFlo CSV files are placed on the SFTP host named Cooper1, and the SFTP user azure is used to load the CSV files from the SFTP host into the Azure Data Lake via the Data Factory ETL tool.
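The sketch below (illustrative only) lists the CSV drop folder on the SFTP host using Python and paramiko. The host name Cooper1 and the user azure come from step 6; the port, remote directory, and password handling are assumptions, since Data Factory's SFTP connector performs the actual transfer in the pipeline.

    # Hedged sketch: list the DataFlo CSV drop folder on the SFTP host "Cooper1".
    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # acceptable for a quick check only
    client.connect("Cooper1", port=22, username="azure", password="<password>")

    sftp = client.open_sftp()
    for name in sftp.listdir("/dataflo/outbound"):   # assumed drop directory
        if name.lower().endswith(".csv"):
            print(name)

    sftp.close()
    client.close()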
7. As a prerequisite, the CSI Infra team has configured the integration runtime client software on a dedicated server in Denmark so that the ETL tool (Data Factory) can carry out the data ingestion from Navision and DataFlo ERP. The name of the integration runtime client is CSIDEVIntegrationRuntime. Ticket #104531 can be referred to for related instructions.
8. The only job of the integration runtime client service mentioned in the preceding step is to provide connectivity to the sources, allowing the Data Factory ETL tool to connect to the Navision and DataFlo ERP systems and ingest data from them.
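If it is useful to confirm that the runtime is registered and online, the sketch below uses the azure-mgmt-datafactory SDK. Only the integration runtime name CSIDEVIntegrationRuntime comes from this document; the subscription ID, resource group, and data factory name are placeholders.

    # Hedged sketch: check that the self-hosted integration runtime is online.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    status = client.integration_runtimes.get_status(
        resource_group_name="<resource-group>",
        factory_name="<data-factory-name>",
        integration_runtime_name="CSIDEVIntegrationRuntime",
    )
    print(status.properties.state)   # expected "Online" when the Denmark server is connected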
9. All data coming from Navision and DataFlo ERP is ingested into a folder called L1, which resides in the container named sourcedata in the Azure Data Lake. The path of the sourcedata container in the Data Lake is https://csigsrdevdatalakestorage.blob.core.windows.net/sourcedata.
10. The following nomenclature is used at the Data Lake layer: sourcedata is the container that holds all data ingested from the sources; the L1 subdirectory holds the raw source data; the L2 subdirectory holds transformed data; and the L3 subdirectory holds processed data.
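The sketch below (illustrative, not part of the documented pipeline) browses this layout with the azure-storage-blob SDK. Authentication with DefaultAzureCredential is an assumption; an account key or SAS token would work equally well.

    # Hedged sketch: browse the sourcedata container layout
    # (L1 raw / L2 transformed / L3 processed).
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url="https://csigsrdevdatalakestorage.blob.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    container = service.get_container_client("sourcedata")

    for layer in ("L1/", "L2/", "L3/"):
        print(f"--- {layer} ---")
        for blob in container.list_blobs(name_starts_with=layer):
            print(blob.name)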
11. Finally, the processed data is loaded from the L3 folder in the Azure Data Lake into the Azure Synapse data warehouse. The server name of the Synapse database in the Dev environment is gsrdevsqldw.database.windows.net, and the sqldevadmin Synapse SQL account is used to load the data from the Data Lake into Synapse. Again, Data Factory is used to perform this load. sqldevadmin is the SQL account that was created at the time the Synapse data warehouse was provisioned.
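For reference, the sketch below opens a connection to the Dev Synapse SQL endpoint with the sqldevadmin account using Python and pyodbc. The dedicated SQL pool (database) name and driver version are assumptions; the actual load is performed by Data Factory.

    # Hedged sketch: connect to the Dev Synapse SQL endpoint as sqldevadmin.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=gsrdevsqldw.database.windows.net,1433;"
        "DATABASE=<synapse-sql-pool>;"   # dedicated SQL pool name (not stated in this document)
        "UID=sqldevadmin;"
        "PWD=<password>;"
        "Encrypt=yes;TrustServerCertificate=no;",
        timeout=30,
    )
    print(conn.cursor().execute("SELECT DB_NAME();").fetchone()[0])
    conn.close()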
12. In addition to the preceding step, once a login is made with the SQL account, the following SQL command needs to be issued in the Synapse data warehouse. This command allows data to be loaded from external sources such as the Azure Data Factory ETL tool.
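The exact command referenced in step 12 is not reproduced in this section. Purely as an illustration of the kind of statements that commonly enable bulk loads from Data Factory into a dedicated SQL pool (an assumption, not the documented command), a sketch follows; the loader login etl_loader is hypothetical.

    # Illustration only: NOT the exact command referenced in step 12.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=gsrdevsqldw.database.windows.net,1433;"
        "DATABASE=<synapse-sql-pool>;UID=sqldevadmin;PWD=<password>;Encrypt=yes;",
        autocommit=True,
    )
    cur = conn.cursor()
    # A database master key is often required when Data Factory stages data via PolyBase.
    cur.execute("CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword123!>';")
    # Bulk-load permission commonly required for the COPY statement (hypothetical login).
    cur.execute("GRANT ADMINISTER DATABASE BULK OPERATIONS TO etl_loader;")
    conn.close()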
13. For the higher environments (QA, UAT, and PROD), steps 11 and 12 need to be followed, and the CooperInfra team needs to share with us the Synapse data warehouse server name and the SQL username and password created at the time the Synapse data warehouse was provisioned. This step is needed because our ETL team must repoint the connection entries for the Synapse data sources mentioned in step 11 in order to load data into the Synapse data warehouse.
14. For the scope of this project, inbound connections to Synapse on port 1433 need to be enabled for SQL and Power BI connectivity. Later, if the Synapse Studio feature is used, inbound connections on ports 80, 443, and 53 also need to be enabled.
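As a quick way to confirm the port 1433 requirement is met from a client machine, the sketch below attempts a TCP connection to the Dev Synapse endpoint. This is illustrative only; the Synapse Studio ports (80, 443, 53) apply to the Studio endpoints, which are not named in this document.

    # Hedged sketch: verify TCP connectivity to the Synapse SQL endpoint on 1433.
    import socket

    def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    host = "gsrdevsqldw.database.windows.net"
    print(f"{host}:1433", "reachable" if port_open(host, 1433) else "blocked")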