Running Databricks Migrations Code Analyzer
1. PREREQUISITES
IMPORTANT NOTE:
Running the Code Analyzer (and SQL Splitter) requires the following prerequisites,
so make sure to go through this section before proceeding.
If you have downloaded the Mac version, please pay special attention to section C
below.
2) The Analyzer package is a zip file that contains the Analyzer binaries (both Linux
and Windows versions) for the Code Analyzer and SQL Splitter.
3) If you would like to perform integrity checks on the downloaded zip file, please let
your Databricks representative know and we can provide this information to you.
2. Instead, use a shared folder (with proper access control) to upload the
generated report(s) and give access to your Databricks technical point of contact.
a. You will receive a shared folder URL in the same email as the Code
Analyzer download link.
b. If your organization has an existing secure file-sharing process, use it to
share the generated report files. Otherwise, you can use the same shared
folder (from step a) to upload the xlsx files.
C) The Mac version requires an extra step to allow the analyzer to run
(and additional steps if your Mac has an M3 or newer chip).
1. After downloading and unzipping the analyzer/splitter zip file, navigate in
the Mac terminal to the location of the unzipped analyzer contents (you may
want to place the analyzer in a folder outside of “Downloads”).
2. Run “ls -l” and you will notice that the unzipped “analyzer” file is not
executable. Change this by running “chmod +x analyzer”.
3. Re-run “ls -l” to verify that the file now shows the executable (x) permission.
4. Attempt to run “./analyzer”; this will bring up a pop-up window saying the
tool developer is not verified (this is expected). DO NOT move it to trash;
click Cancel.
5. Navigate to System Settings -> Privacy & Security. Scroll down to the message
about “analyzer” being blocked and click “Open Anyway”.
6. Another window might open to verify that you want to open it; click Open.
7. Re-run the command “./analyzer” to verify that you can now run the
analyzer tool from the command line. If it works, it will respond with
output showing which flags/arguments are available/required by the tool.
8. If you have an M3, M3 Pro, or M3 Max, you also need to install Rosetta
(https://support.arduino.cc/hc/en-us/articles/7765785712156-Error-bad-CPU-type-in-executable-on-macOS).
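For reference, the terminal steps above amount to the following commands (a minimal sketch; the folder path is illustrative and depends on where you unzipped the package):
cd ~/analyzer-package        # illustrative path to the unzipped contents
chmod +x analyzer            # make the binary executable
ls -l analyzer               # permissions should now include "x"
./analyzer                   # first run triggers the Gatekeeper prompt; re-run after "Open Anyway"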
Please confirm that general_sql_specs.json is in the same directory as the analyzer
executable; it should already be included in the download. If other config files are
needed, please reach out to your Databricks representative to acquire the specific
config files required for your source system.
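A quick way to confirm this from inside the unzipped package folder:
ls -l analyzer general_sql_specs.json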
Sample commands to run the analyzer. For DataStage:
analyzer -t DATASTAGE -d "<folder with ds xml files>" -r <path to xlsx report file>
Informatica PowerCenter
analyzer -t INFA -d "<folder with xml files>" -r <path to xlsx report file>
Informatica Cloud
analyzer -u ic2dws.json -t INFACLOUD -d "<folder with zip files>" -r <path to xlsx report file>
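As a concrete illustration, an Informatica PowerCenter run with hypothetical paths might look like:
analyzer -t INFA -d "/home/user/exports/infa_workflows" -r /home/user/reports/infa_analysis.xlsx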
Important Note
Analyzing SQL code accurately requires the following steps. Make sure to follow the
instructions below.
● Typically, DDL statements (create table, create view, create procedure,
create function, etc.) are extracted for analysis using various
utilities/commands and end up in a few large files. These files need to be
split before running the analyzer.
● DML statements, queries, data load scripts, etc., on the other hand, are maintained
as part of application code and are kept in a large number of small files. These
files are not split before running them through the analyzer.
● Follow the steps below to get your code ready to run through the Analyzer.
1) Keep SQL DDL files and the rest of the SQL code in separate root folders, for
example (see also the sketch below):
mkdir analyzer_input
mkdir analyzer_input/sql_ddl
cp -R sql_other analyzer_input/sql_other
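Continuing the example, copy the exported DDL into the DDL folder as well; the export file name below is hypothetical:
cp all_ddl_export.sql analyzer_input/sql_ddl/    # hypothetical file containing the exported DDL statements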
SSIS
Note:
If a path contains spaces (for example, in a folder name), wrap it in double quotes,
e.g. "C:\Users\xyz\Downloads\analyzer-package\SQL Server"
(double quotes because the "SQL Server" directory name contains a space); otherwise you will get an error.
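For example, a hypothetical invocation with a quoted path (the source type and paths are placeholders):
analyzer -t <TYPE> -d "C:\Users\xyz\Downloads\analyzer-package\SQL Server" -r "C:\Users\xyz\reports\report.xlsx"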
DataStage
The easiest way to export metadata is through the GUI, one folder at a time. To do
so, right-click on the folder to export and select the “Export” option. Please ensure that
the XML format is specified for the export and that all the jobs within the folder are selected
(they are by default).
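Once the folders are exported as XML, the analyzer can be pointed at the export location, for example (the paths here are illustrative):
analyzer -t DATASTAGE -d "C:\exports\datastage_xml" -r C:\reports\datastage_analysis.xlsx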
Overview
To run the Analyzer or converters on Informatica XMLs, the XML files first need to be
extracted out of the PowerCenter repository. Typically, it is easier to deal with the
conversion at a relatively granular level, so extracting the artifacts at the workflow level
is advisable.
● Metadata Extraction
To extract the metadata out of the PowerCenter repository, use the following commands:
● Connect to repository
pmrep connect <list of credentials>
● Get the list of folders
pmrep listobjects -o FOLDER
● For each folder, get the list of workflows
pmrep listobjects -o WORKFLOW -f <your folder name>
● Workflow extraction
Create a batch script with the following command template for each folder.
Note: Excel can be used to generate the script lines using the following command template.
pmrep objectexport -n workflow_name -o WORKFLOW -f folder_name -b -r -m -s -u path-to-output-file
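As an illustration, one generated line might look like this (the workflow, folder, and output file names are hypothetical):
pmrep objectexport -n wf_load_customers -o WORKFLOW -f SALES_DW -b -r -m -s -u SALES_DW_wf_load_customers.xml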
The following comes from this article: How to read metadata in Informatica Cloud (IICS)? -
ThinkETL.
● Select all the Mapping Configuration tasks you want to read the metadata from
and export them as a single file.
● Exporting a Mapping task also fetches the associated mapping.
● Make sure you select the check box to include all dependent assets.
● Next, click on My Import/Export Logs in the left pane, go to the Export tab, find
the name under which you exported the code, and click Download.
● All the tasks and their dependencies are downloaded as a single zip file. In
our example the file name will be IICS_Demo_Export.zip.
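The exported zip file(s) can then be placed in a folder and analyzed with the Informatica Cloud command shown earlier; the folder and report names below are illustrative:
analyzer -u ic2dws.json -t INFACLOUD -d "iics_exports" -r iics_analysis.xlsx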
Talend
To export all jobs in bulk, right-click on Job Designs and select “Export Items”. In the popup,
select “Include All Dependencies”.
Here is a link on the topic: Talend export and import a job - Stack Overflow.
Note: while Talend jobs can be exported as a single zip file, please unzip the file(s) before
running the analyzer or any converter utilities. Both the analyzer and converters look for .item
and .properties files in non-zipped folders.
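For example, to unzip a hypothetical export before analysis:
unzip Talend_Jobs_Export.zip -d talend_items    # hypothetical export file; point the analyzer at talend_items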
Typically, client environments make use of source code repositories such as Git, SVN,
Perforce, and others. It is preferable to get the code from such a repository,
potentially a combination of production branches and dev/qa branches, whichever makes
sense. This is the preferred method of getting the code, as it is stored in its original form,
unaltered by any database-injected code snippets.
The same is true for general shell scripts and shell script wrappers with embedded SQL
code.
If such a repository is not available, SQL-based objects such as procedures, UDFs, macros,
and table and view DDLs can be extracted using native code export utilities. SQL scripts
and BTEQ code that live outside of the database on a file system can just be taken as is.
For example, in the case of Snowflake, you can use the below statement to extract DDLs. It
extracts definitions of schemas, tables, views, functions, stored procs, tasks, etc. in that
database. Please repeat the step for all production databases.
You will have to use the analyzer splitter option to split DDLs automatically (see next
section on how to use the splitter).
select get_ddl('database','<database name>');
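If you prefer to script the extraction, a minimal sketch using the SnowSQL client could write the result to a file (assuming SnowSQL is installed; the connection, database, and output file names are placeholders):
snowsql -c my_connection -q "select get_ddl('database','SALES_DB');" -o output_file=sales_db_ddl.sql -o header=false -o timing=false -o friendly=false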
Please note that some SQL exporter utilities may create files with a single long line, with all
the statements appended on the same line. Such files are not acceptable input for
the analyzer.
(Ask the Teradata/Oracle DBA to export the table DDL, view DDL, packages, stored
procedures, functions, etc. to a folder, and then run the analyzer on that folder to
produce the analyzer results.)
● The preferred way to export the DDLs is one database object per file (by
selecting the “One script file per object” option in the Set Scripting Options step of
the Generate Scripts wizard; see the screenshots below).
● If you already have all the DDL statements in a single file, the Analyzer package comes
with a SQL splitter program which you can use to split one large file with all the DDL
statements into individual files. This needs to be run before the
analyzer command. See the “Run the Analyzer” section for SQL code.
Note: In the above step, select all required object types; the screenshot is for
illustration purposes only.
Azure Synapse (Serverless)
To extract metadata such as table, view, and stored procedure DDL, you can use Microsoft
SQL Server Management Studio.
For a Serverless database, the “Generate Scripts” context-menu option is not available at
the database level in the studio (as of version 19.1), so use the “Object Explorer
Details” view and select the required objects to export the corresponding DDL to a file, as
shown in the screenshots below.
● You’ll need to export the DTSX packages. For details on how to obtain them, see: Save
and Run Package (SQL Server Import and Export Wizard) - SQL Server Integration
Services (SSIS) | Microsoft Learn.
ODI
Alteryx
● The Analyzer needs the .yxmd files. These can be obtained by selecting File > Export to
download your workflow to your local machine in .yxmd format.
● Instructions for export can be found in the following SAP Help Portal articles.
How is complexity calculated in the analyzer?
If any of the following conditions are true, then mark the job as MEDIUM complexity:
If any of the following conditions are true, then mark the job as COMPLEX complexity:
If any of the following conditions are true, then mark the job as VERY COMPLEX complexity:
If the analyzer encounters a SQL procedure or function body inside a SQL file, it will
categorize the script as “ETL”.
Teradata MLOAD and FLOAD scripts follow the same rules as above.
Informatica Code Analysis
At the beginning of mapping analysis, mark the mapping with a complexity level of LOW.
If any of the following conditions are true, then mark the mapping as MEDIUM complexity:
If any of the following conditions are true, then mark the mapping as COMPLEX complexity:
If any of the following conditions are true, then mark the mapping as VERY COMPLEX
complexity:
DataStage Analysis
At the beginning of job analysis, mark the job with a complexity level of LOW.
If any of the following conditions are true, then mark the job as MEDIUM complexity:
If any of the following conditions are true, then mark the job as VERY COMPLEX complexity:
Talend Analysis
At the beginning of job analysis, mark the job with a complexity level of LOW.
If any of the following conditions are true, then mark the job as MEDIUM complexity:
If any of the following conditions are true, then mark the job as COMPLEX complexity:
If any of the following conditions are true, then mark the job as VERY COMPLEX complexity:
Splitter Instructions:
Purpose - Splits large SQL files with multiple objects into individual .sql files
sqlsplit
######## OPTIONS ########
-h this message
-o output folder