Shashi ETL Developer
ETL DEVELOPER
Summary:
9+ years of IT industry experience encompassing a wide range of roles, skill sets, and industry verticals.
Technical and functional experience implementing projects in Agile, Kanban, and Waterfall
methodologies.
Engaged in data replication between RDBMS systems and from RDBMS to Hadoop using Attunity Replicate.
Strong and proven capabilities in ETL architecture design, end-to-end workflow development, data
migration, and Enterprise Data Warehouse framework implementation.
Experience with the MongoDB document-based database system, non-relational data modeling, and
CRUD operations.
Extensive experience in different phases of projects from initiation, design, development, and
implementation of software applications in Data warehousing.
Extensive knowledge of various MongoDB database design patterns and physical architectures for
different use cases.
Experienced in the design, development, and implementation of large-scale projects in the Financial, Shipping, and
Retail industries using Data Warehousing ETL tools (Pentaho) and Business Intelligence tools.
Exclusively delivered multiple projects in the banking and insurance domains.
Highly proficient in the development, implementation, administration, and support of ETL processes for
large-scale data warehouses using Informatica and DataStage.
Experience using ETL tools like Informatica Power Center 10.2/9.6/9.1/8.6, Informatica Power Exchange
10.2/9.6, Informatica Intelligent Cloud Services (IICS).
Extensive experience in building and publishing customized interactive reports and dashboards and
scheduling reports using Tableau Desktop and Tableau Server.
Expertise in Data Warehouse, Data-mart, ODS, OLTP, and OLAP implementations teamed with Project
Scope, Analysis, Requirements Gathering, Data Modeling, Effort Estimation, ETL Design, Development,
System Testing, Implementation, and Production Support.
Knowledge of Master Data Management concepts and methodologies and ability to apply this knowledge in
building MDM solutions with ETL.
Used Informatica BDM/IDQ 10.1.1 (Big Data Management) to ingest data from the AWS S3 raw layer to the S3 refine
layer and from refine to Redshift.
Installed and configured Pentaho BI Server on different operating systems, including Red Hat Linux and Windows
Server.
Experience in Database Development, Data Warehousing, Design, and Technical Management.
Good understanding of database and data warehousing concepts (OLTP & OLAP).
Experience on Cloud Databases and Data warehouses (SQL Azure and Confidential Redshift/RDS).
Significant ETL testing expertise involving Informatica PowerCenter versions 9.1, 8.6.1, and 8.5
encompassing Designer, Workflow Manager, Workflow Monitor, and Server Manager.
Robust expertise in Dimensional Modeling, encompassing both Star and Snowflake Schemas, and adeptness in
recognizing Facts and Dimensions.
Hands-on experience in tuning mappings and identifying and resolving performance bottlenecks at various
levels, including sources, targets, mappings, sessions, and the database.
Experience with Informatica Version Upgrades, Hot Fix management, and troubleshooting.
Extensive experience with data modeling techniques, and logical and physical database design.
Experience in uploading data into AWS S3 buckets using the Informatica Amazon S3 plugin.
Proficient in the integration of various data sources with multiple relational databases like Oracle
11g/10g/9i, MS SQL Server, DB2, Teradata (BTEQ, FastLoad, MultiLoad), Netezza, VSAM files, and flat files into
the staging area, ODS, Data Warehouse, and Data Mart.
Strong Knowledge of Big Data architectures and distributed data processing frameworks: Hadoop, Spark,
Kafka, Hive.
Worked on SQL tuning in Exadata Production and Performance Testing environments by using statistics,
hints, SQL tuning set, etc.
Experience in creating ETL transformations and jobs using Pentaho Kettle Spoon designer and Pentaho Data
Integration Designer and scheduling them on Pentaho BI Server.
Defined virtual warehouse sizing in Snowflake for different types of workloads.
Developed highly scalable, fault-tolerant, maintainable ETL data pipelines to handle vast amounts of data.
Good working exposure to the latest technologies like Python, Hadoop, AWS, and Cloud computing.
Good communication skills, interpersonal skills, self-motivated, quick learner, and team player.
Highly spirited team player with great organizational skills and ability to adapt to situations.
Effective handling of high-pressure environments by managing work and leading development activities.
Projects Description:
Data Lake - A data science initiative by Global HR at Bank of America to convert the existing HRISDM
model into a new cloud-based Hadoop platform named HaaS. The project architecture uses different tools and
programming languages, including HDFS, Python, Hive, Kafka, Autosys, Informatica (ETL), Teradata, etc.
Responsibilities:
Developing Python scripts to push and pull SOR files into the landing zone and then to the Edge
Node, formatting the data files into formats supported by HDFS.
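A minimal sketch of this kind of landing-zone push, assuming hypothetical local and HDFS paths and the standard hdfs dfs CLI:

    import subprocess
    from pathlib import Path

    LANDING_ZONE = Path("/data/landing/sor")   # hypothetical landing-zone path
    HDFS_TARGET = "/edge/hris/incoming"        # hypothetical HDFS directory

    def push_to_hdfs(local_file: Path) -> None:
        """Copy a formatted SOR file from the Edge Node into HDFS."""
        subprocess.run(["hdfs", "dfs", "-put", "-f", str(local_file), HDFS_TARGET], check=True)

    for sor_file in LANDING_ZONE.glob("*.dat"):
        push_to_hdfs(sor_file)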
Understanding transformation techniques, especially for moving data from on-premises sources through Attunity.
Executed and delivered projects in Agile, Kanban, and Waterfall methodologies.
Developed a Python script to initiate a web service call, extract the returned operational data in
XML form, and load it into SQL tables.
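A simplified sketch of such a script, assuming a hypothetical endpoint, XML layout, and SQL Server target table:

    import requests
    import xml.etree.ElementTree as ET
    import pyodbc

    URL = "https://example.com/api/operational-data"   # hypothetical web service endpoint
    CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dbhost;DATABASE=ods;Trusted_Connection=yes;"

    response = requests.get(URL, timeout=60)
    response.raise_for_status()
    root = ET.fromstring(response.text)

    # Assumed XML element and field names
    rows = [(rec.findtext("id"), rec.findtext("name"), rec.findtext("status"))
            for rec in root.iter("record")]

    with pyodbc.connect(CONN_STR) as conn:
        conn.cursor().executemany(
            "INSERT INTO dbo.operational_data (id, name, status) VALUES (?, ?, ?)", rows)
        conn.commit()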
Engaged in application design and schema-less data modeling team discussions for big data in a MongoDB
environment.
Designed and developed Power BI graphical and visualization solutions with business requirement
documents and plans for creating interactive dashboards.
Used Pentaho Data Integration/Kettle to design all ETL processes to extract data from various sources,
including live systems and external files, cleanse the data, and load it into the target data warehouse.
Developed complex Tableau dashboard reports by gathering requirements through direct interactions with
Business/Operations teams.
Real-time data processing with CDC replication using Attunity and StreamSets.
Used Informatica to parse out the XML data into the DataMart structures that are further utilized for the
reporting needs.
Design, build, and maintain Big Data ETL workflows/pipelines to extract, transform, and load data from a
relational SQL Server source to a MongoDB test server using the ODBC connector.
Advanced knowledge of Confidential Redshift and MPP database concepts.
Performing pre-validation steps on landed files in the landing zone by verifying naming standards,
record counts, and header/trailer checks.
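A minimal sketch of such a pre-validation check, assuming a hypothetical naming standard and a trailer record that carries the expected row count:

    import re
    from pathlib import Path

    NAME_PATTERN = re.compile(r"^HRIS_\w+_\d{8}\.dat$")   # assumed naming standard

    def validate_landed_file(path: Path) -> bool:
        """Check naming standard, header/trailer markers, and record count."""
        if not NAME_PATTERN.match(path.name):
            return False
        lines = path.read_text().splitlines()
        if not lines or not lines[0].startswith("HDR") or not lines[-1].startswith("TRL"):
            return False
        expected = int(lines[-1].split("|")[1])   # trailer carries the expected count
        return len(lines) - 2 == expected         # exclude header and trailer rows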
Utilized Power BI (Power View) to create various analytical dashboards that depict critical KPIs such
as legal case matters, billing hours, and case proceedings, along with slicers and dicers enabling end
users to apply filters.
Experience in designing and developing applications leveraging MongoDB.
Distribute the HDFS-formatted or processed files from the Edge Node to data nodes based on the
partition key (PK).
Maintained and scheduled Tableau data extracts using Tableau Server and the Tableau command
utility.
Created user-friendly, dynamically rendered custom dashboards to visualize the output data using
Pentaho CDE and CDF.
Implemented the ETL logic from the HRISDM application in the Data Lake using Python scripting.
Effectively used the IICS Data Integration console to create mapping templates to bring data into the staging
layer from different source systems like SQL Server, Oracle, Teradata, Salesforce, flat files, and Excel files.
Created data pipelines using Python, PySpark, and EMR services on AWS.
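A condensed sketch of such a PySpark pipeline as it might run as an EMR step, with hypothetical S3 paths and column names:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("raw-to-refine").getOrCreate()

    raw = spark.read.option("header", True).csv("s3://example-raw-bucket/hris/")

    refined = (raw.dropDuplicates(["employee_id"])          # assumed key column
                  .filter(F.col("employee_id").isNotNull())
                  .withColumn("load_dt", F.current_date()))

    refined.write.mode("overwrite").parquet("s3://example-refine-bucket/hris/")
    spark.stop()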
To increase performance, balanced large input files against the slice count, loaded the files into the
AWS S3 refine bucket, and used the COPY command to achieve micro-batch loads into Amazon
Redshift.
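A hedged sketch of the COPY-based micro-batch load, with hypothetical cluster, bucket, table, and IAM role names:

    import psycopg2

    COPY_SQL = """
        COPY analytics.hris_refine
        FROM 's3://example-refine-bucket/hris/batch_001/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS PARQUET;
    """

    with psycopg2.connect(host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
                          port=5439, dbname="dw", user="etl_user", password="***") as conn:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)   # Redshift parallelizes the load across slices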
Extracting and uploading data into AWS S3 buckets using the Informatica AWS plugin.
Performed different change data capture logic based on the file received, whether a full file or a delta file.
Loading tables with inserts, updates, and deletions from the SOR file.
Creating/updating backup files, as the Edge Node does not retain previous-day files.
Extracted BOFA employee profile data from LinkedIn using web scraping with Python programs.
Archived partitioned files, history files, etc. onto data nodes for future retrieval as part of the
archival process.
Utilized capabilities of Tableau such as Data extracts, Data blending, Forecasting, Dashboard actions and
Table calculations.
Experience in working with tools like Attunity, knowledge of ETL tools like Talend, and wrote
Python modules to view and connect to the Apache Cassandra instance.
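A minimal sketch of such a Cassandra connection module using the DataStax driver, with hypothetical host, keyspace, and table names:

    from cassandra.cluster import Cluster

    def get_session(hosts=("cassandra-host-1",), keyspace="hris"):
        """Connect to the Cassandra cluster and return a session bound to a keyspace."""
        cluster = Cluster(list(hosts), port=9042)
        return cluster.connect(keyspace)

    session = get_session()
    for row in session.execute("SELECT employee_id, name FROM employee LIMIT 10"):
        print(row.employee_id, row.name)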
Prepared Autosys JILs with load balancing by setting load limits to schedule Python programs.
Developing Informatica mappings, workflows, and worklets, to convert requirements into technical code
as part of the HRISDM project.
Designed and Developed ETL jobs to extract data from Salesforce replica and load it in data mart in
Redshift.
Developing DataStage parallel jobs and job sequences as part of the Profitability, EDA-ADS, and GFIR projects.
Performing unit testing, system integration testing, and user acceptance testing by creating unit test cases for
the ETL code created above.
Created various types of chart reports in Pentaho Business Analytics, including pie charts, 3D pie charts,
line charts, bar charts, stacked bar charts, and percentage bar charts.
Prepared Autosys JILs to invoke the jobs created above.
Planning the tasks by preparing work breakdowns to distribute to individual team members.
Raising approval requests for new resource deployments, reviewing deliverables, and updating the status to
LOB management.
Utilized Power Query in Power BI to Pivot and Un-pivot the data model for data cleansing and data massaging.
Team performance reviews and ramping resources up and down based on the budget.
Guiding the team to avoid Global Information Security risks.
Provide and plan resolutions for project risks, status tracking, leadership reporting, and change request
implementation.
Environment: Informatica Power Center, Informatica B2B DT Studio, Informatica B2B DX Console, ILM, MongoDB,
Snowflake, SQL Server, PL/SQL, SQL, Python, Shell/Bash scripting, Jira.
Client: Change Healthcare, Bridgeton, MO Mar 2020 to May 2022
Role: Senior Informatica ETL Developer
Responsibilities:
Used various Informatica transformations like Source Qualifier, Aggregator, Joiner, Filter, Router, Lookup, Update Strategy
and Sorter to improve the ETL performance.
Spearheaded data integration projects with Informatica, ensuring seamless data flow across systems, and accelerating
data availability by 40%.
Installed and registered the set of transformations used in PowerCenter workflows to process B2B Data Exchange
documents.
Administered user, user groups, and scheduled instances for reports in Tableau.
Installation of MongoDB RPMs and tar files and preparation of YAML config files.
Loaded data using SSIS from different sources like Excel, text, and CSV files into SQL Server and Oracle 11i tables using Attunity,
after applying complex transformations and business rules in C# for the Teradata migration project.
Worked with Atlassian tools for defect and enhancement status tracking.
Designed and developed ETL packages using SQL Server Integration Services (SSIS) to load the data from SQL server, XML
files to SQL Server database through custom C# script tasks.
Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI.
Created and saved Pentaho jobs in the enterprise repository and scheduled them to run in production on a weekly basis.
Involved in license management and SLAs with Tableau support.
Used Atlassian Confluence for documentation repositories and Jira for project management and defect management.
Participated in sprint planning meetings and worked closely with the manager on gathering the requirements.
Helped in running the B2B files using Informatica and tracking down the status both on B2B DX console and Informatica
Monitor to observe the Statistics of Applied Rows, Affected Rows and Throughputs.
Engaged in the process of dimensional modeling, specifically utilizing the Star Schema approach for the Data Warehouse.
Developed a web service on the Postgres database using the Python Flask framework, which served as a backend for the
real-time dashboard.
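A stripped-down sketch of that kind of Flask backend over Postgres, with hypothetical connection details and table names:

    from flask import Flask, jsonify
    import psycopg2

    app = Flask(__name__)
    DSN = "host=localhost dbname=dashboard user=app password=***"   # hypothetical DSN

    @app.route("/metrics/claims")
    def claim_metrics():
        """Return aggregated claim counts consumed by the real-time dashboard."""
        with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
            cur.execute("SELECT status, COUNT(*) FROM claims GROUP BY status")
            return jsonify({status: count for status, count in cur.fetchall()})

    if __name__ == "__main__":
        app.run(port=5000)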
Established and maintained Attunity tasks to handle big data and worked on the Hadoop cluster using big data analytics tools,
including Attunity, Hue, Spark, and Kafka.
Developed and optimized complex SQL queries, improving database query performance by 30%.
Successfully translated complex business rules into Informatica Data Transformation mappings, addressing intricate
requirements for data processing.
Led and actively participated in design and code review sessions, ensuring adherence to coding standards and best
practices.
Responsible for migrating existing MFT profiles and file transfer processes from one SharePoint location to another.
Designed Unix shell scripts to automate data validation and testing procedures, allowing for comprehensive quality
assurance within the Informatica ETL processes.
Utilized the Power BI gateway to keep dashboards and reports up to date with on-premises data sources.
Deployed reports on Pentaho BI Server to give central web access to the users.
Conducted root cause analysis for testing failures, leading to the implementation of preventive measures.
Implemented error-handling strategies within Informatica mappings to manage issues related to JSON data processing.
Supported migration activities of MFT profiles for various consumer and vendor groups and validated them.
Collaborated with external vendors to integrate third-party systems, ensuring seamless data flow across platforms.
Mentored new team members for a seamless onboarding experience, providing insights into role responsibilities, best
practices, and project methodologies.
Provided 24x7 support for production migration.
Environment: Informatica Power Center, Power Exchange, Informatica B2B DT Studio, Informatica B2B DX Console, ILM,
Python, Snowflake, Redshift, SSIS, SQL Server, AWSSQL, Oracle 12c, PL/SQL (stored procedures), OBIEE, MongoDB.
Responsibilities:
Involved in understanding the scope of the application, present schema, and data model, and defining relationships
within and between the groups of data.
Interaction with the business users to better understand the requirements and document their expectations,
handling the current process, modifying and creating jobs to meet the updated requirements, and handling the load
process to the data mart and eventually the data warehouse.
Involved in understanding the business requirements and designing and loading data into a data warehouse (ETL).
Developed Design documents and ETL mapping documents.
Loading of data from different source systems into the Oracle Target database.
Analysis of the Source system from where the data needs to be extracted.
Involved in production support, defect remediation, and peer Code reviews.
Coordinating with the offshore team to develop the ETL Code.
Designed and developed mappings using Informatica/SQL processes to load data into tables.
Extensively used the Oracle Connector, Sequential File, Dataset, File Set, Modify, Sort, Join,
Remove Duplicates, Lookup, and Transformer stages, along with other database plug-ins and stages provided by
DataStage, to perform transformations and load the data.
Designed mappings from sources to operational staging targets using a Star Schema and implemented logic for
Slowly Changing Dimensions.
Designed a job template that provides the Environmental parameters for the subsequent use in the projects.
Extensively worked on IBM Infosphere Change Data Capture to extract the data from Oracle and Db2 sources.
Extensively used Microsoft Visio for documentation and Control-M to schedule the jobs.
Created a Schema file pattern to read multiple files with one ETL job.
Responsible for metadata management, creating shared containers for reusability.
Worked with business users to fix defects and implemented best practices for conducting unit testing and
integration testing and documenting the test results for user acceptance.
Developed deployment strategy and coordinated the deployment of applications.
Identified the Control-M details and verified that the stage Control-M was in sync with the production Control-M.
Prepared the implementation plan and reviewed it with all the stakeholders for production.
Responsibilities:
Gathered requirements from the business and documented them for project development.
Installation, Creation and support of Oracle database environments.
Created data maps in Informatica to extract data from sequential files.
Coordinated design reviews and ETL code reviews with teammates.
Creating tablespaces, tables, views, and scripts for automated database operations.
Data conversion from flat files to intermediate tables using SQL*Loader, including data mapping.
Created control files for SQL*Loader.
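A small sketch of how such a load could be wired together, with a hypothetical table, data file, and placeholder credentials; the control-file syntax is standard SQL*Loader:

    import subprocess

    # Hypothetical SQL*Loader control file for a flat-file-to-staging-table load
    control = """
    LOAD DATA
    INFILE 'employees.dat'
    APPEND INTO TABLE stg_employees
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    TRAILING NULLCOLS
    (emp_id, emp_name, hire_date DATE 'YYYY-MM-DD', salary)
    """

    with open("employees.ctl", "w") as f:
        f.write(control)

    # Invoke SQL*Loader against the intermediate schema (credentials are placeholders)
    subprocess.run(["sqlldr", "scott/tiger@ORCL", "control=employees.ctl",
                    "log=employees.log"], check=True)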
Worked with Informatica Power Center for data processing and loading files.
Extensively worked with Informatica transformations.
Designed/developed tables, views, various SQL queries, stored procedures, functions.
Monitored database and system backups; developed Oracle stored procedures, functions, packages, and triggers
that pull data for reports.
Education:
Bachelor’s in engineering - 2013, LPU, Punjab, India.