
SR. DATA ENGINEER
Name: Maneesh
Contact No: 5512922788
Email ID: [email protected]
LinkedIn: linkedin.com/in/GManeesh/

SUMMARY:
 Over 10 years of IT experience as a Data Engineer, with analytical programming in SQL, Python, and Snowflake.
 Experience in developing very complex mappings, reusable transformations, sessions, and workflows using the Informatica ETL tool to extract data from various sources and load it into targets.
 Experience in implementing various Big Data Analytics, Cloud Data Engineering, and Data Warehouse/Data Mart, Data
Visualization, Reporting, Data Quality, and Data virtualization solutions.
 Private Cloud Environment – Leveraging Azure and Puppet to rapidly provision internal computer systems for various
clients.
 Excellent experience in designing, developing, documenting, and testing ETL jobs and mappings in Server and Parallel jobs using DataStage to populate tables in Data Warehouses and Data Marts.
 Keen interest in the newer technology stack that Google Cloud Platform (GCP) keeps adding.
 Hands-on experience with GCP: BigQuery, GCS, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, and Dataproc.
 Experience in creating and executing data pipelines on the GCP and Azure platforms.
 Hands-on experience with Unified Data Analytics on Databricks: the Databricks Workspace user interface, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.
 Extracted data from HDFS using Hive and Presto, performed data analysis and feature selection with Spark and PySpark, and created nonparametric models in Spark.
 Good understanding of Spark architecture with Databricks and Structured Streaming; set up Databricks on Azure and GCP, Databricks Workspaces for business analytics, cluster management in Databricks, and the machine learning lifecycle.
 In-depth knowledge of Azure cloud services such as Compute, Network, Storage, and Identity & Access Management.
 Leveraged Azure Service Fabric Mesh for deploying containerized microservices, ensuring seamless orchestration and
management of containerized workloads.
 Integrated Azure Service Fabric with other Azure services, such as Azure Storage, Azure Key Vault, and Azure Monitor, to
enhance application functionality and security.
 Experienced in integrating various data sources (DB2-UDB, SQL Server, PL/SQL, Oracle, Teradata, XML, and MS Access) into a data staging area.
 Deep expertise in Big Data technologies and the Hadoop ecosystem, including HDFS, Hive, Impala, Pig, YARN, Spark,
Python, PySpark.
 Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE, and MS SQL Server.
 Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a brief sketch follows this summary).
 Used various file formats such as Avro, Parquet, SequenceFile, JSON, ORC, and text for loading data, parsing, gathering, and performing transformations.
 Good experience in Hortonworks and Cloudera for Apache Hadoop distributions.
 Designed and created Hive external tables using shared meta-store with Static & Dynamic partitioning, bucketing, and
indexing.
 Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
 Extensive hands-on experience tuning Spark jobs.
 Scheduled Spark and Hive jobs using Apache Airflow to automate data processing tasks, ensuring timely execution and
delivery of data products.
 Utilized Kubernetes and Docker for the runtime environment for the CI/CD system to build, test and deploy.
 Experience in creating fact and dimension models in MS SQL Server and Snowflake using the cloud-based Informatica ETL tool.
 Experienced in working with structured data using HiveQL and optimizing Hive queries.
 Familiarity with Python libraries such as PySpark, Pandas, Starbase, and Matplotlib.
 Writing complex SQL queries using joins, GROUP BY, and nested queries.
 Experience with HBase, loading data using connectors and writing NoSQL queries.
 Experience with solid capabilities in exploratory data analysis, statistical analysis, and visualization using R, Python, SQL,
and Tableau.
 Running and scheduling workflows using Oozie and Zookeeper, identifying failures and integrating, coordinating, and
scheduling jobs.
 Created shell scripts to run DataStage jobs from UNIX and scheduled them to run through a scheduling tool.
 In - depth understanding of Snowflake cloud technology.
 Hands-on experience with Kafka and Flume to load log data from multiple sources directly into HDFS.
 Widely used Teradata features such as BTEQ, FastLoad, MultiLoad, SQL Assistant, and DDL and DML commands, with a very good understanding of Teradata UPI and NUPI, secondary indexes, and join indexes.

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Hive, HBase, Flume, Kafka, YARN, Apache Spark; setting up AWS, Spark DataFrame API, data extraction, transformation, and load (Databricks & Hadoop), Databricks administration.
Databases: Oracle, MySQL, SQL Server, MongoDB, DynamoDB, Cassandra, Snowflake.
Programming Languages: Python, PySpark, shell script, Perl script, SQL.
Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, SQL Navigator, SQL Server Management Studio, Postman.
Version Control: SVN, Git, GitHub, Maven.
Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, Linux OS.
Visualization/Reporting: Tableau, ggplot2, Matplotlib.
Google Cloud Platform: GCP Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud Functions, Cloud Pub/Sub.
Azure: Azure Data Factory, Azure Virtual Machines, Azure Blob Storage, Azure Databricks, Azure Data Lake, Azure SQL.

PROFESSIONAL EXPERIENCE

Client: UBS, Remote    Oct 2021 – Present
Role: Sr. Azure Data Engineer
Responsibilities:
 Leveraged Azure Data Lake Storage Gen2 extensively for ingesting structured data from various sources and facilitating data
movement to other systems like Azure Synapse Analytics (formerly SQL Data Warehouse) or Azure Databricks for reporting
and analytics purposes.
 Utilized Azure Databricks Streaming APIs for real-time data processing and transformations, enabling the creation of a
unified data model from Azure Event Hubs or Azure Stream Analytics inputs.
 Conducted comprehensive architecture and implementation assessments of various Azure services including Azure
HDInsight, Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Data Factory, and Azure Event Hubs.
 Implemented solutions for data ingestion from diverse sources and processing using Azure HDInsight for Hadoop, Azure
Data Lake Analytics for big data processing, and Azure SQL Data Warehouse for data-at-rest analytics.
 Designed and deployed CI/CD pipelines for data workflows utilizing Azure Data Factory and Azure Functions for data
transformation and orchestration.
 Conducted data cleansing and transformation tasks using Azure Databricks and Spark for data analysis and processing.
 Proficient in Azure Databricks architecture, including Spark Core and Spark SQL, with hands-on experience in data
manipulation and transformation using Spark SQL and DataFrame API.
 Developed and executed Spark workflows in Python/SQL to extract data from Azure Data Lake Storage, apply transformations, and load it into Azure Synapse Analytics (a brief sketch follows this list).
 Implemented proof of concept deployments in Azure Data Lake Storage and Azure Synapse Analytics to validate data
processing workflows.
 Engineered ETL data pipelines using Azure Databricks, Azure Data Factory, and Azure SQL Database for data extraction,
transformation, and loading tasks.
 Developed complex T-SQL queries and stored procedures for data extraction, transformation, and loading (ETL) processes
within Azure SQL Database and Azure Synapse Analytics, ensuring efficient data manipulation and robust reporting
capabilities.
 Optimized data pipelines for ETL processes using Azure Databricks, ensuring compliance with data integrity, quality, and
security standards such as GDPR.
 Utilized Azure Kubernetes Service (AKS) and Azure Container Instances for building and managing containerized
environments in CI/CD systems.
 Installed and configured SQL Server on Azure Virtual Machines, Visual Studio, and integrated Informatica ETL tools for
data integration tasks.
 Implemented data models and schemas with Azure Purview for metadata management and governance, ensuring adherence to
established data governance standards and practices.
 Expertise in implementing DevOps practices through Azure DevOps for version control, continuous integration, and
continuous deployment.
 Leveraged Azure Database Migration Service and Azure Data Migration Service for seamless migration of data from on-
premises databases to Azure cloud services.
 Loaded data from relational databases to Azure Data Lake Storage using Azure Data Factory and Azure Databricks.
 Responsible for estimating and managing the Azure Databricks cluster size, monitoring cluster performance, and
troubleshooting issues.
 Designed and implemented scalable, highly available, and fault-tolerant systems on Azure using Terraform, ensuring optimal
resource management and infrastructure reliability.
 Utilized Azure SQL Database as a primary data store and query engine, creating external tables for data processed by Azure
HDInsight.
 Established Azure SQL Database as a centralized Hive Metastore for multiple Azure HDInsight clusters, ensuring data
consistency and availability.
 Implemented data ingestion pipelines from various source systems using Azure Data Factory and Azure Databricks.
 Hands-on experience in performance tuning of Spark and Hive jobs on Azure Databricks and Azure HDInsight.
 Proficient in writing Spark scripts in Python and SQL for data processing and analysis on Azure Databricks.
 Configured Spark executor memory settings for optimal performance, developed unit tests for Spark jobs, and performed
tuning based on job metrics.
 Created CI/CD pipelines on Azure DevOps to automate software delivery processes.
 Utilized Azure Stream Analytics for real-time data processing and analysis, enabling timely insights from streaming data
sources.
 Implemented automated testing frameworks and unit tests for Spark and Hive jobs on Azure Databricks and Azure
HDInsight.
 Developed shell scripts for logging and monitoring activities, storing logs in Azure Storage for data integrity.
 Optimized ETL pipelines using Azure Data Factory and Azure Databricks for improved data ingestion and transformation
performance.
 Implemented partitioning and bucketing strategies in Azure SQL Database and Azure Synapse Analytics for enhanced query
performance.
 Designed and deployed ETL pipelines on Azure Data Lake Storage in Parquet file format using Azure Data Factory and
Azure Databricks.
 Created Azure Resource Manager (ARM) templates for content delivery with cross-region replication in Azure Storage.
 Used Azure DevOps Repos for version control and replication of programming logics and scripts across Azure Databricks
clusters.
 Leveraged Azure Synapse Analytics for columnar data storage, advanced compression, and parallel processing of data.
 Worked with Snowflake cloud data warehouse and Azure Data Lake Storage for integrating data from multiple source
systems, including loading nested JSON formatted data into Snowflake tables.
 Migrated a quality monitoring program from Azure Virtual Machines to Azure Functions and created logical datasets for
quality monitoring on Snowflake warehouses.
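As a rough illustration of the ADLS-to-Synapse Spark workflow mentioned above, the sketch below assumes hypothetical storage account, container, JDBC, and table names; it uses the Databricks Azure Synapse (SQL DW) connector under the assumption that the staging container and credentials are already configured in the workspace:

```python
from pyspark.sql import functions as F

# Placeholder storage account, container, and table names; `spark` is the
# session provided by the Databricks runtime.
source_path = "abfss://raw@examplelake.dfs.core.windows.net/sales/"
staging_dir = "abfss://staging@examplelake.dfs.core.windows.net/tempdir/"
synapse_jdbc = "jdbc:sqlserver://example-synapse.sql.azuresynapse.net:1433;database=dw;..."

# Extract: read Parquet files landed in ADLS Gen2 by Azure Data Factory.
raw = spark.read.parquet(source_path)

# Transform: basic cleansing and a derived column.
cleaned = (raw
           .dropDuplicates(["order_id"])
           .filter(F.col("order_amount").isNotNull())
           .withColumn("order_date", F.to_date("order_ts")))

# Load: push the result into Azure Synapse Analytics via the Databricks
# Synapse connector, which stages the data through ADLS.
(cleaned.write
    .format("com.databricks.spark.sqldw")
    .option("url", synapse_jdbc)
    .option("tempDir", staging_dir)
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.fact_orders")
    .mode("append")
    .save())
```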

Environment: Azure Data Factory, Azure Virtual Machines, Resource Manager, Azure Blob Storage, Azure Databricks, Azure
Data Lake, Hive, Azure SQL, Cosmos DB, Azure Shell, Power BI, Cloudera 7.1.7

Client: Visa, Austin, TX    Apr 2020 – Sep 2021
Role: GCP Data Engineer
Responsibilities:
 Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
 Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators (a brief sketch follows this list).
 Experience in GCP Dataproc, GCS, Cloud functions, BigQuery.
 Experience in moving data between GCP and Hadoop using BigQuery.
 Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Storage, and BigQuery.
 Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from enterprise data in BigQuery.
 Designed and coordinated with the Data Science team in implementing advanced analytical models on the Hadoop cluster over large datasets.
 Wrote Hive SQL scripts for creating complex tables with performance features such as partitioning, clustering, and skew handling.
 Downloaded BigQuery data into pandas and Spark DataFrames for advanced ETL capabilities.
 Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.
 Created a POC for utilizing ML models and Cloud ML for table quality analysis in the batch process.
 Knowledge of Cloud Dataflow and Apache Beam.
 Good knowledge of using Cloud Shell for various tasks and deploying services.
 Created BigQuery authorized views for row-level security and for exposing data to other teams.
 Implemented Data Exploration to analyze patterns and to select features using Python SciPy.
 Supported MapReduce Programs running on the cluster.
 Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop
written programs.
 Implemented data governance practices leveraging Unity Catalog, ensuring adherence to established standards, metadata
management, and lineage across the project's ecosystem in Google Cloud Platform (GCP).
 Collaborated with cross-functional teams to design and maintain data models and schemas, incorporating Unity Catalog data
governance practices, and ensuring data integrity and quality throughout the migration process from Oracle database to
BigQuery.
 Created BigQuery authorized views in alignment with Unity Catalog data governance policies, ensuring row-level security
and controlled access to sensitive data for different teams.
 Experienced with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.
 Developed, debugged and optimized PostgreSQL queries for reporting and app development.
 Involved in migrating an on-prem Hadoop system to GCP (Google Cloud Platform).
 Experienced working with Azure services such as Data Lake, Data Lake Analytics, SQL Database, Synapse, Databricks, Data Factory, Logic Apps, and SQL Data Warehouse, and with GCP services such as BigQuery, Dataproc, and Pub/Sub.
 Enforced complex business rules while populating the new transaction code data to different data marts like CSDM, USDM
and RBDM.
 Automated the process of sending data quality alerts to a Slack channel and email using Databricks and Python, alerting users if there are any issues with the data.
 Worked on various modules for CSDM (Credit Card Services Data Mart), USDM (U.S. Data Mart), and RBDM that reported production issues, which were part of SCRs (Small Change Requests).
 Experienced in migrating legacy systems into GCP technologies.
 Created HBase tables to store various data formats of data coming from different portfolios.
 Worked on improving performance of existing Pig and Hive Queries.
 Perform analysis and engineering for High Availability and Disaster Recovery among current and future Data Center,
Virtualization and Cloud deployments.
 Used Git 2.x for version control with Data Scientists team and Data Engineer colleagues.
 Used Agile methodology and the SCRUM process for project development.
 Held knowledge transfer (KT) sessions with the client to understand their various data management systems and the underlying data.
 Designed the business-critical disaster recovery and high availability environments using a run anywhere methodology.
 Performing QA on the data extracted, transformed and exported to excel.
 Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and
performed Gap analysis.
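A minimal Airflow DAG sketch of the GCS-to-BigQuery ETL pattern described in the bullets above; the bucket, project, dataset, and table names are placeholders, and the use of the Google provider operators is an assumption about the Airflow setup rather than the project's exact configuration:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Bucket, dataset, and table names below are illustrative placeholders.
with DAG(
    dag_id="oracle_extracts_to_bigquery",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Load the daily Oracle extract (already exported to GCS as CSV) into a staging table.
    load_to_staging = GCSToBigQueryOperator(
        task_id="load_to_staging",
        bucket="example-extracts",
        source_objects=["oracle/orders/{{ ds }}/*.csv"],
        destination_project_dataset_table="example-project.staging.orders",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    # Transform the staging data into a reporting table consumed by Power BI.
    transform_to_reporting = BigQueryInsertJobOperator(
        task_id="transform_to_reporting",
        configuration={
            "query": {
                "query": """
                    SELECT customer_id, order_date, SUM(amount) AS total_amount
                    FROM `example-project.staging.orders`
                    GROUP BY customer_id, order_date
                """,
                "useLegacySql": False,
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "reporting",
                    "tableId": "daily_orders",
                },
                "writeDisposition": "WRITE_TRUNCATE",
            }
        },
    )

    load_to_staging >> transform_to_reporting
```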
Environment: R, Python (SciPy, NumPy, Pandas, StatsModels, Plotly), MySQL, Excel, Google Cloud Platform, Tableau 9.x, D3.js, PySpark, GCP Dataproc, BigQuery, Disaster Recovery, SVM, Random Forests, Machine Learning, A/B experiments, Git 2.x, Agile/SCRUM.

Client: Thermo Fisher, Rochester, NY    Aug 2019 – Mar 2020
Role: Data Engineer
Responsibilities:
 Worked independently in the development, testing, implementation, and maintenance of systems of moderate-to-large size
and complexity.
 Used the Agile methodology to build the different phases of Software development life cycle (SDLC).
 Worked extensively with Azure services, with a wide and in-depth understanding of each of them.
 Created and developed data load and scheduler processes for ETL jobs using the Informatica ETL package.
 Responsible for ETL (Extract, Transform, and Load) processes to bring data from multiple sources into a single warehouse
environment.
 Provided the technical support for debugging, code fix, platform issues, missing data points, unreliable data source
connections and big data transit issues.
 Investigated data sources to identify new data elements needed for data integration.
 Worked with delivery of Data & Analytics applications involving structured and un-structured data on Hadoop based
platforms.
 Created complex SQL queries and scripts to extract, aggregate and validate data from MS SQL, Oracle, and flat files using
Informatica and loaded into a single data warehouse repository.
 Carried out various mathematical operations for calculation purpose using python libraries.
 Designed and deployed Azure solutions utilizing VMs, Azure Blob Storage, Azure SQL Database, and Azure Synapse
Analytics.
 Collaborated on integrating Snowflake cloud data warehouse with Azure Blob Storage for streamlined data processing and
analytics, enhancing the project's ability to manage diverse datasets.
 Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
 Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data through
Hadoop, MapReduce, Pig and Hive.
 Leveraged Snowflake cloud data warehouse and Azure Blob Storage for orchestrating data workflows, establishing a robust
foundation for the project's analytics and business intelligence solutions.
 Implemented Continuous Integration and Continuous Delivery (CI/CD) pipelines on Azure, utilizing Azure DevOps Repos
for version control and efficient deployment of Snowflake-related logics and scripts.
 Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
 Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
 Proficiency in wrangling data and creating data pipelines using fast, efficient Python code.
 Used GIT for the version control and deployed project into Azure.
 Leveraged Azure Data Factory for data transformation, validation, and cleansing.
 Familiar with DBMS table design, loading, Data Modeling, and experience in SQL.
 Designed and developed data management system using MySQL.
 Worked with Analysts on various requirements gathered in JIRA.
 Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using Visio.
 Created complex program unit using PL/SQL Records, Collection types.
 Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
 Effectively loaded real time data from various data sources into HDFS using Kafka.
 Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
 Created several Databricks Spark jobs with Pyspark to perform several tables to table operations.
 Developed Azure Functions in Python with Blob Storage triggers to automate workflows (a brief sketch follows this list).
 Prepared Dashboards using calculations, parameters, calculated fields, groups, sets and hierarchies in Tableau.
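A brief sketch of the Azure Functions blob-trigger pattern noted above, written against the Python v2 (decorator) programming model; the container name and connection setting name are placeholders, not the project's actual configuration:

```python
import logging

import azure.functions as func

app = func.FunctionApp()

# Hypothetical container name; "AzureWebJobsStorage" is the app setting that
# holds the storage connection string in a typical Function App.
@app.blob_trigger(arg_name="newblob",
                  path="incoming-files/{name}",
                  connection="AzureWebJobsStorage")
def process_new_file(newblob: func.InputStream):
    """Runs whenever a file lands in the 'incoming-files' container."""
    logging.info("Processing blob: %s (%d bytes)", newblob.name, newblob.length)

    # Read the payload; in a real workflow the validation, routing, and
    # downstream loading steps would follow here.
    content = newblob.read()
    logging.info("First 100 bytes: %r", content[:100])
```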
Environment: Hadoop 3.0, Agile, Azure, Python 3.7, MS Visio, JIRA, MySQL, HDFS, Kafka 1.1, Git, EC2, S3, Spark 2.4, OLTP, ODS, MongoDB, Tableau

Client: Global Logic, Hyderabad, India    Jan 2017 – Jul 2019
Role: Data Engineer
Responsibilities:
 Performed Data Analysis using SQL queries on source systems to identify data discrepancies and determine data quality.
 Performed extensive data validation and data verification against the Data Warehouse, and debugged SQL statements and stored procedures for business scenarios (a brief sketch follows this list).
 Designed and developed Tableau dashboards using stack bars, bar graphs, scattered plots, and Gantt charts.
 Familiar with DBMS table design, loading, Data Modeling, and experience in SQL.
 Worked on ER/Studio for Conceptual, logical, physical data modeling and for generation of DDL scripts.
 Handled performance requirements for databases in OLTP and OLAP models.
 Analyzed the data which is using the maximum number of resources and made changes in the back-end code using PL/SQL
stored procedures and triggers.
 Designed and implemented ETL process to load data from different sources, perform data Mining, and analyze data using
visualization/reporting tools to leverage the performance of OpenStack.
 Effectively performed performance tuning and troubleshooting of Map Reduce jobs by analyzing and reviewing Hadoop log
files.
 Experience with Apache big data Hadoop components like HDFS, MapReduce, YARN, Hive, HBase, Sqoop, and Nifi.
 Performed data completeness, correctness, data transformation and data quality testing using SQL.
 Involved in designing Business Objects universes and creating reports.
 Conducted design walk through sessions with Business Intelligence team to ensure that reporting requirements are met for the
business.
 Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area.
 Hands-on experience with UNIX/Linux shell scripting.
 Wrote UNIX shell scripts to invoke the stored procedures, parse the data, and load it into flat files.
 Created reports analyzing large-scale databases utilizing Microsoft Excel analytics within the legacy system.
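The data validation and verification work above was done in SQL; purely as an illustration, the Python sketch below shows one way a simple row-count reconciliation between source and warehouse tables could be scripted (the connection string and table names are hypothetical):

```python
import pyodbc

# Hypothetical connection string; the real checks ran against the project's
# SQL Server / Oracle sources and the Data Warehouse.
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=example-db;DATABASE=dw;Trusted_Connection=yes;")

# Pairs of (source table, warehouse table) to reconcile.
CHECKS = [
    ("staging.orders", "dw.fact_orders"),
    ("staging.customers", "dw.dim_customer"),
]

def row_count(cursor, table):
    # Count rows in one table; a fuller check would also compare sums/hashes.
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

with pyodbc.connect(CONN_STR) as conn:
    cursor = conn.cursor()
    for source, target in CHECKS:
        src, tgt = row_count(cursor, source), row_count(cursor, target)
        status = "OK" if src == tgt else "MISMATCH"
        print(f"{status}: {source}={src} rows, {target}={tgt} rows")
```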
Environment: SQL, PL/SQL, OLAP, OLTP, UNIX, MS Excel, T-SQL, Hadoop 3.0

Client: Honeywell, India    Jan 2013 – Dec 2016
Role: ETL Developer
Responsibilities:
• Worked on Informatica Designer Components -Source Analyzer, Warehouse Designer, Mapping Designer & Mapplet
Designer, Transformation Developer, Workflow Manager and Workflow Monitor.
• Gathered functional specifications from Business Analysts and prepared ETL technical specs for ETL mappings.
• Developed and supported Extraction, Transformation and Load process (ETL) using Informatica Power Center to populate
the tables in Data warehouse.
• Extracted and transformed source data from different databases including Oracle, Siebel CRM, SAP, Agile and flat files into
Oracle.
• Developed Session tasks and Workflows to execute the mappings and using mapping and session parameters.
• Proficient in SQL, PL/SQL, Procedures/Functions, Triggers and Packages in Oracle
• Implemented Data Cleansing logic in the mappings to load Quality Data into the Data Warehouse
• Created complex mappings in Informatica to load the data from various sources into the Data Warehouse, using different
transformations like Joiner, Aggregator, Update Strategy, Rank, Router, Lookup - Connected & Unconnected, Sequence
Generator, Filter, Sorter, Source Qualifier, Stored Procedure transformation etc.
• Scheduling the ETL jobs through Scheduler tool Control-M.
• Developed reusable transformations and mapplets.
• Tuned the mappings to get better performance.
• Developed Slowly Changing Dimension mappings of Type I, II, and III (the Type II pattern is sketched after this list).
• Worked as Production Support resource for fixing the issues in Production environment.
• Involved in Deployment and Implementation plan among the environments.
• Worked on Debugging and Performance Tuning of targets, sources, mappings and sessions.
• Developed and implemented Data Cleanup procedures, transformations, Scripts, Stored Procedures and execution of test
plans for loading the data successfully into the targets.
• Handled large volumes of data warehouse database, Data Backup and Recovery.
• Involved in Database Design, Entity-Relationship modeling, Dimensional modeling like Star schema and Snowflake schema.
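The slowly changing dimension work above was built with Informatica mappings (Update Strategy, Lookup, and related transformations); as an illustration of the Type II logic only, here is a small Python/pandas sketch with made-up data that expires the current row and inserts a new version when a tracked attribute changes:

```python
from datetime import date

import pandas as pd

# Existing dimension rows (made-up data): one current record per customer.
dim = pd.DataFrame([
    {"customer_id": 1, "city": "Austin", "effective_date": date(2014, 1, 1),
     "end_date": None, "is_current": True},
])

# Incoming source record with a changed attribute.
incoming = {"customer_id": 1, "city": "Dallas"}
today = date(2015, 6, 1)

current = dim[(dim["customer_id"] == incoming["customer_id"]) & (dim["is_current"])]

if not current.empty and current.iloc[0]["city"] != incoming["city"]:
    # Type II: close out the existing version...
    dim.loc[current.index, ["end_date", "is_current"]] = [today, False]
    # ...and insert a new current version carrying the changed attribute.
    new_row = {"customer_id": incoming["customer_id"], "city": incoming["city"],
               "effective_date": today, "end_date": None, "is_current": True}
    dim = pd.concat([dim, pd.DataFrame([new_row])], ignore_index=True)

print(dim)
```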
Environment: Oracle 10g, SQL, PL/SQL, SQL*Plus, Informatica PowerCenter 8.1.1, SQL Developer, TOAD, SAP, Siebel CRM, E-R Modeling, Dimensional Modeling, Star Schema, Snowflake Schema, Erwin.
