Maneesh Azure
DATA ENGINEER
Name: Maneesh
Contact No: 5512922788
Email ID: [email protected]
LinkedIn: linkedin.com/in/GManeesh/
SUMMARY:
10+ years of IT experience as a Data Engineer, including analytical programming with SQL, Python, and Snowflake.
Experience in developing complex mappings, reusable transformations, sessions, and workflows using the Informatica ETL tool to extract data from various sources and load it into targets.
Experience in implementing various Big Data Analytics, Cloud Data Engineering, and Data Warehouse/Data Mart, Data
Visualization, Reporting, Data Quality, and Data virtualization solutions.
Private Cloud Environment – Leveraging Azure and Puppet to rapidly provision internal computer systems for various
clients.
Excellent experience in designing, developing, documenting, and testing ETL jobs and mappings in server and parallel jobs using DataStage to populate tables in data warehouses and data marts.
Keen on learning the newer technology stack that Google Cloud Platform (GCP) adds.
Hands-on experience with GCP: BigQuery, GCS, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, and Dataproc.
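For illustration of this GCP tooling, a minimal sketch of querying BigQuery from Python (project, dataset, and table names are hypothetical; assumes the google-cloud-bigquery client library and default credentials):

    # Hypothetical BigQuery query via the google-cloud-bigquery client.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up default credentials and project
    query = """
        SELECT status, COUNT(*) AS n
        FROM `my-project.analytics.tickets`   -- placeholder table
        GROUP BY status
    """
    for row in client.query(query).result():
        print(row.status, row.n)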
Experience in creating and executing data pipelines on the GCP and Azure platforms.
Hands-on experience with Unified Data Analytics on Databricks, the Databricks Workspace user interface, managing Databricks notebooks, Delta Lake with Python, and Delta Lake with Spark SQL.
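As a minimal, hedged sketch of that Delta Lake usage (paths are placeholders; on Databricks the spark session is already provided, elsewhere the delta-spark package must be configured):

    # Write a small DataFrame as a Delta table, then read it back with Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

    df = spark.createDataFrame([(1, "open"), (2, "closed")], ["ticket_id", "status"])
    df.write.format("delta").mode("overwrite").save("/mnt/datalake/tickets")  # placeholder path

    spark.read.format("delta").load("/mnt/datalake/tickets").createOrReplaceTempView("tickets")
    spark.sql("SELECT status, COUNT(*) AS n FROM tickets GROUP BY status").show()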
Extracted data from HDFS using Hive and Presto, performed data analysis and feature selection using Spark with PySpark, and created nonparametric models in Spark.
Good understanding of Spark architecture with Databricks and Structured Streaming; setting up Databricks on Azure and GCP, Databricks Workspace for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle.
In-depth knowledge of Azure cloud services such as Compute, Network, Storage, and Identity & Access Management.
Leveraged Azure Service Fabric Mesh for deploying containerized microservices, ensuring seamless orchestration and
management of containerized workloads.
Integrated Azure Service Fabric with other Azure services, such as Azure Storage, Azure Key Vault, and Azure Monitor, to
enhance application functionality and security.
Experienced in integrating various data sources (DB2-UDB, SQL Server, PL/SQL, Oracle, Teradata, XML, and MS Access) into a data staging area.
Deep expertise in Big Data technologies and the Hadoop ecosystem, including HDFS, Hive, Impala, Pig, YARN, Spark,
Python, PySpark.
Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE, and MS SQL Server.
Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
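A sketch of that kind of Spark SQL extraction and aggregation, assuming hypothetical paths and column names and an existing spark session:

    # Read two sources in different formats, join them, and aggregate usage per customer.
    events   = spark.read.parquet("/mnt/raw/events")     # placeholder Parquet source
    accounts = spark.read.json("/mnt/raw/accounts")      # placeholder JSON source

    events.createOrReplaceTempView("events")
    accounts.createOrReplaceTempView("accounts")

    usage = spark.sql("""
        SELECT a.customer_id, COUNT(e.event_id) AS event_count
        FROM accounts a
        JOIN events e ON e.customer_id = a.customer_id
        GROUP BY a.customer_id
    """)
    usage.write.mode("overwrite").parquet("/mnt/curated/usage_patterns")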
Used various file formats such as Avro, Parquet, SequenceFile, JSON, ORC, and text for loading data, parsing, gathering, and performing transformations.
Good experience with the Hortonworks and Cloudera distributions of Apache Hadoop.
Designed and created Hive external tables using a shared metastore with static and dynamic partitioning, bucketing, and indexing.
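A hedged example of such DDL, issued here through Spark SQL against a Hive metastore (database, table, and location are hypothetical):

    # External table with dynamic partitioning and bucketing, then a dynamic-partition insert.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
            order_id BIGINT,
            amount   DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (order_id) INTO 16 BUCKETS
        STORED AS ORC
        LOCATION '/mnt/warehouse/sales/orders'
    """)
    spark.sql("""
        INSERT OVERWRITE TABLE sales.orders PARTITION (order_date)
        SELECT order_id, amount, order_date FROM sales.orders_staging
    """)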
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Extensive hands-on experience tuning Spark jobs.
Scheduled Spark and Hive jobs using Apache Airflow to automate data processing tasks, ensuring timely execution and
delivery of data products.
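A minimal Airflow sketch of that scheduling pattern (DAG id, schedule, application path, and HQL are assumptions, and the Spark and Hive provider packages are assumed to be installed):

    # Daily DAG: submit a Spark job, then run a Hive load step once it succeeds.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
    from airflow.providers.apache.hive.operators.hive import HiveOperator

    with DAG(
        dag_id="daily_usage_pipeline",          # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        transform = SparkSubmitOperator(
            task_id="spark_transform",
            application="/opt/jobs/transform_usage.py",  # placeholder script
        )
        load_hive = HiveOperator(
            task_id="hive_load",
            hql="LOAD DATA INPATH '/staging/usage' INTO TABLE analytics.usage",
        )
        transform >> load_hive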
Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
Experience in creating fact and dimension models in MS SQL Server and Snowflake using the cloud-based Informatica ETL tool.
Experienced in working with structured data using HiveQL and optimizing Hive queries.
Familiarity with libraries like PySpark, Pandas, Star base, and Matplotlib in Python.
Writing complex SQL queries using joins, GROUP BY, and nested queries.
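One illustrative query of that shape, run through spark.sql for consistency with the sketches above (tables and columns are hypothetical):

    # Join + GROUP BY + nested subquery: customers whose total spend exceeds the average order value.
    spark.sql("""
        SELECT c.customer_id, c.name, SUM(o.amount) AS total_spent
        FROM customers c
        JOIN orders o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
    """).show()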
Experience with HBase, loading data using connectors and writing NoSQL queries.
Solid capabilities in exploratory data analysis, statistical analysis, and visualization using R, Python, SQL, and Tableau.
Running and scheduling workflows using Oozie and Zookeeper, identifying failures and integrating, coordinating, and
scheduling jobs.
Created shell scripts to run DataStage jobs from UNIX and scheduled them through a scheduling tool.
In-depth understanding of Snowflake cloud technology.
Hands-on experience with Kafka and Flume to load log data from multiple sources directly into HDFS.
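One way to sketch that log flow in PySpark is with Spark Structured Streaming reading from Kafka and writing to HDFS (broker, topic, and paths are placeholders; the spark-sql-kafka connector is assumed); Flume uses a separate, config-file-based setup not shown here:

    # Stream raw log lines from a Kafka topic into HDFS text files.
    logs = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
        .option("subscribe", "app-logs")                     # placeholder topic
        .load()
    )
    query = (
        logs.selectExpr("CAST(value AS STRING) AS line")
        .writeStream.format("text")
        .option("path", "hdfs:///data/logs/raw")
        .option("checkpointLocation", "hdfs:///checkpoints/app-logs")
        .start()
    )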
Widely used Teradata features such as BTEQ, FastLoad, MultiLoad, SQL Assistant, and DDL and DML commands, with a very good understanding of Teradata UPI and NUPI, secondary indexes, and join indexes.
TECHNICAL SKILLS
Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Hive, HBase, Flume, Kafka, YARN, Apache Spark, Spark DataFrame API, data extraction/transformation/load on Databricks & Hadoop, Databricks administration, AWS setup.
Databases: Oracle, MySQL, SQL Server, MongoDB, DynamoDB, Cassandra, Snowflake.
Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, SQL Navigator, SQL Server Management Studio, Postman.
Version Control: SVN, Git, GitHub, Maven.
Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, Linux, OS.
Visualization/Reporting: Tableau, ggplot2, Matplotlib.
Google Cloud Platform (GCP): Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud Functions, Cloud Pub/Sub.
Azure: Azure Data Factory, Azure Virtual Machines, Azure Blob Storage, Azure Databricks, Azure Data Lake, Azure SQL.
PROFESSIONAL EXPERIENCE
Environment: Azure Data Factory, Azure Virtual Machines, Resource Manager, Azure Blob Storage, Azure Databricks, Azure
Data Lake, Hive, Azure SQL, Cosmos DB, Azure Shell, Power BI, Cloudera 7.1.7