
ANIL NAMDEV

Senior Data Engineer

Email Id :- [email protected]

Executive Summary

• Over 8.5 years of IT experience as a Senior Data Engineer on GCP Cloud as well as on-premises Hadoop.
• Experience migrating SQL databases to GCP, including controlling and granting database access.
• Solid experience with GCP services such as Compute Engine, Kubernetes Engine, App Engine, Cloud Storage, Cloud SQL, BigQuery, Pub/Sub, and IAM.
• Experience developing Spark applications using Spark SQL for data extraction, transformation, and aggregation across multiple file formats to uncover insights into customer usage patterns (see the PySpark sketch after this list).
• Technical expertise spans Data Warehousing, Enterprise Data Platforms (EDP), and Big Data.
• Experience building large-scale batch data pipelines with data processing frameworks on GCP using PySpark on Dataproc.
• Expertise with Amazon Web Services: EC2, EMR, S3, Athena, Glue, containers, APIs, IAM, VPC, Redshift, serverless components, API Gateway, SQL, NoSQL, basic ML, and analytics.
• Experienced in managing Azure Data Lake Storage (ADLS) and Databricks Delta Lake.
• Experience with PySpark and Spark SQL for big data transformation and processing on GCP.
• Migrated on-premises databases to GCP.
• Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
• Design and develop relational database objects; knowledgeable in logical and physical data modelling concepts; some experience with Snowflake.
• Experience with agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).
• Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS.
• Hands-on experience creating Delta Lake tables and applying partitions for faster querying.
• Experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
• Good understanding of HDFS design, daemons, and HDFS High Availability (HA).
• In-depth knowledge of Hadoop cluster architecture and cluster monitoring, with a solid understanding of data structures and algorithms.
• Hands-on experience with data ingestion tools (Spark Streaming, Flume) and the Oozie workflow management tool.
• Experience importing and exporting data between HDFS and relational database systems using Sqoop.
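A minimal PySpark sketch of the kind of Spark SQL extraction and aggregation described above; the HDFS paths, column names (customer_id, plan, duration_sec), and the aggregation logic are illustrative assumptions rather than details from any particular project.

# Minimal PySpark sketch: read multiple file formats, transform with Spark SQL,
# and write partitioned output. Paths and column names are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-aggregation-sketch").getOrCreate()

# Extract: load two hypothetical sources in different formats.
events = spark.read.parquet("hdfs:///data/raw/events/")                      # assumed path
customers = spark.read.option("header", True).csv("hdfs:///data/raw/customers.csv")

# Transform: join and aggregate to summarise usage per customer.
usage = (
    events.join(customers, "customer_id")
          .groupBy("customer_id", "plan")
          .agg(F.count("*").alias("event_count"),
               F.sum("duration_sec").alias("total_duration_sec"))
)

# Load: write results partitioned for faster downstream queries.
usage.write.mode("overwrite").partitionBy("plan").parquet("hdfs:///data/curated/usage/")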

Technical Skills
Programming Languages: Java, PySpark, Scala, SQL
Big Data Ecosystems: Spark, Hadoop, HDFS, MapReduce, Hive, Impala, Sqoop, Oozie
Databases: MySQL, Oracle, Hive
NoSQL Databases: CosmosDB, Apache HBase, Cassandra
Streaming Frameworks: Kafka Streaming, Spark Streaming
ETL Tools: Talend, Informatica
GCP Services: Compute Engine, Kubernetes Engine, App Engine, Cloud Storage, Cloud SQL, BigQuery, Pub/Sub, IAM
Azure Services: Azure Storage, Azure Data Factory, Azure DevOps, Databricks, Key Vaults
SDLC Methodologies: Agile, Scrum, Waterfall, Kanban
Reporting Tools: Power BI, Tableau, Excel
Project Management Tools: MS Project, MS SharePoint
IDEs: Databricks, Visual Studio Code, Jupyter, Anaconda, PuTTY

Projects

Deloitte US

Senior Data Engineer (Dec 2022 - Present)

Responsibilities:

• Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
• Devised simple and complex SQL scripts to check and validate data flows across various applications.
• Performed data analysis, migration, cleansing, transformation, integration, import, and export using Python.
• Technical expertise in large-scale data lake and BI systems, with extensive hands-on knowledge of Apache Spark and MPP data storage.
• Worked on JSON schemas to define table and column mappings from Cloud Storage to Google BigQuery, and used GCP data pipelines to configure data loads from Cloud Storage to BigQuery (see the BigQuery load sketch after this list).
• Orchestrated data processing workflows by defining infrastructure components, such as compute instances and storage resources, using Terraform.
• Demonstrated passion for working with large volumes of data and thriving in highly complex technical contexts.
• Performed data engineering functions such as data extraction, transformation, loading, and integration in support of enterprise data infrastructures: data warehouses, operational data stores, and master data management.
• Responsible for data services and data movement infrastructure. Good experience with ETL concepts, building ETL solutions, and data modelling.
• Architected several DAGs (Directed Acyclic Graphs) for automating ETL pipelines.
• Architected ETL transformation layers and wrote Spark jobs to perform the processing.
• Gathered and processed raw data at scale, including writing scripts, web scraping, calling APIs, writing SQL queries, and building applications.
• Imported data from AWS S3 into Spark RDDs and performed actions and transformations on them.
• Created partitions, bucketing, and indexes for optimization as part of Hive data modelling.
• Developed Hive DDLs to create, alter, and drop Hive tables.
• Worked with different RDDs to transform data from various sources into the required formats.
• Created DataFrames in Spark SQL from data in HDFS, performed transformations, analysed the data, and stored the results back in HDFS.
• Worked with the Spark Core, Spark Streaming, and Spark SQL modules for faster data processing.
• Developed Spark code and SQL for faster testing and processing of real-time data.
• Worked on Talend ETL to load data from various sources into the data lake.
• Managed a team of data engineers and junior data engineers.
• Fostered the professional growth of team members through mentorship, training, and career development opportunities.
• Used Spark for interactive queries and streaming data processing, and integrated it with popular NoSQL databases to handle massive data volumes.
• Consumed data from Kafka topics using Spark (see the streaming sketch after this list).
• Collaborated with data architects to design and maintain scalable, efficient data architectures and infrastructure.
• Collaborated with stakeholders to define project goals, timelines, and resource requirements; took part in regular stand-up meetings, status calls, and business-owner meetings with stakeholders.
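A minimal sketch, assuming the google-cloud-bigquery Python client, of loading newline-delimited JSON from Cloud Storage into BigQuery with an explicit schema; the project, dataset, table, bucket URI, and field names are hypothetical.

# Minimal sketch: load newline-delimited JSON from Cloud Storage into BigQuery
# with an explicit schema. All identifiers below are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")           # assumed project id

table_id = "example-project.analytics.customer_usage"         # assumed table
source_uri = "gs://example-bucket/exports/usage-*.json"       # assumed bucket/prefix

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    schema=[
        bigquery.SchemaField("customer_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("plan", "STRING"),
        bigquery.SchemaField("event_count", "INT64"),
        bigquery.SchemaField("total_duration_sec", "INT64"),
    ],
)

load_job = client.load_table_from_uri(source_uri, table_id, job_config=job_config)
load_job.result()  # wait for the load job to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")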
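A minimal sketch of consuming a Kafka topic with Spark Structured Streaming, assuming the spark-sql-kafka connector is available on the cluster; the broker addresses, topic name, message schema, and sink paths are hypothetical.

# Minimal sketch: consume JSON messages from a Kafka topic with Structured Streaming
# and persist them as Parquet. Brokers, topic, schema, and paths are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-consumer-sketch").getOrCreate()

event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", LongType()),
])

raw = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker-1:9092")  # assumed brokers
         .option("subscribe", "usage-events")                  # assumed topic
         .option("startingOffsets", "latest")
         .load()
)

# Kafka delivers key/value as bytes; parse the JSON value into columns.
events = raw.select(
    F.from_json(F.col("value").cast("string"), event_schema).alias("e")
).select("e.*")

query = (
    events.writeStream.format("parquet")
          .option("path", "hdfs:///data/streaming/usage-events/")        # assumed sink
          .option("checkpointLocation", "hdfs:///checkpoints/usage-events/")
          .start()
)
query.awaitTermination()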

Centene Corp
Data Engineer (May 2019 – Dec 2022)

Responsibilities:

• Created and maintained an optimal data pipeline architecture.
• Responsible for loading data into GCP Cloud Storage buckets from internal servers and the Snowflake data warehouse.
• Built the framework for efficient data extraction, transformation, and loading (ETL) from a variety of data sources.
• Enjoys working in a fast-paced team and collaborates effectively with cross-functional teams to achieve common goals.
• Architected, designed, and developed an end-to-end data warehouse system at enterprise scale using Talend, GCP Cloud Storage, and Snowflake.
• Launched Amazon EC2 instances (Linux/Ubuntu) on Amazon Web Services and configured them for specific applications.
• Worked closely with the analytics team, developing high-quality data pipelines in Snowflake.
• Worked extensively on moving data from Snowflake to Cloud Storage for the TMCOMP/ESD feeds.
• Used AWS Athena extensively to import structured data from storage buckets into multiple systems, including Redshift, and to generate reports. Used Spark Streaming APIs to perform the necessary conversions and operations on the fly while constructing the common learner data model, which obtains data from Kinesis in near real time.
• Worked extensively with SQL, Informatica, MLoad, FLoad, and FastExport as needed to handle different scenarios.
• Used Python and SQL queries for data file transformations from multiple sources.
• Created DAGs in Airflow to automate the process by scheduling Python jobs (see the DAG sketch after this list).
• Automated the creation of data-related services such as databases, data lakes, and compute clusters using Terraform scripts.
• Incorporated security best practices into Terraform scripts, ensuring the provisioned data infrastructure adheres to organizational security policies.
• Performed advanced activities such as text analytics and processing using Spark's in-memory computing capabilities.
• Created RDDs and DataFrames backed by Spark SQL queries that mix Hive queries with programmatic data manipulation in Scala and Python.
• Analyzed Hive data using the Spark API in conjunction with Hadoop YARN on an EMR cluster.
• Enhanced existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
• Assisted with the creation of Hive tables and the loading and analysis of data using Hive queries.
• Conducted exploratory data analysis and data visualization using Python (Matplotlib, NumPy, pandas, seaborn).
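A minimal sketch of the kind of Airflow DAG described above: a daily pipeline that schedules two Python tasks in sequence. The DAG id, schedule, and task callables are hypothetical placeholders, not the actual feed jobs.

# Minimal Airflow sketch: a daily DAG with two dependent Python tasks.
# Names and callables below are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_gcs(**context):
    """Placeholder for pulling data from the source system into a GCS bucket."""
    ...


def load_to_snowflake(**context):
    """Placeholder for loading the staged files into Snowflake."""
    ...


with DAG(
    dag_id="daily_feed_pipeline",          # assumed name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_gcs", python_callable=extract_to_gcs)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

    extract >> load   # run the load only after the extract succeeds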
Cisco Systems
Data Engineer (Feb 2016 – May 2019)

Responsibilities:

• Performed data cleaning, filtering, and transformation to develop new data insights.
• Used database query languages and technologies (Oracle, SQL, and Python) to retrieve data.
• Gathered business requirements for data insights and analysis, and created supporting visualizations and prepared data together with data engineers and architects.
• Provided platform and infrastructure support, including cluster administration and tuning.
• Acted as a liaison between Treasury lines of business, the technology team, and other data management analysts to communicate control status and escalate issues according to the defined process.
• Created and managed big data pipelines using Pig, MapReduce, and Hive (see the Hive sketch after this list).
• Installed and configured Hadoop components on multiple clusters.
• Worked collaboratively with users and application teams to optimize query and cluster performance.
• Involved in capacity management, cluster setup, structure planning, scaling, and administration.
• Collaborated with PMO leads to provide updates on project timelines, deliverables, and obstacles/challenges.
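A minimal sketch, using Spark SQL with Hive support, of creating a partitioned Hive table and querying a single partition; the database, table, and column names are hypothetical.

# Minimal sketch: create a partitioned Hive table via Spark SQL and query one
# partition so only that slice of data is scanned. Names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-sketch").enableHiveSupport().getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.web_events (
        customer_id STRING,
        event_type STRING,
        duration_sec BIGINT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# Filtering on the partition column prunes the scan to the matching partition only.
daily = spark.sql("""
    SELECT event_type, COUNT(*) AS events
    FROM analytics.web_events
    WHERE event_date = '2019-01-15'
    GROUP BY event_type
""")
daily.show()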
Education

GCP Associate Cloud Engineer Certified

Azure Data Engineer Certified

Bachelor of Technology, RGPV University
