Ankit Data Engineer Resume

Ankit Kumar is an experienced IT professional with over 5 years in data engineering, specializing in Big Data technologies and cloud services such as AWS and Azure. He has extensive hands-on experience with various databases, data migration, ETL processes, and data visualization tools, having worked with clients like Univar Solutions and Country Financial. His technical skills include Python, SQL, Spark, and various cloud platforms, enabling him to design and implement complex data pipelines and analytics solutions.


ANKIT KUMAR

[email protected]
+91 7703802034

PROFESSIONAL SUMMARY

• Over 5 years of professional experience in IT, working with various legacy database systems as well as Big Data technologies.
• Experienced with Big Data technologies such as GCP, Amazon Web Services (AWS), Microsoft Azure, Cassandra, Hive, NoSQL databases (like HBase, MongoDB), and SQL databases (like Oracle, SQL Server, PostgreSQL, MySQL, Snowflake).
• Hands-on experience with Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS, Amazon Elastic Load Balancing, Amazon SQS, AWS Identity and Access Management, Amazon SNS, CloudWatch, Amazon Elastic Block Store (EBS), Amazon CloudFront, VPC, DynamoDB, Lambda, Redshift, and other AWS services).
• Developed and deployed various Lambda functions in AWS with in-built AWS
Lambda Libraries.
• Experience in analyzing data using Big Data Ecosystem including HDFS,
Hive, HBase, Zookeeper, PIG, Sqoop, and Flume.
• Strong experience in migrating other databases to Snowflake.
• Knowledge of and working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
• Good experience in software development with Python (libraries such as Beautiful Soup, NumPy, SciPy, Pandas DataFrames, Matplotlib, network, urllib2, and MySQLdb for database connectivity) and IDEs such as Sublime Text, Spyder, PyCharm, and Visual Studio Code.
• Job workflow scheduling and monitoring using tools like Apache Airflow, Oozie, Cron, and IBM Tivoli.
• In depth Knowledge of Snowflake Database, Schema and table structure.
• Good experience with Cloudera, Hortonworks, and Apache Hadoop distributions.
• Experience in workflow scheduling with Airflow, AWS Data Pipelines,
Azure, SSIS, etc.
• Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
• Skilled in data migration from relational databases to the Hadoop platform using Sqoop.
• Experience in writing Sub Queries, Stored Procedures, Triggers, Cursors,
and Functions on SQL, SQLite and PostgreSQL database.
• Good understanding of Big Data Hadoop and YARN architecture along with various Hadoop daemons such as Job Tracker, Task Tracker, Name Node, Data Node, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
• Experience in text analytics and data mining solutions for various business problems, and in generating data visualizations using Python.
• Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a brief illustrative sketch follows this summary).
• Good understanding of Spark Architecture including Spark Core, Spark SQL,
Data Frames, Spark Streaming, Driver Node, Worker Node, Stages,
Executors, and Tasks.
• Strong experience and knowledge of NoSQL databases such as MongoDB
and Cassandra.
• Experience in development and support of Oracle, SQL, and PL/SQL.
• Excellent experience in creating cloud-based solutions and architecture using Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS) and Microsoft Azure.
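
As a brief illustration of the Spark SQL work in Databricks mentioned above, the sketch below shows a minimal PySpark job that reads two hypothetical inputs in different formats, joins them, and aggregates usage by segment; all paths, column names, and the view name are assumptions rather than details from any specific engagement.

    from pyspark.sql import SparkSession, functions as F

    # Databricks supplies a SparkSession automatically; getOrCreate() also works locally.
    spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

    # Hypothetical inputs in two different file formats.
    events = spark.read.json("/mnt/raw/events/*.json")
    customers = spark.read.option("header", True).csv("/mnt/raw/customers.csv")

    # Join the sources and aggregate usage per customer segment.
    usage_summary = (
        events.join(customers, on="customer_id", how="inner")
        .groupBy("segment")
        .agg(
            F.countDistinct("customer_id").alias("active_customers"),
            F.count("*").alias("total_events"),
        )
    )

    # Expose the result to Spark SQL for downstream queries or dashboards.
    usage_summary.createOrReplaceTempView("usage_summary")
    spark.sql("SELECT * FROM usage_summary ORDER BY total_events DESC").show()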

Technical Skills:

Big Data Ecosystem: HDFS, MapReduce, PySpark, Hive, Airflow, Sqoop, HBase
Hadoop Distributions: Microsoft Azure - Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse; Amazon AWS - EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, ECS; Apache Hadoop 2.x/1.x
Scripting Languages: Python, JavaScript, R, PowerShell Scripting, HiveQL, Perl
Cloud Environment: Amazon Web Services (AWS), Microsoft Azure
NoSQL Databases: DynamoDB, HBase
Databases: MySQL, Oracle, Teradata, MS SQL Server
ETL/BI: Snowflake, SSIS, Power BI
Operating Systems: Linux (Ubuntu, CentOS, RedHat), Windows, Unix
Version Control: Git, Bitbucket
Others: Jupyter Notebook, Kubernetes, Jenkins, Jira

Work Experience:

Client: Univar Solutions
Location: The Woodlands, TX
Role: Data Engineer
Dec 2021 - Present
Responsibilities:

• Collaborating with the business for requirement gathering for data warehouse &
reporting.
• Extracting, transforming, and loading data from different sources into Azure data storage services using Azure Data Factory and T-SQL, to perform data lake analytics.
• Working on data transformations for MLOps: adding calculated columns, managing relationships, creating measures, merging and appending queries, replacing values, splitting columns, grouping, and handling date and time columns.
• Data Ingestion to Azure Services - Azure Data Lake, Azure Storage, Azure SQL,
Azure DW, and processing the data in Azure Databricks.
• Creating batches and sessions to move data at specific intervals and on-demand
using Server Manager.
• Blended multiple data connections and created multiple joins across the various
data sources for data preparation.
• Extracted data from Data Lakes, EDW to relational databases for analyzing
and getting more meaningful insights using SQL Queries and PySpark.
Developed PL/SQL scripts to extract data from multiple data sources and
transform them into a format that can be easily analyzed.
• Developing Python scripts to do file validations in Databricks and automating the process using ADF (see the sketch at the end of this section).
• For data processing, developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) using the SQL activity.
• Supported production data pipelines including performance tuning and
troubleshooting of SQL, Spark, and Python scripts.
• Developing audit, balance, and control framework using SQL DB audit tables to
control the ingestion, transformation, and load process in Azure.
• Creating tables in Azure SQL DW for data reporting and visualization for business
requirements.
• Creating visualization reports, dashboards, and KPI scorecards using Power BI
desktop.
• Designing, developing, and deploying ETL solutions using SQL Server
Integration Services (SSIS).
• Connecting various applications to the existing database, and creating databases and schema objects, including indexes and tables, by writing various functions, stored procedures, and triggers.
• For query optimization and fast retrieval, performed normalization and de-normalization of existing tables, with effective use of joins and indexes.
• Creating alerts on data integration events (success/failure) and monitoring them.
• Collaborating with product managers, scrum masters, and engineers on Agile practices and documentation initiatives, contributing to retrospectives, backlog grooming, and team meetings.
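
A minimal sketch of the kind of file-validation script referenced above, written for a Databricks notebook that an ADF pipeline could invoke; the expected schema, landing path, and checks are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("file-validation").getOrCreate()

    # Hypothetical schema that landed files are expected to match.
    expected_schema = StructType([
        StructField("order_id", StringType(), False),
        StructField("customer_id", StringType(), False),
        StructField("amount", DoubleType(), True),
    ])

    def validate_file(path: str) -> dict:
        """Run basic structural and data-quality checks on one landed file."""
        df = spark.read.option("header", True).csv(path)
        missing_columns = [f.name for f in expected_schema if f.name not in df.columns]
        row_count = df.count()
        null_keys = 0
        if not missing_columns:
            null_keys = df.filter("order_id IS NULL OR customer_id IS NULL").count()
        return {
            "path": path,
            "row_count": row_count,
            "missing_columns": missing_columns,
            "null_key_rows": null_keys,
            "passed": not missing_columns and row_count > 0 and null_keys == 0,
        }

    # Hypothetical landing path; raising here lets the ADF pipeline surface the failure.
    result = validate_file("/mnt/landing/orders/orders_2022_01_01.csv")
    if not result["passed"]:
        raise ValueError(f"File validation failed: {result}")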

Client: Country Financial
Location: Bloomington, Illinois
Role: Sr. Spark Developer
Sep 2020 - Dec 2021

Responsibilities:
• Designed a data pipeline to automate the ingestion, processing, and delivery
by processing batch and streaming data using Spark, AWS EMR Clusters,
Lambda, and Databricks.
• Developed Airflow automation and Python scripts for batch data processing,
ETL, and data warehouse ingestion using AWS Lambda Python functions, Elastic
Kubernetes Service (EKS), and S3.
• Ingested data into a data lake (S3) and used AWS Glue to expose the data to Redshift.
• Configured EMR cluster for data ingestion and used dbt (data build tool) to
transform the data in Redshift.
• Ran batch processing to calculate the associated risk and to generate several feeds to other systems, such as Discounted Cash Flow (DCF), PNL, and the Europe credit platform, for pricing strategy.
• Wrote & tested SQL code for transformations using the data build tool.
• Designed and developed a data architecture to load data from AWS S3 to Snowflake via Airflow by creating DAGs, and prepared the data for data visualization tools (see the sketch at the end of this section).
• Worked on creating data pipelines with Airflow to schedule AWS jobs for
performing incremental loads and used Flume for weblog server data.
• Scheduled Airflow jobs to automate the ingestion process into the data lake using
Apache Airflow in a cluster.
• Evaluated Snowflake design considerations for any change in the application.
• Developed PL/SQL procedure to load data into a data warehouse.
• Wrote Python scripts and used Airflow DAGs to automate the process of
extracting weblogs.
• Developed and implemented Hive Bucketing and Partitioning.
• Loaded data into S3 buckets using AWS Glue and PySpark; involved in filtering data stored in the S3 buckets using Elasticsearch and loaded data into Snowflake and Hive external tables.
• Worked on financial spreading by developing scalable applications for real-time
ingestions into various databases using AWS Kinesis and performed necessary
transformations and aggregation to build the common learner data model and
store the data in HBase.
• Orchestrated multiple ETL jobs using AWS step functions and Lambda and used
AWS Glue to load and prepare data Analytics for customers.
• Worked on AWS Lambda to run code without provisioning or managing servers, triggered by S3 and SNS events.
• Developed data transition programs from DynamoDB to AWS Redshift (ETL
Process) using AWS Lambda by creating functions in Python for certain events
based on use cases.
• Implemented the AWS cloud computing platform by using RDS, Python,
DynamoDB, S3, and Redshift.
• Worked on developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
• Worked with various formats of files like delimited text files, clickstream log
files, Apache log files, Avro files, JSON files, and XML Files.
• Proficient in using different columnar file formats like RC, ORC, and Parquet.
• Performed Database activities such as Indexing, and performance tuning.
• Collected data using Spark Streaming from AWS S3 bucket in near-real-time and
performed necessary transformations on the fly to build the common learner data
model.
• Responsible for loading and transforming huge sets of structured, semi-
structured, and unstructured data.
• Used AWS EMR to create Hadoop and Spark clusters, which were used for submitting and executing Python applications in production.
• Designed and developed end-to-end ETL processing from Oracle to AWS using
Amazon S3, EMR, and Spark.
• Worked on CI/CD solution, using Git, Jenkins, Docker, and Kubernetes to set
up and configure big data architecture on AWS cloud platform.
• Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
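
An illustrative Airflow DAG of the S3-to-Snowflake load described above; the DAG id, stage, table, and connection id are assumptions, and the sketch presumes the apache-airflow-providers-snowflake package and a Snowflake connection configured in Airflow.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

    default_args = {
        "owner": "data-engineering",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    # Hypothetical daily DAG that copies newly landed S3 files into Snowflake.
    with DAG(
        dag_id="s3_to_snowflake_daily",
        default_args=default_args,
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_raw_events = SnowflakeOperator(
            task_id="copy_events_from_s3_stage",
            snowflake_conn_id="snowflake_default",
            # COPY INTO pulls files from an external S3 stage into the target table.
            sql="""
                COPY INTO analytics.raw_events
                FROM @analytics.s3_raw_stage/events/
                FILE_FORMAT = (TYPE = 'JSON')
            """,
        )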

Client: Intuit
Location: Plano, Texas
Role: Big Data Developer
June 2019 - Aug 2020

Responsibilities:

• Wrote MapReduce code to process all the log files with rules defined in HDFS (as log files generated by different devices have different XML rules).
• Involved in porting the existing on-premises Hive code to GCP (Google Cloud Platform) BigQuery.
• Involved in migrating an Oracle SQL ETL to run on Google Cloud Platform using cloud data processing services and BigQuery, with Cloud Pub/Sub triggering the Apache Airflow jobs.
• Developed and designed application to process data using Spark.
• Developed MapReduce jobs, Hive & PIG scripts for Data warehouse migration
project.
• Developed stored procedures and views in Snowflake and used them in Talend for loading dimensions and facts.
• Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
• Developing MapReduce jobs, Hive & PIG scripts for Risk & Fraud Analytics
platform.
• Developed Data ingestion platform using Sqoop and Flume to ingest Twitter
and Facebook data for Marketing & Offers platform.
• Designed and developed an automated process using shell scripting for data movement and purging.
• Developed programs in Java and Scala (Spark) to reformat data after extraction from HDFS for analysis.
• Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL.
• Participated in the development, improvement, and maintenance of the Snowflake database application.
• Written Hive jobs to parse the logs and structure them in tabular format to
facilitate effective querying on the log data.
• Importing and exporting data into Impala, HDFS and Hive using Sqoop.
• Responsible for managing data coming from different sources.
• Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (see the sketch at the end of this section).
• Developed Hive tables to transform, analyze the data in HDFS.
• Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
• Developed Simple to Complex Map Reduce Jobs using Hive and Pig.
• Involved in running Hadoop Jobs for processing millions of records of text data.
• Developed the application by using the Struts framework.
• Created connection through JDBC and used JDBC statements to call stored
procedures.
• Developed Pig Latin scripts to extract the data from the web server output files
to load into HDFS.
• Developed the Pig UDF’S to pre-process the data for analysis.
• Implemented multiple Map Reduce Jobs in java for data cleansing and pre-
processing.
• Moved RDBMS data, as flat files generated from various channels, into HDFS for further processing.
• Developed job workflows in Oozie to automate the tasks of loading the data
into HDFS.
• Handled importing of data from various data sources, performed transformations
using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata
into HDFS using Sqoop.
• Writing the script files for processing data and loading to HDFS.
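
A small PySpark sketch of the Hive dynamic-partitioning approach mentioned above (bucketed-table DDL would typically be issued directly in Hive); the database, table, and column names are hypothetical.

    from pyspark.sql import SparkSession

    # enableHiveSupport lets Spark run HiveQL against the shared metastore.
    spark = (
        SparkSession.builder.appName("hive-dynamic-partitioning")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Allow dynamic partitions so partition values come from the data itself.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    # Hypothetical partitioned table for efficient access by event date.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.events_partitioned (
            user_id STRING,
            event_type STRING,
            amount DOUBLE
        )
        PARTITIONED BY (event_date STRING)
        STORED AS ORC
    """)

    # Dynamic-partition insert: each distinct event_date lands in its own partition.
    spark.sql("""
        INSERT OVERWRITE TABLE analytics.events_partitioned PARTITION (event_date)
        SELECT user_id, event_type, amount, event_date
        FROM analytics.events_raw
    """)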

Client: Novartis
Location: Parsippany, New Jersey
Role: Hadoop Developer

Responsibilities:

• Writing the script files for processing data and loading to HDFS.
• Processed data into HDFS by developing solutions.
• Analyzed the data using MapReduce, Pig, and Hive, and produced summary results from Hadoop for downstream systems.
• Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
• Used Pig as ETL tool to do transformations, event joins and some pre-
aggregations before storing the data onto HDFS.
• Built pipelines using Unix shell scripting to connect different tools, extracting data from a database, transforming it, and loading it into a data warehouse.
• Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
• Exported the analyzed data to the relational database MySQL using Sqoop for
visualization and to generate reports.
• Created HBase tables to load large sets of structured data.
• Managed and reviewed Hadoop log files.
• Involved in providing inputs for estimate preparation for the new proposal.
• Worked extensively with HIVE DDLs and Hive Query language (HQLs).
• Developed UDF, UDAF, UDTF functions and implemented it in HIVE Queries.
• Implemented SQOOP for large dataset transfer between Hadoop and RDBMs.
• Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
• Used Sqoop widely to import data from various systems/sources (like MySQL)
into HDFS.
• Created components like Hive UDFs for missing functionality in HIVE for
analytics.
• Used different file formats like Text files, Sequence Files, Avro.
• Provided cluster coordination services through ZooKeeper.
• Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
• Assisted in cluster maintenance, cluster monitoring, adding and removing cluster nodes, and troubleshooting.
• Installed and configured Hadoop, Map Reduce, HDFS, developed multiple Map
Reduce jobs in java for data cleaning and pre-processing.

Client: Concentrix Technology
Location: Bangalore, India
Role: Data Analyst
JUNE -2019

Responsibilities:

• Designed & built reports, processes, and analyses with a variety of business
intelligence tools &
Technologies.
• Transformed data into meaningful insights from various data sources to support
the development of global strategy and initiatives.
• Involved in requirements gathering, source data analysis, identified business
rules for data migration, and for developing data warehouse/data mart.
• Collected data using SQL Script, created reports using SSRS and used Tableau for
data visualization and custom reports analysis.
• Created reports in Tableau.
• Performed Exploratory Data analysis (EDA) to find and understand interactions
between different fields in the dataset, handling missing values, detecting
outliers, data distribution, and extracting important variables graphically.
• Worked with Python libraries NumPy, Pandas, and SciPy for data wrangling and analysis, and with Matplotlib for plotting graphs (see the sketch at the end of this section).
• Performed data collection, cleaning, wrangling, analysis, and building machine
learning models on the data sets in both R and Python.
• Used Agile methodologies to emphasize face-to-face communication and ensure that each iteration passed through the full SDLC.
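
A compact sketch of the Pandas/Matplotlib EDA workflow described above; the file name, columns, and thresholds are illustrative only.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical dataset for exploratory analysis.
    df = pd.read_csv("customer_orders.csv")

    # Structure and missing-value profile.
    df.info()
    print(df.isna().sum().sort_values(ascending=False))

    # Handle missing values according to each column's role.
    df["amount"] = df["amount"].fillna(df["amount"].median())
    df = df.dropna(subset=["customer_id"])

    # Flag potential outliers with the interquartile-range rule.
    q1, q3 = df["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]
    print(f"{len(outliers)} potential outliers in 'amount'")

    # Plot the distribution to check skew before modeling.
    df["amount"].hist(bins=50)
    plt.title("Order amount distribution")
    plt.savefig("amount_distribution.png")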
