Sr. Data Engineer
---------------------------------------------------------------------------------------------------------------------
Professional Summary:
• Extensive experience in Information Technology with 10+ years of Hadoop/Big Data processing.
• Comprehensive working experience in implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, Oozie.
• Experience working on Hortonworks / Cloudera / MapR.
• Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse, controlling and granting database access, and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
• Hands-on experience in designing and implementing data engineering pipelines and analyzing data using the AWS stack, including EMR, Glue, EC2, AWS Lambda, Athena, Redshift, Sqoop and Hive.
• Comprehensive experience in importing and exporting data using Sqoop between RDBMS and HDFS.
• Good understanding of and hands-on experience with Spark abstractions such as RDD, DataFrame, Dataset and Spark SQL.
• Excellent working knowledge of the HDFS filesystem and Hadoop daemons such as Resource Manager, Node Manager, Name Node, Data Node, Secondary Name Node, containers, etc.
• In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG Scheduler, Task Scheduler, stages and tasks.
• Strong experience in implementing data warehouse solutions in Confidential Redshift. Worked on various projects to migrate data from on-premises databases to Confidential Redshift, RDS and S3.
• Experience with developing User Defined Functions (UDFs) in Apache Hive using Java, Scala, and Python.
• Skilled in Hadoop architecture and ecosystem, which includes HDFS, Job Tracker, Task Tracker, Name Node, Data Node and YARN.
• Experience working on Spark and Spark Streaming.
• Experience with the new Hadoop 2.0 YARN architecture and developing YARN applications on it.
• Worked on performance tuning to ensure that assigned systems were patched, configured and optimized for maximum functionality and availability. Implemented solutions that reduced single points of failure and improved system uptime to 99.9% availability.
• Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses.
• Extensively worked on Python and built a custom ingest framework.
• Firm grip on data modeling, data marts, database performance tuning and NoSQL MapReduce systems.
• Extensive Snowflake cloud data warehouse implementation on AWS.
• Experience in managing and reviewing Hadoop log files.
• Real-time experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
• Worked on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON and CSV formats.
• Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs.
• Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing (a minimal sketch appears at the end of this summary).
• Worked with different file formats like TEXTFILE, SEQUENCEFILE, AVRO, ORC and PARQUET for Hive querying and processing.
• Proficient in NoSQL databases like HBase.
• Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
• Knowledge of Kafka installation and integration with Spark Streaming.
• Demonstrated a full understanding of the Fact/Dimension data warehouse design model, including star and snowflake design methods.
• Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark and Spark SQL.
• Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats like text, zip, XML and JSON.
• Experience in designing both time-driven and data-driven automated workflows using Oozie.
• Good understanding of Zookeeper for monitoring and managing Hadoop jobs.
• Monitored MapReduce jobs and YARN applications.
• Strong experience in installing and working on NoSQL databases like HBase and Cassandra.
• Work experience with cloud infrastructure such as Amazon Web Services (AWS) EC2 and S3.
• Used Git for source code and version control management.
• Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures.
• Proficient in Java, J2EE, JDBC, Collection Framework, JSON, XML, REST and SOAP web services.
• Strong understanding of Agile and Waterfall SDLC methodologies.
• Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced, unstructured environment.
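For illustration, a minimal sketch of the shared-metastore external-table pattern described above, written with PySpark's Hive support; the database, table, column and HDFS path names are hypothetical placeholders rather than values from any particular project.

# Minimal sketch: Hive external table with dynamic partitioning, created via Spark SQL.
# All object names and paths below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-external-table-sketch")
    .enableHiveSupport()  # use the shared Hive metastore rather than the embedded Derby one
    .getOrCreate()
)

# External table: the data stays at the HDFS location, only metadata lives in the metastore.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
        event_id   STRING,
        event_type STRING,
        payload    STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/raw/events'
""")

# Dynamic partitioning derives the partition value from the data itself.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.events PARTITION (event_date)
    SELECT event_id, event_type, payload, event_date
    FROM staging.events_raw
""")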

Education Details:

Bachelor of Technology – Computer Science (2010, Osmania University, Hyderabad)


Master of Science in Data Analytics, Clark University, MA (2022)

Certifications:
• DP-203: Data Engineering on Microsoft Azure
• AZ-900: Microsoft Azure Fundamentals

Client: Molina Healthcare, Bothell, WA    Aug 2023 – Present


Role: Sr. Data Engineer

Responsibilities:
• Worked closely with stakeholders to understand business requirements and design quality technical solutions that align with business and IT strategies and comply with the organization's architectural standards.
• Developed multiple applications required for transforming data across multiple layers of the Enterprise Analytics Platform and implemented Big Data solutions to support distributed processing using Big Data technologies.
• Responsible for data identification and extraction using third-party ETL and data-transformation tools or scripts (e.g., SQL, Python).
• Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
• Installed and configured Apache Airflow for Azure storage containers and the Snowflake data warehouse, and created DAGs to run the Airflow loads (a hypothetical DAG sketch follows the Environment line below).
• Worked on shell scripting in a Linux environment.
• Built an ETL framework for data migration from on-premises data sources such as Hadoop and Oracle to the Azure cloud using Apache Airflow, Apache Sqoop and Apache Spark.
• Part of the databridge development team, using Python as the programming language.
• Worked on flattening the JSON data so that it can be useful to downstream teams (a minimal sketch follows this list).
• Developed and managed Azure Data Factory pipelines that extracted data from various data sources, transformed it according to business rules using Python scripts that utilized PySpark, and consumed APIs to move data into an Azure SQL database.
• Created a new data quality check framework project in Python that utilized pandas.
• Implemented source control and development environments for Azure Data Factory pipelines utilizing Azure Repos.
• Created Hive/Spark external tables for each source table in the Data Lake and wrote HiveQL and Spark SQL to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
• Designed and developed ETL & ELT frameworks using Azure Data Factory and Azure Databricks.
• Flattened and transformed huge amounts of nested data in Parquet and Delta formats using Spark SQL and the newest join optimization methods, then loaded them into Hive, Delta Lake and Snowflake tables.
• Created generic Databricks notebooks for performing data cleansing.
• Created Azure Data Factory pipelines to refactor on-prem SSIS packages into Data Factory pipelines.
• Worked with Azure Blob and Data Lake storage for loading data into Azure Synapse (SQL DW).
• Ingested and transformed source data using Azure Data Flows and Azure HDInsight.
• Created Azure Functions to ingest data at regular intervals.
• Created Databricks notebooks for performing complex transformations and integrated them as activities in ADF pipelines.
• Loaded data into Snowflake tables from the internal stage using SnowSQL.
• Wrote complex SQL queries for data analysis and extraction of data in the required format.
• Created Power BI datamarts and reports for various stakeholders in the business.
• Created CI/CD pipelines using Azure DevOps.
• Enhanced the functionality of existing ADF pipelines by adding new logic to transform the data.
• Worked on Spark jobs for data preprocessing, validation, normalization and transmission.
• Optimized code and configurations for performance tuning of Spark jobs.
• Worked with unstructured and semi-structured data sets to aggregate and build analytics on the data.
• Worked independently with business stakeholders, with a strong emphasis on influencing and collaboration.
• Daily participation in an Agile-based Scrum team with tight deadlines.
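For illustration, a minimal PySpark sketch of the JSON-flattening step mentioned in this list; the storage paths and field names (claim_id, claim_lines, member) are hypothetical, and the Delta write assumes a Databricks/Delta Lake runtime.

# Illustrative sketch of flattening nested JSON with PySpark; paths and fields are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("json-flatten-sketch").getOrCreate()

# Read the raw, nested JSON landed in the storage account (placeholder path).
raw = spark.read.json("abfss://raw@storageaccount.dfs.core.windows.net/claims/")

# Explode the nested array and promote struct fields to top-level columns
# so downstream teams can query a flat, tabular layout.
flat = (
    raw
    .withColumn("line", explode(col("claim_lines")))
    .select(
        col("claim_id"),
        col("member.id").alias("member_id"),
        col("line.procedure_code").alias("procedure_code"),
        col("line.amount").alias("amount"),
    )
)

# Write the flattened result as Delta for consumption from Hive/Databricks/Snowflake loads.
flat.write.format("delta").mode("overwrite").save(
    "abfss://curated@storageaccount.dfs.core.windows.net/claims_flat/"
)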

Environment: Azure Synapse Analytics, Azure Data Factory, Azure Databricks, Hadoop, SQL Server, Delta Lake, Power BI, SnowSQL, Snowflake, Oracle 12c/11g, SQL scripting, PL/SQL, Python, Unix Shell, Jira, Confluence.
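A hypothetical sketch of the Airflow-to-Snowflake load mentioned in the responsibilities above, assuming Airflow 2.x and the Snowflake Python connector; the connection details, stage and table names are placeholders rather than actual project configuration.

# Hypothetical Airflow DAG: copy staged files into a Snowflake table on a daily schedule.
from datetime import datetime

import snowflake.connector
from airflow import DAG
from airflow.operators.python import PythonOperator


def load_into_snowflake():
    # Credentials would normally come from an Airflow connection or a secrets backend.
    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
    )
    try:
        conn.cursor().execute(
            "COPY INTO STAGING.ORDERS FROM @AZURE_RAW_STAGE/orders/ "
            "FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
        )
    finally:
        conn.close()


with DAG(
    dag_id="azure_to_snowflake_daily",
    start_date=datetime(2023, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="copy_into_snowflake", python_callable=load_into_snowflake)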

Client: First Republic Bank, San Francisco, CA    Aug 2022 – July 2023
Role: Sr. Data Engineer

Responsibilities:
• Developed Scala Spark pipelines that transform raw data from several formats into Parquet files for consumption by downstream systems.
• Developed scripts using Spark that load the data from Hive to Amazon RDS (Aurora) at a faster rate (see the sketch after this list).
• Experience developing Scala applications for loading/streaming data into NoSQL databases (MongoDB) and HDFS.
• Used AWS Glue services like crawlers and ETL jobs to catalog all the Parquet files and make transformations over the data according to the business needs.
• Developed and managed pipelines that extracted data from various data sources and transformed it according to business rules, using Scala scripts that utilized Scala Spark and consumed APIs to move data into an AWS SQL database.
• Worked with AWS services like S3, Glue, EMR, SNS, SQS, Lambda, EC2, RDS and Athena to process data for the downstream customers.
• Created libraries and SDKs that help make JDBC connections to the Hive database and query the data using the Play Framework and various AWS services.
• Created views on top of data in Hive that are used by the application through Spark SQL.
• Applied security on data using Apache Ranger to set row-level filters and group-level policies on data.
• Experience building reusable ETL components using Postgres and Snowflake.
• Normalized the data according to the business needs, including data cleansing, modifying datatypes and various other transformations, using Spark, Scala and AWS EMR.
• Worked on creating CI/CD pipelines using tools like Jenkins and Rundeck, which are responsible for scheduling the daily jobs.
• Developed Sqoop jobs that import data from Oracle to AWS S3.
• Developed a utility that transforms and exports the data from AWS S3 to AWS Glue and sends alerts and notifications to downstream systems (AI and Data Analytics) once the data is ready for usage.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
• Used import and export between the internal stage (Snowflake) and the external stage (AWS S3).
• Developed pipelines for auditing the metrics of all applications using AWS Lambda and Kinesis Firehose.
• Worked extensively on triggering Snowpipe Snowflake data loads automatically using Amazon SQS (Simple Queue Service) notifications for an S3 bucket.
• Developed an end-to-end pipeline that exports the data from Parquet files in S3 to Amazon RDS.
• Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with the tasks running on Amazon SageMaker.
• Worked on optimizing performance of Hive queries using Hive LLAP and various other techniques.
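For illustration, a rough PySpark sketch of the Hive-to-Amazon-RDS (Aurora) load described in this list (the original pipelines were written in Scala); the JDBC URL, credentials and table names are placeholders, and a MySQL-compatible JDBC driver is assumed to be on the Spark classpath.

# Sketch: read a curated Hive table and write it to Aurora over JDBC in parallel.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-aurora-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Curated table produced by the upstream Spark jobs (placeholder name).
df = spark.table("curated.daily_positions")

# numPartitions controls how many concurrent JDBC connections write to Aurora.
(
    df.write
    .format("jdbc")
    .option("url", "jdbc:mysql://aurora-cluster.cluster-xyz.us-west-2.rds.amazonaws.com:3306/reporting")
    .option("dbtable", "daily_positions")
    .option("user", "etl_user")
    .option("password", "***")
    .option("numPartitions", "8")
    .mode("append")
    .save()
)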

Environment: Spark, Scala, Hadoop, Hive, Sqoop, Redshift, Lambda, RDS, Play Framework, Apache Ranger, S3, EMR, EC2, SNS, SQS, SageMaker, Zeppelin, Snowflake, Kinesis, Athena, Jenkins, Rundeck and AWS Glue.

Client: Reliance Industries Limited, Hyderabad, India    Nov 2018 to Nov 2021


Role: Data Engineer
Responsibilities:
• Involved in the high-level design of the Hadoop architecture for the existing data structure and problem statement; set up the multi-node cluster and configured the entire Hadoop platform.
• Extracted files from MySQL, Oracle and Teradata through Sqoop, placed them in HDFS and processed them.
• Designed and built a Data Discovery Platform for a large system integrator using Azure HDInsight components. Used Azure Data Factory and Data Catalog to ingest and maintain data sources. Security on HDInsight was enabled using Azure Active Directory.
• Worked with various HDFS file formats like Avro, Parquet, ORC, SequenceFile and JSON, and various compression formats like Snappy, bzip2 and Gzip.
• Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing.
• Developed Hive UDFs to pre-process the data for analysis and migrated ETL operations into the Hadoop system using Pig Latin scripts and Python scripts.
• Used Hive to do transformations/event joins, filtering and some pre-aggregations before storing the data into HDFS.
• Flattened and transformed huge amounts of nested data in Parquet and Delta formats using Spark SQL and the newest join optimization methods, then loaded them into Hive, Delta Lake and Snowflake tables.
• Developed bash scripts to automate the data flow using commands like awk, sed, grep, xargs and exec, and integrated the scripts with YAML.
• Developed Hive queries for data sampling and analysis for the analysts.
• Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
• Developed Bash scripts and Python modules to convert mainframe fixed-width source files to delimited files (a minimal sketch follows this list).
• Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
• Created workflows in Talend to extract data from various data sources and dump them into HDFS.
• Designed ETL data pipeline flows to ingest data from RDBMS sources into HDFS using shell scripts.
• Created HBase tables from Hive and wrote HiveQL statements to access HBase table data.
• Used Hive to perform data validation on the data ingested using Sqoop and Flume, and the cleansed data set was pushed into HBase.
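A minimal sketch of the fixed-width-to-delimited conversion mentioned in this list; the field layout and file names are hypothetical, since the real offsets would come from the mainframe copybook.

# Convert a mainframe fixed-width extract into a delimited file.
# The (name, start, length) layout below is an illustrative assumption.
import csv

LAYOUT = [("policy_id", 0, 10), ("holder_name", 10, 30), ("premium", 40, 12)]


def convert(fixed_width_path: str, delimited_path: str, delimiter: str = "|") -> None:
    with open(fixed_width_path, "r", encoding="utf-8") as src, \
         open(delimited_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        writer.writerow([name for name, _, _ in LAYOUT])
        for line in src:
            # Slice each field by its offset and strip the fixed-width padding.
            writer.writerow(
                [line[start:start + length].strip() for _, start, length in LAYOUT]
            )


if __name__ == "__main__":
    convert("POLICY.MASTER.TXT", "policy_master.psv")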

Environment: Hadoop (Cloudera), HDFS, MapReduce, Hive, Scala, Snowflake, Pig, Sqoop, Azure, DB2, UNIX Shell Scripting, JDBC.

Client: Hexaware Technologies, India    Oct 2015 to Aug 2018


Role: Hadoop Engineer
Responsibilities:
• Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data.
• Created a data lake on Amazon S3.
• Implemented scheduled downtime for non-prod servers for optimizing AWS pricing.
• Developed Hive (version 0.10) scripts for end-user/analyst requirements to perform ad hoc analysis.
• Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
• Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
• Developed UDFs in Java as and when necessary for use in Pig and Hive queries.
• Experience in using SequenceFile, RCFile, Avro and HAR file formats.
• Developed Oozie workflows for scheduling and orchestrating the ETL process.
• Worked on performance tuning to ensure that assigned systems were patched, configured and optimized for maximum functionality and availability. Implemented solutions that reduced single points of failure and improved system uptime to 99.9% availability.
• Wrote MapReduce programs in Python with the Hadoop Streaming API (see the sketch after this list).
• Worked on a live 90-node Hadoop cluster running CDH 4.4.
• Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB).
• Extracted the data from Teradata into HDFS using Sqoop.
• Worked with Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
• Extensive experience in writing Pig (version 0.10) scripts to transform raw data from several data sources into baseline data.
• Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
• Worked on Hive for exposing data for further analysis and for generating and transforming files from different analytical formats to text files.
• Imported data from MySQL Server and other relational databases into Apache Hadoop with the help of Apache Sqoop.
• Created Hive tables and worked on them for data analysis to meet the business requirements.
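For illustration, a minimal mapper for a Hadoop Streaming job in Python, matching the bullet above; the tab-delimited input layout, the field index and the job itself (counting records per key) are assumptions, and a companion reducer would sum the emitted counts per key.

#!/usr/bin/env python3
# Minimal Hadoop Streaming mapper sketch: emit key<TAB>1 per input record.
# Illustrative invocation (paths and jar location are placeholders):
#   hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py \
#       -mapper mapper.py -reducer reducer.py -input /data/raw -output /data/counts
import sys


def main() -> None:
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 3:
            # The framework sorts by key between the map and reduce phases.
            sys.stdout.write(f"{fields[3]}\t1\n")


if __name__ == "__main__":
    main()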

Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g,
DB2, Teradata, MySQL, Eclipse, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, SOLR.

Client: MetLife, Hyderabad, India May 2012 to Sep 2015


Role: SQL Developer

Responsibilities:
• Experience developing Scala applications for loading/streaming data into NoSQL databases (MongoDB) and HDFS.
• Performed T-SQL tuning and optimized queries and SSIS packages.
• Designed distributed algorithms for identifying trends in data and processing them effectively.
• Created an SSIS package to import data from SQL tables into different sheets in Excel.
• Used Spark and Scala for developing machine learning algorithms that analyze clickstream data.
• Used Spark SQL for data pre-processing, cleaning and joining very large data sets (a minimal sketch follows this list).
• Performed data validation with Redshift and constructed pipelines designed for over 100 TB per day.
• Co-developed the SQL Server database system to maximize performance benefits for clients.
• Assisted senior-level data scientists in the design of ETL processes, including SSIS packages.
• Performed database migrations from traditional data warehouses to Spark clusters.
• Ensured the data warehouse was populated only with quality entries by performing regular cleaning and integrity checks.
• Used Oracle relational tables in process design.
• Developed SQL queries to perform data extraction from existing sources to check format accuracy.
• Developed automated tools and dashboards to capture and display dynamic data.
• Installed a Linux-operated Cisco server, performed regular updates and backups, and used MS Excel functions for data validation.
• Coordinated data security issues and instructed other departments about secure data transmission and encryption.
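A minimal Spark SQL sketch of the clickstream pre-processing, cleaning and join mentioned in this list; the input paths, column names and filter rules are hypothetical.

# Clean and join clickstream events with a user dimension using Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream-prep-sketch").getOrCreate()

clicks = spark.read.parquet("/data/clickstream/")
users = spark.read.parquet("/data/users/")

clicks.createOrReplaceTempView("clicks")
users.createOrReplaceTempView("users")

# Drop malformed rows, normalize timestamps and join to the user dimension.
cleaned = spark.sql("""
    SELECT u.user_id,
           u.segment,
           to_timestamp(c.event_ts) AS event_time,
           c.page_url
    FROM clicks c
    JOIN users u ON c.user_id = u.user_id
    WHERE c.event_ts IS NOT NULL
      AND c.page_url IS NOT NULL
""")

cleaned.write.mode("overwrite").parquet("/data/clickstream_clean/")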
Environment: T-SQL, MongoDB, HDFS, Scala, Relational Databases, SSIS, SQL, Linux, Data
Validation, MS Excel, Agile Methodology.
