
Guru Sai

Phone Number: (469) 320-1096


Email: [email protected]
Data Engineer | AWS | Azure | Big Data | DevOps | Python | SQL

Professional Summary
• 7+ years of expertise as a data engineer across a variety of industries and domains, including banking and payments.
• Worked with a variety of technology stacks across Big Data and cloud, including Hive, Spark, Oozie, Kafka, HBase, Couchbase, Scala, Python, Azure, and AWS.
• Expert in projects such as migrations, application development, ETL development, and building data ingestion pipelines.
• Experienced with cloud technologies such as Azure and AWS; services used include Databricks, Azure Data Lake Storage Gen2, Azure Data Factory, HDInsight, S3, Glue, Step Functions, Athena, EMR, and EC2.
• Experienced, results-oriented, and resourceful data engineer with leadership abilities and strong problem-solving skills.
• Highly motivated Big Data/Hadoop engineer with 7+ years of expertise in analytics, design, programming, integration, and testing.
• Experience working with Hadoop distributions such as Hortonworks, Cloudera, and MapR.
• Built Spark data pipelines in Python and Scala using various optimization techniques (see the sketch at the end of this summary).
• Experience with Big Data, Python, and Scala technologies for data processing pipelines, data ingestion, cloud apps, and DevOps.
• Worked with tools like Sqoop, Talend, and Spark to transfer data between HDFS and RDBMS.
• Expert in ingesting data for incremental loads from various RDBMS sources using Apache Sqoop.
• Developed scalable applications using Apache Kafka for real-time ingestion into various databases.
• Developed Pig Latin scripts and MapReduce jobs for large data transformations and loads.
• Worked on using optimized data formats like ORC, Parquet, and Avro.
• Experience in building optimized ETL data pipelines using Apache Hive and Spark.
• Implemented various optimizing techniques in Hive and Spark scripts for data transformations.
• Experience in building ETL scripts in Impala and Kudu for faster access to the reporting layer.
• Experience in NoSQL databases like HBase, Cassandra, and MongoDB.
• Developed various automation flows using Apache Oozie and Airflow.
• Worked with various integration tools like Talend and NiFi.
• Experience in working with various cloud distributions like AWS, Azure, and GCP.
• Created various ETL applications using Databricks Spark distributions and Notebooks.
• Implemented streaming applications to consume data from Event Hub and Pub/Sub.
• Developed various scalable big data applications in Azure HDInsight for ETL services.
• Experience in using Azure cloud tools like Azure Data Factory, Azure Data Lake, and Azure Synapse.
• Developed scalable applications using AWS tools like Redshift and DynamoDB.
• Worked on building pipelines using Snowflake for extensive data aggregations.
• Involved in Migrating Objects from Teradata to Snowflake.
• Experience with GCP tools like BigQuery, Pub/Sub, Cloud SQL, and Cloud Functions.
• Built custom Power BI dashboards for reporting and daily incremental data applications.
• Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
• Experience in building continuous integration and deployments using Jenkins, Drone, and Travis
CI.
• Expert in building containerized apps using tools like Docker, Kubernetes, and Terraform.
• Experience in building metrics dashboards and alerts using Grafana and Kibana.
• Expert in Java & Scala developed tools like Maven, Gradle, and SBT for application development.
• Experience in working with Unit testing using Junit, Scala Test, Spock, and Easy Mock.
• Experience in working with tools like GitHub, GitLab, and SVN for code repositories.
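
The following is a minimal PySpark sketch of the kind of optimized data pipeline described in this summary (partition pruning, a broadcast join, and columnar Parquet output). The bucket paths, column names, and table layout are hypothetical placeholders, not taken from an actual project.

# Minimal PySpark pipeline sketch with common optimizations (broadcast join,
# partition pruning, columnar Parquet output). Paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("ingestion-pipeline-sketch")
    .config("spark.sql.shuffle.partitions", "200")  # tune shuffle parallelism
    .getOrCreate()
)

# Read raw transactions stored as date-partitioned Parquet (keeps scans columnar).
txns = spark.read.parquet("s3://example-bucket/raw/transactions/")

# Small dimension table: broadcast it to avoid a shuffle-heavy join.
accounts = spark.read.parquet("s3://example-bucket/dim/accounts/")

enriched = (
    txns.filter(F.col("txn_date") >= "2024-01-01")          # partition pruning
        .join(broadcast(accounts), on="account_id", how="left")
        .groupBy("txn_date", "account_type")
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("txn_count"))
)

# Write back partitioned by date for downstream incremental reads.
(enriched.write
    .mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3://example-bucket/curated/daily_account_totals/"))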

Technical Skills

Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Oozie, Tez, Flume, Spark, Solr.
Cloud Environment: AWS, Azure, and GCP.
NoSQL: HBase, Cassandra, MongoDB.
Databases: Oracle 11g/10g, Teradata, DB2, MS-SQL Server, MySQL, MS-Access.
Programming Languages: Scala, Python, SQL, Java, PL/SQL, Linux shell scripts.
BI Tools: Tableau, Power BI, Apache Superset.
Alerting & Logging: Grafana, Kibana.
Automation: Airflow, NiFi, Oozie.

Professional Experience:
Bank of America, Dallas, TX                                      May 2023 – Present
Sr. Data Engineer
Responsibilities:
• Implemented Spark Scala applications for data transformations and to optimize query execution.
• Created unit tests for existing and new Python code using PyTest framework.
• Developed multi-threaded programs by leveraging Python's threading and multiprocessing modules.
• Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and analysis.
• Created Hive queries that helped market analysts spot emerging trends by comparing fresh data
with reference tables and historical metrics.
• Worked extensively with Hive: created Hive tables and loaded them with event data consumed from Kafka using Spark Streaming (see the sketch after this section).
• Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server.
• Recommended improvements to facilitate team and project workflow.
• Leveraged Agile methodologies to move the development lifecycle rapidly from initial prototyping through enterprise-quality testing to final implementation.
• Automated processes using scripting languages such as JavaScript (TypeScript), Python, and Bash.
• Enhanced performance of MapReduce jobs by automating optimization based on job log analysis
and parameter tuning.
• Worked on developing a schema discovery API for front-end tasks using TypeScript, which involved creating a dynamic data definition language (DDL) generation system.
• Developed and executed a suite of unit tests for a TypeScript codebase using Mockito to mock
dependencies and ensure code integrity and reliability.
• Developed Admin API to manage and inspect topics, brokers, and other Kafka objects.
• Published interactive data visualization dashboards, reports, and workbooks on Looker.
• Engineered continuous integration and continuous deployment (CI/CD) pipelines utilizing Jenkins
and Docker to streamline the build automation process for various Python applications, orches-
trating the deployment of the entire project as distribution files and overseeing the complete de-
ployment lifecycle for select applications.
• Created Splunk dashboard to capture the logs for end-to-end process of data ingestion.
• Engineered advanced Ansible templates and orchestrated playbooks for the seamless deployment of code from JFrog Artifactory distribution assets onto a distributed network of edge computing nodes/VIPs.
• Ensured the project was deployed and working across multiple lanes (prod/preprod/dev/UAT), including lower-lane setup, for a project called AUTOSOURCING.
• Leveraged the Python programming language to develop bespoke automation scripts for routine
operations such as file sourcing, email sourcing, and extracting data from upstream sources to
populate target tables.
• Maintained the Data Lake in Hadoop by building data pipeline using Sqoop, Hive and PySpark.
• Created CI and CD pipelines with Jenkins and Docker to automate the build process of applica-
tions.
• Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data with Spark SQL.
• Developed Producer API and Consumer API to publish and subscribe to stream of events in one
or more topics.
• Designed and managed Kubernetes clusters to orchestrate containerized workloads efficiently.
• Developed Spark Applications by using Scala and Implemented Apache Spark data processing
project to handle data from various RDBMS and Streaming sources.
• Crafted a Bash script to monitor email inboxes for incoming files, process these files based on
specific file and sheet names, and alert the technical team about the successful data transfer to
target tables, while also generating error logs in case of processing failures.
• Used Docker and Kubernetes to manage microservices for continuous integration and continuous delivery.
• Created complex data pipelines for ingestion of structured and unstructured data into HDFS.
• Installed and configured the Hadoop framework on Linux servers.
• Implemented Kerberos authentication to enhance security protocols in a Hadoop cluster.
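
Below is a hedged sketch of the Kafka-to-Hive pattern referenced above, written with Spark Structured Streaming in PySpark. The broker address, topic, event schema, checkpoint path, and Hive table name are illustrative assumptions, and the Kafka connector package must be available on the Spark classpath.

# Hedged sketch: consume JSON events from Kafka with Spark Structured Streaming
# and append them to a Hive-managed table. Names below are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (
    SparkSession.builder
    .appName("kafka-to-hive-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "market-events")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; cast to string and parse the JSON payload.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Append micro-batches into a Hive table; checkpointing makes the job restartable.
query = (
    events.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/market-events")
    .outputMode("append")
    .toTable("analytics.market_events")                  # hypothetical Hive table
)
query.awaitTermination()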

Lyft, San Francisco, CA                                          Oct 2021 – Apr 2023


Sr. AWS Data Engineer
Responsibilities:
• Implemented Spark Scala applications for data transformations and to optimize query execution.
• Responsible for developing Python wrapper scripts that extract specific date ranges using Sqoop by passing custom properties required for the workflow.
• Developed various analytical queries on top of the Hive database to generate reports for business needs.
• Worked on the migration of RDBMS data into Data Lake applications.
• Implemented optimization techniques in Hive such as partitioning and bucketing.
• Developed applications to import and export data between cloud services like Amazon S3.
• Developed ingestion pipelines for pulling data from AWS S3 buckets to HDFS for further analyt-
ics.
• Developed Lambda functions to trigger ETL jobs across AWS tools (see the sketch after this section).
• Created Athena and Glue tables on existing CSV data using AWS Glue crawlers.
• Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
• Developed microservices to consume data from REST endpoints and load it into Kafka topics.
• Worked on AWS Data Pipeline to set up a pipeline that ingests data from Spark and migrates it to a Snowflake database.
• Developed MapReduce jobs in Java for data cleaning and preprocessing.
• Imported and exported data into HDFS and Hive using Sqoop.
• Experienced in building streaming jobs to process terabytes of XML-format data using Flume.
• Worked on batch data ingestion using Sqoop from various sources like Teradata and Oracle.
• Worked on various Pig Latin scripts for data transformations and cleansing.
• Involved in creating Hive tables, loading them with data, and writing Hive queries.
• Implemented multiple Impala scripts for exposing data to Tableau.
• Administered and Provided L3 database support for 50+ production Oracle 11g databases.
• Developed Spark Streaming application to consume from Kafka topic and write into Hive table.
• Automated data pipelines using a scheduler like Apache Oozie.
• Experience in using CDC tools like IBM CDC for incremental updates.
• Worked on Dimensional and Relational Data Modeling using Star and Snowflake Schemas,
OLTP/OLAP system, and Conceptual, Logical, and Physical data modeling using Erwin.
• Created Jenkins pipelines for continuous integration and deployment to destination endpoints.
• Developed Spark-optimized data pipelines using Scala and Python.
• Expert in Bash/shell scripting and Python for various functionalities.
• Also worked on L3 production support for existing products.
• Developed utilities in Scala and Python to minimize lines of code in development.

Environment: CDH, HDFS, Hive, Spark, Scala, Python, Talend, Sqoop, HBase, Oozie, Kibana, ELK, Shell, Kafka, Spark MLlib, SQL, AWS
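
The following is a hedged sketch of a Lambda-triggered ETL step like the one mentioned above: a boto3 handler that starts an AWS Glue job when a new object lands in S3. The Glue job name and argument key are hypothetical.

# Hedged sketch of an AWS Lambda handler that starts a Glue ETL job on an
# S3 put event. The job name and argument key are hypothetical placeholders.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # S3 put events carry bucket/key inside the Records payload.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Kick off the downstream Glue job, passing the landed file as an argument.
    response = glue.start_job_run(
        JobName="daily-etl-job",                  # hypothetical Glue job
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"JobRunId": response["JobRunId"]}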

Elevance Health, Indianapolis, IN                                Apr 2019 – Oct 2021


Sr. Data Engineer
Responsibilities:
• Worked on developing frameworks to enhance data using Big Data technologies like Hadoop.
• Expert in ingesting batch data from different sources into the cloud using Spark.
• Implemented lambda architecture using streaming and batch tools to ingest data in real time.
• Expert in building consumer applications using Spark Streaming in Python and Scala.
• Used SQL, Hive, Spark, and other Big Data technologies to perform data analysis and resolve business challenges for the business and its customers across different domains.
• Developed Hive ETL jobs for various data transformations and cleansing.
• Developed Spark Streaming applications using Scala and Python for processing data from Kafka.
• Implemented Hive and Spark optimizations to improve production runtime.
• Experience in writing ETL scripts using Databricks Spark in Azure for various transformations.
• Experience in using tools like Azure Data Factory, Azure SQL, Cosmos DB, and Azure HDInsight.
• Experience in working on building hybrid Data Lake in on-prem and Azure cloud.
• Designed and implemented database solutions in Azure SQL Data Warehouse.
• Used Python to run Ansible playbooks that deploy Logic Apps to Azure.
• Ingested real-time data from Event Hub for our custom message consumer (see the sketch after this section).
• Involved in creating Hive tables, loading with data, and writing hive queries.
• Implemented multiple Impala scripts for exposing it into Tableau.
• Administered and Provided L3 database support for 50+ production Oracle 11g databases.
• Upgraded Oracle databases from 11g to 12c and 10g to 11g on RHEL 6 and Oracle Linux.
• Involved in cloning and patching of Oracle databases 10g, 11g, and 12c.
• Implemented end-to-end job automation in Airflow and Oozie for respective actions.
• Developed code coverage for production data pipelines using Sonar.
• Worked on migrating data pipelines from the on-prem cluster to the Azure cloud.
• Developed streaming pipelines for ingesting data from Kafka to Hive.
• Developed streaming dashboards in Power BI using Stream Analytics to push datasets to Power BI.
• Experience ingesting data from RESTful APIs and writing it to Hive.
• Implemented real-time metrics dashboards using Grafana for logging production issues.
• Developed data modules in microservices using Spring Boot with Java.
• Developed Jenkins pipelines for continuous integration and deployment purposes.
Environment: HDFS, Hive, Spark, Oozie, Python, Scala, Databricks Spark, Azure Cloud, Azure Data Factory, Azure SQL, Cosmos DB, Azure HDInsight, Airflow, Power BI
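
Below is a hedged sketch of a custom Event Hub message consumer like the one referenced above, using the azure-eventhub (v5) Python SDK. The connection string, hub name, and consumer group are placeholders, and durable checkpointing would additionally require a checkpoint store.

# Hedged sketch of a custom Event Hub consumer using the azure-eventhub v5 SDK.
# Connection string, hub name, and consumer group are placeholders.
from azure.eventhub import EventHubConsumerClient

CONNECTION_STR = "Endpoint=sb://example.servicebus.windows.net/;..."  # placeholder
EVENTHUB_NAME = "telemetry"                                           # placeholder

def on_event(partition_context, event):
    # Handle a single event; a real consumer would parse and route the payload.
    print(f"partition={partition_context.partition_id} body={event.body_as_str()}")

client = EventHubConsumerClient.from_connection_string(
    CONNECTION_STR,
    consumer_group="$Default",
    eventhub_name=EVENTHUB_NAME,
)

with client:
    # starting_position="-1" reads each partition from the beginning.
    client.receive(on_event=on_event, starting_position="-1")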

Lululemon, Sumner, Washington                                    Jun 2016 – Apr 2019

Data Engineer
Responsibilities:
• Developed code for importing and exporting data between RDBMS and HDFS using Sqoop.
• Designed multiple layers of data foundations in different databases like Hive and HBase.
• Implemented partitions and buckets based on state for further processing using bucket-based Hive joins.
• Built an ETL framework in Spark with Python and Scala for data transformations.
• Created HBase tables from Hive and wrote HiveQL statements to access HBase table data.
• Developed Custom Hive UDFs in Java for decreasing repetitive SQL code in the script.
• Ingested data into Apache Kudu for performing updates for transactional purposes.
• Built Jupyter notebooks using Pyspark for extensive data analysis and exploration.
• Implemented Spark Streaming using Scala for real-time computations to process JSON files.
• Worked on Kafka messaging queue for Data Streaming in both batch and real-time applications.
• Built custom endpoints and libraries in NiFi for ingesting data from traditional legacy systems.
• Worked on building various pipelines and integration using NiFi for ingestion and exports.
• Developed end-to-end job automation for various Hadoop jobs using Apache Oozie.
• Developed various scripting functionality in shell and Python.
• Developed advanced PL/SQL packages, procedures, triggers, functions, Indexes, and Collections
to implement business logic using SQL Developer. Generated server-side PL/SQL scripts for data
manipulation and validation and materialized views for remote instances.
• Provided database administration support (24x7) for production Oracle 11g databases.
• Responsible for upgrading the database from 10g to 11g, configuring the ASM file system, and administering ASM instances and disk groups in RAC for testing of EBS 12.1.3.
• Pushed application logs and data stream logs to the Kibana server for monitoring and alerting.
• Implemented the ELK stack to collect logs from different servers for analysis.
• Implemented data pipelines in Spark for machine learning models used for prediction.
• Experience in implementing ingestion data pipeline from AWS S3 to HDFS and vice versa.
• Experience working on AWS EMR, S3, and EC2 instances.
• Extensively used AWS Lambda, Kinesis, and Cloud Front for real-time data collection.
• Used the AWS CLI to suspend an AWS Lambda function processing an Amazon Kinesis stream and then resume it (see the sketch after this section).
Environment: HDFS, Hive, Spark, Oozie, HBase, AWS, Scala, Python, Bash, Kafka, Java, Jenkins, Spark
Streaming, Tez, AWS Athena, Glue
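
The bullet above describes suspending and resuming a Kinesis-triggered Lambda via the AWS CLI; the boto3 equivalent below is a hedged sketch that disables and re-enables the function's event source mapping. The function name and stream ARN are hypothetical.

# Hedged sketch: pause/resume a Lambda that consumes a Kinesis stream by
# toggling its event source mapping. Function name and stream ARN are placeholders.
import boto3

lambda_client = boto3.client("lambda")

def set_kinesis_trigger(function_name: str, stream_arn: str, enabled: bool) -> None:
    # Find the event source mapping that connects the stream to the function.
    mappings = lambda_client.list_event_source_mappings(
        FunctionName=function_name,
        EventSourceArn=stream_arn,
    )["EventSourceMappings"]

    for mapping in mappings:
        # Enabled=False suspends polling of the stream; Enabled=True resumes it.
        lambda_client.update_event_source_mapping(
            UUID=mapping["UUID"],
            Enabled=enabled,
        )

# Suspend, then later resume, the stream processor (hypothetical names).
set_kinesis_trigger("process-clickstream",
                    "arn:aws:kinesis:us-west-2:123456789012:stream/clicks", False)
set_kinesis_trigger("process-clickstream",
                    "arn:aws:kinesis:us-west-2:123456789012:stream/clicks", True)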

Education:
• Master's in Computer Science, Oklahoma State University
• Bachelor's in Computer Science, Jawaharlal Nehru Technological University, India

Certification:

• Big Data BigInsights - IBM Bluemix
