
Name: Ravi Teja

Data Engineer
E-Mail ID: [email protected]
Contact No: (816) 579-2762

SUMMARY:

Technical software professional with an extensive portfolio of projects and 8 years of experience in data engineering, data pipeline design, development, and implementation using Python, SQL, and Spark. Seeking a challenging Data Engineer position to leverage 6+ years of AWS project experience in designing, developing, and maintaining data infrastructure solutions that drive business insights and growth.

Profile Summary:
 Strong experience in programming languages like Java, Scala and Python.
 Experience in working with Hadoop components like HDFS, MapReduce, Hive, HBase, Sqoop, Oozie, Spark, and Kafka.
 Strong understanding of Distributed systems design, HDFS architecture, internal working
details of MapReduce and Spark processing frameworks.
 Solid experience developing Spark Applications for performing high scalable data
transformations using RDD, Dataframe and Spark-SQL.
 Strong experience troubleshooting failures in spark applications and fine-tuning spark
applications and hive queries for better performance.
 Good experience utilizing various optimization options in Spark such as broadcast joins, caching (persisting), sizing executors appropriately, and reducing shuffle stages (a brief sketch follows this list).
 Worked extensively on Hive for building complex data analytical applications.
 Strong experience writing complex map-reduce jobs including the development of custom
Input Formats and custom Record Readers.
 Sound knowledge in map side join, reduce side join, shuffle & sort, distributed cache,
compression techniques, multiple Hadoop Input & output formats.
 Good experience working with AWS Cloud services like S3, EMR, EC2, Redshift, Athena, IAM, Glue metastore, Lambda, CloudWatch, EventBridge, etc.
 Proficient in monitoring Step Function executions using AWS CloudWatch, enabling
proactive troubleshooting and performance optimization.
 Developed automated workflows where ECS tasks are triggered by specific events via EventBridge, enhancing data pipeline responsiveness and reducing manual intervention.
 Used SQL concepts, Hive, Python, and PySpark to cope with increasing volumes of data.
 Developed scripts in AWS Glue to transfer data and utilized AWS Glue to run ETL jobs and aggregations in PySpark.
 Proficient in monitoring event patterns and diagnosing issues in real-time using
EventBridge and integrated AWS monitoring tools like CloudWatch.
 Experience with implementation of Snowflake cloud data warehouse and operational
deployment of Snowflake DW solution into production.
 Migrated legacy applications to Snowflake.
 Knowledge of Snowflake and other peripheral data warehousing tools.
 Deep understanding of performance tuning and partitioning for optimizing Spark applications.
 Worked on building real time data workflows using Kafka, Spark streaming and HBase.
 Extensive knowledge on NoSQL databases like HBase, Cassandra and Mongo DB.
 Solid experience in working with CSV, text, Avro, Parquet, ORC, and JSON data formats.
 Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow.
 Developed Python code for task definitions, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation using Airflow.
 Designed and implemented Hive and Pig UDFs using Java for evaluating, filtering, loading, and storing data.
 Strong understanding of Data Modelling and experience with Data Cleansing, Data
Profiling and Data analysis.
 Experience in writing test cases in a Java environment using JUnit.
 Proficiency in programming with different IDEs like Eclipse and NetBeans.
 Good Knowledge about scalable, secure cloud architecture based on Amazon Web Services
like EC2, Cloud Formation, VPC, S3, EMR, Redshift, Athena, Glue Metastore etc.
 Integrated GitLab CI/CD pipelines with AWS Data Pipeline services for seamless automation
and continuous integration of data processes.
 Good knowledge in the core concepts of programming such as algorithms, data structures,
and collections.
 Excellent communication and interpersonal skills; flexible and adaptive to new environments, self-motivated, a team player, a positive thinker, and enjoy working in a multicultural environment.
 Analytical, organized, and enthusiastic about working in a fast-paced, team-oriented environment.
 Expertise in interacting with business users, understanding their requirements, and providing solutions to match those requirements.
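
Below is a minimal PySpark sketch of the broadcast-join and caching optimizations referenced in the list above; the S3 paths, table names, and columns (orders, dim_customers, customer_id) are hypothetical placeholders used only for illustration.

# Minimal sketch of Spark join optimizations: broadcast join plus caching.
# Paths and column names below are hypothetical, not from any specific project.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-optimization-sketch").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")            # large fact table
customers = spark.read.parquet("s3://example-bucket/dim_customers/")  # small dimension table

# Broadcasting the small dimension table avoids shuffling the large fact table.
enriched = orders.join(broadcast(customers), on="customer_id", how="left")

# Cache (persist) the enriched frame because it is reused by multiple aggregations.
enriched.cache()

daily_totals = enriched.groupBy("order_date").sum("order_amount")
by_region = enriched.groupBy("region").count()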

TECHNICAL SKILLS:

Hadoop/Big Data: Spark, Hive, HDFS, MapReduce, Sqoop, Oozie, Kafka, Impala, Zookeeper, Kinesis, Ambari, YARN
Programming Languages: Java, Scala, Python, PySpark
Cloud: AWS - EC2, S3, EMR, RDS, Lambda, SNS, CloudWatch, Aurora, Redshift, IAM, Athena, Glue Metastore, EventBridge
Databases: NoSQL (HBase, Cassandra, MongoDB), Teradata, Oracle, DB2, MySQL, Postgres
Infrastructure as Code: Terraform
IDE Tools: Eclipse, IntelliJ, PyCharm
Development Approach: Agile, Waterfall
Version Control & Build Tools: CVS, SVN, Git, SBT, Maven
Reporting Tools: Tableau, QlikView, QlikSense

PROFESSIONAL EXPERIENCE:

Early Warning Services Jan 2024 - Present


Sr Data Engineer
Responsibilities:

• Developed a series of data ingestion jobs in Scala for collecting data from multiple channels and external applications.
• Worked on both batch and streaming ingestion of the data.
• Built Python-based data pipelines from multiple data sources, performing the necessary ETL tasks.
• Imported clickstream log data from FTP servers and performed various data transformations using the Spark DataFrame and Spark SQL APIs (a brief sketch follows this list).
• Designed and implemented ETL jobs in AWS Glue, automating data transformations and
integrations between diverse data sources such as RDS, S3, and third-party APIs.
• Implemented Java-based Kafka producer applications for streaming messages to Kafka topics.
• Built Spark Streaming applications for consuming messages and writing to HBase.
• Worked on troubleshooting and optimizing Spark Applications.
• Worked on ingesting data from SQL Server to S3 using Sqoop within AWS EMR.
• Migrated MapReduce jobs to Spark applications built in Scala and integrated with Apache Phoenix and HBase.
• Orchestrated and scaled Spark and Hadoop clusters using EMR, facilitating distributed
data processing tasks and advanced analytics.
• Designed automated data workflows with AWS Data Pipeline, ensuring consistent,
timely, and fault-tolerant data processing.
• Developed interactive dashboards and reports in QuickSight, providing stakeholders
with actionable business insights.
• Developed daily and monthly ETL processes which were automated using custom UNIX
shell scripts & Python.
• Worked on building ETL pipelines using Python scripting, Pandas DataFrames, and PySpark.
• Involved in loading and transforming large sets of data and analyzed them using Hive
Scripts.
• Implemented SQL queries on AWS using platforms like Athena and Redshift.
• Queried alert data arriving from S3 buckets using AWS Athena to find differences in time intervals between Kafka and Kinesis clusters.
• Loaded portion of processed data into Redshift tables and automated the process.
• Worked on various performance optimizations in Spark, such as broadcast variables, dynamic allocation, and partitioning, and built custom Spark UDFs.
• Worked on fine-tuning long-running Hive queries by applying proven standards such as the Parquet columnar format, partitioning, and vectorized execution.
• Analyzed data using Spark DataFrames and a series of Hive scripts to produce summarized results for downstream systems.
• Worked with Data Science team in developing Spark ML applications to develop various
predictive models.
• Expertise in interacting with the project team to organize timelines, responsibilities, and deliverables, providing all aspects of technical support.
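
A brief, illustrative sketch of the clickstream transformation pattern described above, combining the DataFrame API with Spark SQL; the S3 paths, schema fields, and column names are assumptions made purely for illustration.

# Illustrative clickstream transformation with the DataFrame API and Spark SQL.
# The S3 paths and column names are assumptions for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-sketch").getOrCreate()

clicks = spark.read.json("s3://example-bucket/clickstream/raw/")

# DataFrame API: derive an event date and filter out bot traffic.
cleaned = (clicks
           .withColumn("event_date", F.to_date("event_timestamp"))
           .filter(F.col("user_agent").isNotNull())
           .filter(~F.col("user_agent").contains("bot")))

# Spark SQL: aggregate page views per user per day.
cleaned.createOrReplaceTempView("clickstream")
daily_views = spark.sql("""
    SELECT user_id, event_date, COUNT(*) AS page_views
    FROM clickstream
    GROUP BY user_id, event_date
""")

daily_views.write.mode("overwrite").parquet("s3://example-bucket/clickstream/daily_views/")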

Environment: Hadoop, Spark, Scala, Hive, Sqoop, Python, Oozie, Kafka, AWS EMR, Redshift, S3,
Kinesis, Spark Streaming, Athena, HBase, YARN, JIRA, Shell Scripting, Maven, Git

Vanguard, PA
Sr Data Engineer Nov 2021 - Dec 2023

Responsibilities:
 Utilized analytical, statistical, and programming skills to collect, analyze, and interpret large data sets and develop data-driven technical solutions to difficult business problems using tools such as SQL and Python.
 Created Lambda jobs to trigger EMR clusters for running Spark applications.
 Worked on designing AWS EC2 instance architecture to meet high availability application
architecture and security parameters.
 Experience in optimizing data pipelines through the effective use of AWS EventBridge,
improving data flow efficiency, and reducing latency.
 Integrated APIs into ETL processes for extracting data from diverse sources, transforming it
into a standardized format, and loading it into target data stores.
 Created AWS S3 buckets and managed policies for S3 buckets and Utilized S3 buckets and
Glacier for storage and backup.
 Worked with different file formats like Parquet using PySpark and Impala for accessing the data, and performed Spark Streaming with RDDs and DataFrames.
 Performed the aggregation of log data from different servers and used them in downstream
systems for analytics using Apache Kafka.
 Worked on Data Integration for extracting, transforming, and loading processes for the
designed packages.
 Designed and deployed automated ETL workflows using AWS Lambda, organized and cleansed the data in S3 buckets using AWS Glue, and processed the data using Amazon Redshift.
 Utilized CloudWatch metrics and logs for in-depth performance analysis and
troubleshooting of data pipelines and applications, leading to optimized resource utilization
and reduced downtime.
 Worked within the ETL architecture enhancements to increase the performance using
query optimizer.
 Processed the extracted data using Spark and Hive, and handled large data sets using HDFS.
 Worked on streaming data transfer from different data sources into HDFS and NoSQL databases.
 Worked on scripting with Python and PySpark in Spark to transform data from various file formats like text, CSV, and JSON.
 Worked on processing the data and testing using Spark SQL and on real-time processing by
Spark Streaming and Kafka using Python.
 Scripted with Python and PowerShell for setting up baselines, branching, merging, and automation across the process using Git.
 Worked on implementing and enhancing the ETL architecture and optimized workflows by building DAGs in Apache Airflow to schedule the ETL jobs, using Airflow components such as pools, executors, and multi-node functionality (a brief DAG sketch follows this list).
 Explored new features and updates in Amazon SNS, implementing innovative solutions to improve system efficiency and user experience.
 Utilized CloudFormation to define and provision infrastructure components required for
deploying and running data pipelines.
 Involved in continuous Integration of application using Jenkins.
 Worked on creating SSIS packages for Data Conversion using data conversion
transformation and producing advanced extensible reports using SQL Server Reporting
Services.
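
A minimal Apache Airflow DAG sketch of the ETL scheduling pattern described above; the task callables, pool name, and schedule are hypothetical placeholders, and the exact scheduling argument depends on the Airflow version in use.

# Minimal Airflow DAG sketch for scheduling ETL tasks; task functions and the
# pool name are illustrative placeholders, not production code.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull data from source systems (e.g., an API or S3 prefix).
    pass

def transform():
    # Placeholder: cleanse and standardize the extracted data.
    pass

def load():
    # Placeholder: load transformed data into the target store (e.g., Redshift).
    pass

with DAG(
    dag_id="etl_pipeline_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",   # exact scheduling argument depends on the Airflow version
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract, pool="etl_pool")
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task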

Environment: Python, SQL, AWS EC2, AWS S3 buckets, SNS, CloudWatch, PySpark, AWS Lambda, AWS Glue, Amazon Redshift, Spark Streaming, Apache Kafka, SSIS, ETL, Hive, HDFS, NoSQL, MySQL, Teradata, PowerShell, Git, Apache Airflow.

McKinsey, New York


Data Engineer Mar 2021 – Oct 2021

Responsibilities:

 Developed Spark applications to implement various data cleansing, validation, and processing activities on large-scale datasets ingested from traditional data warehouse systems.
 Worked both with batch and real time streaming data sources.
 Developed custom Kafka producers to write streaming messages from external REST applications to Kafka topics.
 Developed Spark Streaming applications to consume streaming JSON messages from Kafka topics (a brief sketch follows this list).
 Developed data transformation jobs using Spark DataFrames to flatten JSON documents.
 Worked with Spark on improving performance and optimizing the existing transformations.
 Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model, which gets the data from Kafka in near real time and persists it to HBase.
 Worked extensively with AWS Cloud services like EMR, S3, RDS, Redshift, Athena, and Glue.
 Implemented real-time data orchestration systems using Step Functions, ensuring timely
data availability and processing for critical business applications.
 Migrated existing on-premises data pipelines to AWS.
 Worked on automating provisioning of AWS EMR clusters.
 Used HiveQL to analyze partitioned and bucketed data, and executed Hive queries on Parquet tables stored in Hive to perform data analysis that meets the business specification logic.
 Experience in using Avro, Parquet, ORC, and JSON file formats; developed UDFs in Hive.
 Worked with the Log4j framework for logging debug, info, and error data.
 Experience in configuring and optimizing CI/CD workflows using GitLab CI/CD.
 Generated various kinds of reports using Tableau based on client specification.
 Used Jira for bug tracking and Git to check-in and checkout code changes.
 Responsible for generating actionable insights from complex data to drive real business
results for various application teams and worked in Agile Methodology projects
extensively.
 Worked with Scrum team in delivering agreed user stories on time for every Sprint.
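
A brief sketch of consuming JSON messages from Kafka with Spark Structured Streaming and flattening them, as described above; the broker address, topic name, schema, and sink paths are assumptions (writing to HBase would require a separate connector, so a Parquet sink stands in here), and the Kafka source requires the spark-sql-kafka package.

# Sketch: consume JSON messages from Kafka and flatten them into columns.
# Broker address, topic name, schema, and paths are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("course_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "learner-events")
       .load())

# Kafka delivers the payload as bytes; parse it as JSON and flatten to columns.
flattened = (raw.selectExpr("CAST(value AS STRING) AS json_value")
             .select(F.from_json("json_value", schema).alias("event"))
             .select("event.*"))

query = (flattened.writeStream
         .format("parquet")   # an HBase sink would need a separate connector
         .option("path", "s3://example-bucket/learner-events/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/learner-events/")
         .start())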

Environment: AWS, S3, EMR, Spark, Kafka, Hive, Athena, Glue, Redshift, Teradata, Tableau, Step Functions.

AT&T, IL
Data Engineer Oct 2019 - Feb 2021
Responsibilities:

 Built data pipelines using PySpark in AWS EMR.
 Created data pipelines using Glue to load data into Hive tables after business transformations.
 Used CloudFormation to deploy Glue jobs and Glue roles in AWS.
 Refactored Python code to PySpark to utilize Spark parallelization (a brief before/after sketch follows this list).
 Used AWS IAM for creating roles, users, and groups, and implemented MFA to provide additional security for the AWS account and its resources.
 Responsible for enhancements to data pipelines as per user requirements.
 Performed performance and integration testing for platform upgrades.
 Responsible for creating Spark jobs and bash scripts for deploying applications in AWS.
 Developed a series of data ingestion jobs in Python for collecting data from multiple channels and external applications.
 Worked on both batch and streaming ingestion of the data.
 Experience with implementation of Snowflake cloud data warehouse and operational
deployment of Snowflake DW solution into production.
 Implemented real-time data streaming APIs to enable continuous and low-latency data
updates for downstream analytics and reporting.
 Strong SQL skills and expertise in writing efficient queries for Redshift. Proficient in
query tuning, analyzing query plans, identifying bottlenecks, and optimizing SQL code
for improved performance.
 Worked on various performance optimizations in Spark, such as broadcast variables, dynamic allocation, and partitioning, and built custom Spark UDFs.
 Worked on fine-tuning long-running Hive queries by applying proven standards such as the Parquet columnar format, partitioning, and vectorized execution.
 Provided business data to users for data analysis.
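
An illustrative before/after sketch of refactoring a single-node pandas transformation into PySpark, as described above; the column names and aggregation are hypothetical examples.

# Before/after: refactoring a pandas transformation to PySpark so it can run in
# parallel on EMR. Column names and the aggregation are hypothetical examples.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Original single-node pandas version (fine for small files, memory-bound at scale):
def summarize_with_pandas(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    return df.groupby("account_id", as_index=False)["amount"].sum()

# PySpark version of the same logic, distributed across the cluster:
def summarize_with_pyspark(path: str):
    spark = SparkSession.builder.appName("refactor-sketch").getOrCreate()
    df = spark.read.option("header", "true").csv(path)
    return (df.withColumn("amount", F.col("amount").cast("double"))
              .groupBy("account_id")
              .agg(F.sum("amount").alias("total_amount")))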

Environment: Spark, PySpark, Hive, AWS EMR, S3, JIRA, Bamboo, Bitbucket, Control M, Presto

Aetna, India
Big Data Developer Aug 2018 – July 2019

Responsibilities:
 Worked on building centralized Data Lake on AWS Cloud utilizing primary services like S3,
EMR, Redshift and Athena.
 Worked on migrating datasets and ETL workloads in Scala from on-premises systems to AWS Cloud services.
 Extensive experience in utilizing ETL processes for designing and building very large-scale data pipelines using Apache Spark.
 Migrated data from a local Teradata data warehouse to AWS S3 data lakes.
 Built series of Spark Applications and Hive scripts to produce various analytical datasets
needed for digital marketing teams.
 Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud.
 Responsible for data ingestion projects ingesting data into the data lake from multiple source systems using Talend Big Data.
 Worked extensively on fine tuning spark applications and providing production support to
various pipelines running in production.
 Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation using PySpark.
 Developed and optimized Python-based ETL pipelines in both legacy and distributed environments.
 Developed Spark-with-Python pipelines using Spark DataFrame operations to load data to the EDL, using EMR for job execution and AWS S3 as the storage layer.
 Worked closely with business and data science teams and ensured all requirements were translated accurately into our data pipelines.
 Worked on the full spectrum of data engineering pipelines: data ingestion, data transformations, and data analysis/consumption with Python.
 Extracted data from AWS Aurora databases for big data processing.
 Developed AWS Lambdas using Python and Step Functions to orchestrate data pipelines (a brief sketch follows this list).
 Worked on automating infrastructure setup, including launching and terminating EMR clusters.
 Created Hive external tables on top of datasets loaded into AWS S3 buckets and created various Hive scripts to produce a series of aggregated datasets for downstream analysis.
 Used Scala data pipelines to perform transformations on EMR clusters and loaded the transformed data into S3, and from S3 into Redshift.
 Worked on creating Kafka producers using the Kafka Java Producer API to connect to an external REST live-stream application and produce messages to Kafka topics.
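
A minimal sketch of a Python Lambda handler starting a Step Functions execution to orchestrate a pipeline, as described above; the state machine ARN environment variable and input payload are placeholders.

# Sketch: Lambda handler that kicks off a Step Functions state machine to
# orchestrate a data pipeline. ARN and payload below are placeholders.
import json
import os

import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # Pass the triggering event (e.g., an S3 object-created notification) through
    # as the execution input so downstream states know which data to process.
    response = sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # placeholder environment variable
        input=json.dumps({"trigger_event": event}),
    )
    return {"executionArn": response["executionArn"]}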

Environment: AWS S3, EMR, Redshift, Aurora, Athena, Glue, Talend, Spark, Python, Java, Hive,
Kafka

Astellas Pharma, India


Software Developer Jan 2016 – July 2018

Responsibilities:

 Involved in understanding the functional specifications of the project.
 Assisted the development team in designing the complete application architecture.
 Designed and implemented RESTful APIs using Spring Boot, adhering to industry best
practices for API design.
 Designed and implemented microservices architecture using Spring Boot, facilitating
modular and independently deployable services.
 Involved in developing JSP pages for the web tier and validating the client data using
JavaScript.
 Developed connection components using JDBC.
 Designed Screens using HTML and images.
 Cascading Style Sheet (CSS) was used to maintain uniform look across different pages.
 Involved in creating Unit Test plans and executing the same.
 Performed document and code reviews and knowledge transfers for status updates on ongoing project developments.
 Deployed web modules in Tomcat web server.

Environment: Java, JSP, J2EE, Servlets, Java Beans, HTML, JavaScript, JDeveloper, Tomcat
Webserver, Oracle, JDBC, XML.
