Hanumantha Rao
Email: [email protected]
Mobile: +1-(904)-942-9686
Professional Summary:
1. 11+ years of IT experience in the banking sector, working with various data processing tools and cloud technologies.
2. 5+ years of experience in Big Data and cloud technologies.
3. Built end-to-end data pipelines leveraging several AWS analytics services.
4. Contributed to establishing and implementing data engineering standards and best practices for enterprise data pipelines.
5. Built Python-based frameworks for interacting with databases such as Redshift and Snowflake.
6. Good knowledge of and experience with Python and its data structures such as lists, tuples, and dictionaries.
7. Worked on cloud technologies such as AWS, with good hands-on experience in AWS services including S3, EMR, Redshift, Glue, Lambda, Redshift Spectrum, and Athena.
8. Worked on the design and development of Big Data solutions using Hadoop ecosystem technologies such as HDFS, Hive, Sqoop, Apache Spark, Python, and Apache NiFi.
9. Worked on Azure Databricks and implemented a POC in Azure.
10. Excellent knowledge of Spark and its internal architecture; capable of processing large structured and semi-structured data sets and handling complex data processing.
11. Extensive knowledge of serialization using Avro, Parquet, and ORC file formats; worked with compression techniques such as Snappy and Gzip.
12. Experienced in developing Spark applications, applying transformations on RDDs and DataFrames, and writing Spark DSL and Spark SQL operations (illustrated in the sketch after this list).
13. Created external and managed tables in Hive to store structured data in HDFS and process it further; applied partitioning, bucketing, and other optimization techniques while loading data into Hive tables.
14. Good experience with incremental imports, partitioning, and bucketing concepts in Hive and Spark SQL for optimization.
15. Prior experience as an ETL DataStage developer helped me handle a wide range of business scenarios and automate most dependency tasks to avoid manual work.
16. Worked as a DataStage administrator, which helped me understand project architectures layer by layer and write shell scripts for automation.
17. Worked in both Agile and Waterfall methodologies.
18. Strong problem-solving and analytical skills.
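A minimal PySpark sketch of the Spark DSL and Spark SQL operations mentioned in point 12; the paths, column names, and single-date filter are hypothetical placeholders used only for illustration:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dsl-vs-sql").getOrCreate()

# Hypothetical input: transaction records stored as Parquet
txns = spark.read.parquet("s3://example-bucket/transactions/")

# Spark DSL: per-account totals for one business date
dsl_result = (
    txns
    .filter(F.col("business_date") == "2023-01-31")
    .groupBy("account_id")
    .agg(F.sum("amount").alias("total_amount"))
)

# Equivalent Spark SQL over a temporary view
txns.createOrReplaceTempView("transactions")
sql_result = spark.sql("""
    SELECT account_id, SUM(amount) AS total_amount
    FROM transactions
    WHERE business_date = '2023-01-31'
    GROUP BY account_id
""")

# Write the result as Parquet with Snappy compression
dsl_result.write.mode("overwrite").option("compression", "snappy").parquet(
    "s3://example-bucket/account_totals/")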
Technical Competencies:
Work Experience:
Responsibilities
Understanding the business and gathering required information to get a clear picture from end to end.
Participating in daily scrum calls to distribute work across the team and clear gaps in understanding the requirements.
Participating in all sprint planning calls to understand the business expectations for each quarter.
Checking whether the proposed ecosystem is sufficient to achieve the business-expected outcomes.
Responsible for building scalable distributed data solutions using Hadoop, and managing and scheduling jobs on a Hadoop cluster.
Analyzing all ingestion flows to confirm that every scenario is handled, since files arrive in different formats (Parquet, Avro, CSV, etc.) from different sources.
Validating metadata between the business-provided template and the received input file to avoid data mismatches or column displacements in the target table.
Developed ingestion scripts to move data from source to HDFS, with audit maintenance, for both file and RDBMS sources using file copy and Sqoop commands.
Creating Hive managed and external tables to store data in staging and permanent layers.
Writing UNIX shell script validations to check the business date and total record count.
Creating external tables and applying dynamic partitions for easy access to the data.
Applying business logic once the data is available in the common layer and inserting the data into Hive tables in the final layer using PySpark (see the sketch after this list).
Performing unit testing after development is complete, then checking in the code to migrate it to UAT for all further testing.
Helping the QA team when they are stuck on issues or need clarification to complete their test cases.
Raising CRQs for deploying our code to production and informing all teams about our changes and release dates.
After a successful run for a few days, handing over KT to the production teams for daily monitoring.
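A minimal PySpark sketch of the final-layer load described above; the database, table, and column names are hypothetical placeholders rather than the actual project objects:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive-enabled Spark session (assumes the cluster's Hive metastore is configured)
spark = (
    SparkSession.builder
    .appName("final-layer-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the common-layer Hive table (hypothetical name)
common_df = spark.table("common_layer.transactions")

# Example business logic: keep valid records and derive a USD amount
final_df = (
    common_df
    .filter(F.col("record_status") == "VALID")
    .withColumn("amount_usd", F.col("amount") * F.col("fx_rate"))
)

# Insert into the final-layer table, relying on dynamic partitioning by business_date
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
final_df.write.mode("append").insertInto("final_layer.transactions")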
Responsibilities
Understanding the business and discussing with the line of business to gather all the requirements.
Participating in scrum calls to raise the team's overall questions and get them clarified by the line of business.
Categorizing table storage in AWS Redshift into hot, warm, and cold tables using last accessed/modified dates, and unloading the required tables to S3.
Checking metadata compatibility and type equivalents between Redshift and Snowflake and creating a configuration file to cover all scenarios.
Writing an automated script to create tables in Snowflake equivalent to the corresponding tables in Redshift.
Creating a Python-based framework for interacting with databases such as Snowflake and Redshift (see the sketch after this list).
Once the data is available in S3 in Parquet format, moving that data into Snowflake.
Automating data validations as a reconciliation process after the data migration from Redshift to Snowflake.
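A minimal Python sketch of the migration helpers described above, assuming psycopg2 for Redshift and the Snowflake Python connector; the table name, external stage, and connection details are hypothetical placeholders:

import psycopg2                # Redshift is reachable via the PostgreSQL protocol
import snowflake.connector     # Snowflake Python connector

# sf_conn = snowflake.connector.connect(...)   # credentials omitted
# rs_conn = psycopg2.connect(...)              # credentials omitted

def copy_parquet_into_snowflake(sf_conn, table, stage_path):
    # Load Parquet files previously unloaded to S3 (exposed via an external stage)
    # into the equivalent Snowflake table, matching columns by name.
    cur = sf_conn.cursor()
    cur.execute(
        f"COPY INTO {table} FROM {stage_path} "
        "FILE_FORMAT = (TYPE = PARQUET) "
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
    )
    cur.close()

def reconcile_row_counts(rs_conn, sf_conn, table):
    # Simple reconciliation step: compare row counts between the two platforms.
    rs_cur, sf_cur = rs_conn.cursor(), sf_conn.cursor()
    rs_cur.execute(f"SELECT COUNT(*) FROM {table}")
    sf_cur.execute(f"SELECT COUNT(*) FROM {table}")
    rs_count, sf_count = rs_cur.fetchone()[0], sf_cur.fetchone()[0]
    rs_cur.close()
    sf_cur.close()
    return rs_count == sf_count, rs_count, sf_count

# Example usage (hypothetical table name):
#   ok, rs_count, sf_count = reconcile_row_counts(rs_conn, sf_conn, "analytics.customer_orders")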
Responsibilities
Understanding the business and gathering required information to get a clear picture from end to end.
Participating in daily scrum calls to distribute work across the team and clear gaps in understanding the requirements.
Participating in all sprint planning calls to understand the business expectations for each quarter.
Writing automated scripts to check whether the data file is available in AWS S3 before starting to consume the data.
Consuming real-time data from AWS Kinesis through Apache NiFi and flattening the JSON data on the fly into a DataFrame (see the sketch after this list).
Once the unstructured/complex data has been flattened, storing it in S3.
Spinning up EMR with the business logic to process the combined data from both sets of S3 files with Spark and loading the data into staging tables.
Once the data is available in the staging table, inserting the same data into Redshift.
Ensuring data validations are in place end to end, with all audit data captured.
Using Athena to run ad hoc queries and validate the final data available in Redshift.
Leveraging the Glue Data Catalog to maintain the intermediate hops of the data pipelines by publishing tables, so that both clean and raw data can be queried, which helps in reconciliation.
Helping the QA team when they are stuck on issues or need clarification to complete their test cases.
Deploying our code to production and monitoring it for a few days.
After a successful run for a few days, handing over KT to the production teams for daily monitoring.
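A minimal PySpark sketch of the flatten-and-stage step described above, as it might run on EMR; the bucket names and JSON fields are hypothetical placeholders:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten-and-stage").getOrCreate()

# Read the raw JSON events that NiFi landed in S3 from the Kinesis feed
raw_df = spark.read.json("s3://raw-bucket/events/")

# Flatten the nested structure: promote nested struct fields and explode the item array
flat_df = (
    raw_df
    .select(
        F.col("event_id"),
        F.col("event_ts"),
        F.col("customer.id").alias("customer_id"),
        F.explode_outer("items").alias("item"),
    )
    .select(
        "event_id",
        "event_ts",
        "customer_id",
        F.col("item.sku").alias("sku"),
        F.col("item.amount").alias("amount"),
    )
)

# Store the flattened data back to S3 as Parquet for the downstream staging/Redshift load
flat_df.write.mode("append").parquet("s3://clean-bucket/flattened-events/")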
Responsibilities
Gathering requirements from clients, performing the required analysis, and actively participating in sprint calls to discuss pending clarifications.
Developing DataStage jobs as per the business logic, checking whether the data is generated as expected by the business, and cross-verifying it periodically.
Developing automation frameworks with Unix shell scripts to reduce manual validation work (see the sketch after this list).
Performing unit testing for all modules and then migrating them to the QA region for complete testing.
Helping the QA team when they face issues, providing the required inputs, and explaining the logic behind complex scenarios so that all QA test cases can be completed.
Moving the code to production, supporting the initial phase for a few days after release, and then handing over KT to the production and support teams after a buffer period.
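An illustrative example of the kind of validation these frameworks automate, sketched here in Python for readability (the original frameworks used Unix shell scripts); the pipe-delimited layout with a header carrying the business date and a trailer carrying the record count is an assumed file convention:

import csv
import sys
from datetime import datetime

def validate_feed(path, expected_business_date):
    # Assumed layout: a header row with the business date, data rows,
    # and a trailer row with the expected record count.
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter="|"))

    header, data, trailer = rows[0], rows[1:-1], rows[-1]

    # Business-date check against the expected processing date
    business_date = datetime.strptime(header[1], "%Y-%m-%d").date()
    if business_date != expected_business_date:
        raise ValueError(f"Business date mismatch: {business_date} != {expected_business_date}")

    # Record-count check: trailer count versus actual number of data rows
    expected_count = int(trailer[1])
    if expected_count != len(data):
        raise ValueError(f"Record count mismatch: trailer={expected_count}, actual={len(data)}")

if __name__ == "__main__":
    validate_feed(sys.argv[1], datetime.strptime(sys.argv[2], "%Y-%m-%d").date())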
Responsibilities
As an administrator, performed all the below activities to keep development work running smoothly, without any interruptions in data processing, and handled server availability issues as well.
I hereby declare that the above-furnished details are true to the best of my knowledge, and I assure you of my services to your satisfaction.