Pavani
Senior Data Engineer
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Spark, Flume, Cassandra, Impala, Oozie, Zookeeper, Amazon Web Services (AWS), EMR
Cloud Technologies: AWS, Azure, Google Cloud Platform (GCP)
IDEs: IntelliJ, Eclipse, Spyder, Jupyter
Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003
Programming Languages: Python, Scala, Linux shell scripts, JavaScript, PL/SQL, Java, Pig Latin, HiveQL
Databases: Oracle, MySQL, DB2, MS SQL Server, MongoDB, HBase
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript
Java Technologies: Core Java, Servlets, JSP, JDBC, Java Beans, J2EE
Business Tools: Tableau, Power BI
PROFESSIONAL EXPERIENCE
Role: Senior AWS Data Engineer
Client: Fiserv, Brookfield, WI Jan 2023 to Present
Responsibilities:
Implemented a 'serverless' architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
Created a Lambda deployment function and configured it to receive events from the S3 bucket.
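Below is a minimal, illustrative sketch (not the project code) of an S3-triggered Lambda handler of this kind; the bucket and object handling are placeholders.

import json
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Each record describes one object-created event from the configured S3 bucket.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        head = s3.head_object(Bucket=bucket, Key=key)
        print(f"Received {key} from {bucket} ({head['ContentLength']} bytes)")
    return {"statusCode": 200, "body": json.dumps("processed")}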
Designed the data models used in data-intensive AWS Lambda applications that perform complex analysis, producing analytical reports for end-to-end traceability, lineage, and the definition of key business elements from Aurora.
Writing code that optimizes the performance of AWS services used by application teams and providing code-level application security for clients (IAM roles, credentials, encryption, etc.).
Using SonarQube for continuous inspection of code quality and to perform automatic reviews of
code to detect bugs. Managing AWS infrastructure and automation with CLI and API.
Creating AWS Lambda functions using Python for deployment management in AWS; designed, investigated, and implemented public-facing websites on Amazon Web Services and integrated them with other application infrastructure.
Creating different AWS Lambda functions and API Gateways so that data submitted through API Gateway is passed to and processed by the corresponding Lambda function.
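As an illustration only, a minimal Lambda handler for an API Gateway proxy integration; the DynamoDB table name "items" is a hypothetical placeholder.

import json
import boto3

table = boto3.resource("dynamodb").Table("items")  # placeholder table name

def lambda_handler(event, context):
    # API Gateway proxy integrations deliver the request body as a JSON string.
    payload = json.loads(event.get("body") or "{}")
    table.put_item(Item=payload)  # persist the submitted record
    return {
        "statusCode": 201,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"stored": payload}),
    }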
Responsible for building CloudFormation templates for SNS, SQS, Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM, and CloudWatch services and integrating them with Service Catalog.
Performed regular monitoring activities on Unix/Linux servers, such as log verification, server CPU usage, memory checks, load checks, and disk space verification, to ensure application availability and performance using CloudWatch and AWS X-Ray. Implemented the AWS X-Ray service inside Confidential, which allows development teams to visually detect node and edge latency distribution directly from the service map.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC, Parquet, and text files) into AWS Redshift.
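A rough sketch of such a Glue job, assuming a Glue catalog connection named "redshift-connection"; the bucket paths, database, and table names are placeholders.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read campaign files (Parquet in this sketch) from S3 into a DynamicFrame.
campaigns = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/campaigns/"]},
    format="parquet",
)

# Load the frame into a Redshift staging table through the catalog connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=campaigns,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "staging.campaigns", "database": "analytics"},
    redshift_tmp_dir="s3://my-bucket/tmp/",
)
job.commit()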
Automated Datadog dashboards along with the stack through Terraform scripts.
Developed file cleaners using Python libraries.
Experience in building Snowpipe, with in-depth knowledge of data sharing in Snowflake and of database, schema, and table structures.
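For illustration, a sketch of defining a Snowpipe over an external stage through the Snowflake Python connector; the account, stage, and table identifiers are placeholders.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
)
# Auto-ingest pipes pick up files as they land in the external stage.
conn.cursor().execute("""
    CREATE PIPE IF NOT EXISTS raw.campaign_pipe
      AUTO_INGEST = TRUE
      AS COPY INTO raw.campaigns
         FROM @raw.campaign_stage
         FILE_FORMAT = (TYPE = 'PARQUET')
""")
conn.close()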
Explored DAGs, their dependencies, and their logs using Airflow pipelines for automation.
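A minimal sketch of such an Airflow DAG, assuming Airflow 2.x; the task callables and schedule are illustrative.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling source data")

def load():
    print("loading into the warehouse")

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds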
Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake.
Utilized Python libraries such as Boto3 and NumPy for AWS work.
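A small illustrative example of that combination; the bucket and prefix are placeholders.

import boto3
import numpy as np

s3 = boto3.client("s3")
objects = s3.list_objects_v2(Bucket="my-bucket", Prefix="exports/").get("Contents", [])
sizes = np.array([obj["Size"] for obj in objects])
if sizes.size:
    print(f"{sizes.size} objects, mean size {sizes.mean():.0f} bytes")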
Used Amazon EMR for MapReduce jobs and tested them locally using Jenkins.
Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
Create external tables with partitions using Hive, AWS Athena and Redshift.
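A sketch of the partitioned external-table pattern, issued here through Spark SQL; the same DDL style applies in Hive and, with minor differences, Athena. Paths and columns are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
        event_id STRING,
        payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION 's3://my-bucket/events/'
""")
# Register a partition whose files already exist in storage.
spark.sql("ALTER TABLE analytics.events ADD IF NOT EXISTS PARTITION (event_date='2023-01-01')")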
Developed the PySpark code for AWS Glue jobs and for EMR.
Installed and configured Splunk clustered search heads, indexers, deployment servers, and deployers; designed and implemented Splunk-based best-practice solutions.
Designed and developed ETL jobs to extract data from the Salesforce replica and load it into the data mart in Redshift.
Responsible for designing logical and physical data models for various data sources on Confidential Redshift.
Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS
resources.
Integrated Lambda with SQS and DynamoDB using Step Functions to iterate through a list of messages and update their status in a DynamoDB table.
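For illustration, a minimal Lambda that walks a batch of SQS messages and records their status in DynamoDB; the table and attribute names are placeholders.

import json
import boto3

table = boto3.resource("dynamodb").Table("message_status")  # placeholder table

def lambda_handler(event, context):
    records = event.get("Records", [])  # one record per SQS message in the batch
    for record in records:
        message = json.loads(record["body"])
        table.update_item(
            Key={"message_id": message["id"]},
            UpdateExpression="SET #s = :status",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":status": "PROCESSED"},
        )
    return {"processed": len(records)}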
Technologies: Python, Power BI, AWS Glue, Athena, SSRS, SSIS, AWS S3, AWS Redshift, AWS EMR, AWS RDS, DynamoDB, SQL, Tableau, Distributed Computing, Snowflake, Spark, Kafka, MongoDB, Hadoop, Linux Command Line, Data structures, PySpark, Oozie, HDFS, MapReduce, Cloudera, HBase, Hive, Pig, Docker.
Technologies: PL/SQL, Python, Azure Data Factory, Azure Blob Storage, Azure Table Storage, Azure SQL Server, Apache Hive, Apache Spark, MDM, Netezza, Teradata, Oracle 12c, SQL Server, Teradata SQL Assistant, Teradata Vantage, Microsoft Word/Excel, Flask, Snowflake, DynamoDB, Athena, Lambda, MongoDB, Pig, Sqoop, Tableau, Power BI, UNIX, Docker, Kubernetes.
Role: Data Engineer
Client: Honeywell, India Apr 2017 to Dec 2020
Responsibilities:
Experienced in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS, Spark, and PySpark.
Leveraged cloud and GPU computing technologies on AWS for automated machine learning and analytics pipelines.
Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; performed gap analysis and provided feedback to the business team to improve software delivery.
Performed data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization on provider, member, claims, and service fund data.
Involved in developing RESTful APIs (microservices) using the Python Flask framework, packaged in Docker and deployed to Kubernetes using Jenkins pipelines.
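As an illustration, a minimal Flask microservice of the kind packaged in Docker and deployed to Kubernetes; the /claims resource and in-memory store are hypothetical stand-ins.

from flask import Flask, jsonify, request

app = Flask(__name__)
claims = {}  # placeholder for a real backing store

@app.route("/claims/<claim_id>", methods=["GET"])
def get_claim(claim_id):
    return jsonify(claims.get(claim_id, {})), 200

@app.route("/claims", methods=["POST"])
def create_claim():
    payload = request.get_json(force=True)
    claims[payload["id"]] = payload
    return jsonify(payload), 201

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)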
Created reusable REST APIs that exposed data blended from a variety of data sources, reliably gathering requirements directly from the business.
Worked on the development of a data warehouse, a business intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
Responsible for full data loads from production to AWS Redshift staging environment and
worked on migrating EDW to AWS using EMR and various other technologies.
Experience in Creating, Scheduling, and Debugging Spark jobs using Python. Performed Data
Analysis, Data Migration, Transformation, Integration, Data Import, and Data Export through
Python.
Gathered and processed raw data at scale (including writing scripts, web scraping, calling
APIs, writing SQL queries, and writing applications).
Creating reusable Python scripts to ensure data integrity between the source
(Teradata/Oracle) and target system.
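A hedged sketch of such a reconciliation script; the connections are generic DB-API objects and the table name is a placeholder.

def row_count(connection, table):
    cursor = connection.cursor()
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

def reconcile(source_conn, target_conn, table):
    # Compare row counts between the source (Teradata/Oracle) and the target system.
    source_rows = row_count(source_conn, table)
    target_rows = row_count(target_conn, table)
    if source_rows != target_rows:
        raise ValueError(f"{table}: source={source_rows} target={target_rows}")
    print(f"{table}: counts match ({source_rows})")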
Migrated on-premise database structure to Confidential Redshift data warehouse.
Created data pipelines for different events to load the data from DynamoDB to AWS S3 bucket
and then into HDFS and delivered high success metrics.
Implemented authoring, scheduling, and monitoring of data pipelines using Scala and Spark.
Developed and designed a system to collect data from multiple platforms using Kafka and then process it using Spark.
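A minimal sketch of that Kafka-to-Spark path using Structured Streaming; brokers, topic, and output paths are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

# Subscribe to the topic and keep the message payload as a string column.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
    .select(col("value").cast("string").alias("payload"))
)

# Land the stream in the data lake as Parquet with checkpointing.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3://my-bucket/landing/events/")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/events/")
    .start()
)
query.awaitTermination()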
Created Spark Streaming modules to stream data into the data lake, worked with different data feeds such as JSON, CSV, and XML, and implemented the data lake concept.
Technologies: Python, Power BI, AWS Glue, Athena, SSIS, AWS S3, AWS Redshift, AWS EMR, AWS RDS,
DynamoDB, SQL, AWS Lambda, Scala, Spark
Role: Hadoop Developer
Client: Soft labs Group, India Aug 2015 to Mar 2017
Responsibilities:
Worked closely with the business to transform business requirements into technical requirements as part of design reviews and daily project scrums, and wrote custom MapReduce programs with custom input formats.
Created Sqoop jobs with incremental load to populate Hive External tables.
Involved in the development of real time streaming applications using PySpark, Kafka on distributed
Hadoop Cluster.
Worked on Partitioning, Bucketing, Join Optimizations, and query optimizations in Hive.
Designed and developed Hadoop MapReduce programs and algorithms for analysis of cloud scale
classified data stored in Cassandra.
Optimized the Hive tables using optimization techniques like partitioning and bucketing to provide
better performance with HiveQL queries.
Evaluated data import-export capabilities, data analysis performance of Apache Hadoop framework.
Involved in installation of HDP Hadoop, configuration of the cluster and the eco system components
like Sqoop, Pig, Hive, HBase and Oozie.
Created reports for BI team using Sqoop to export data into HDFS and Hive.
Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational
Database system and vice versa.
Created RDD’s in Spark technology.
Extracted data from data warehouse (Tera Data) on the spark RDD’s.
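A rough sketch of that extraction over JDBC; the Teradata host, credentials, and table are placeholders and assume the Teradata JDBC driver is on the Spark classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata-extract").getOrCreate()

sales = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://td-host/DATABASE=edw")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "edw.sales")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)
sales_rdd = sales.rdd  # drop down to the RDD API where needed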
Experience with Spark using Scala and Python.
Worked on stateful transformations in Spark Streaming.
Worked on Batch processing and Real-time data processing and Spark Streaming using Lambda
architecture.
Worked on Spark SQL UDFs and Hive UDFs.
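For illustration, a minimal Python UDF registered for Spark SQL; the masking rule and the transactions table are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def mask_account(account_number):
    # Keep only the last four characters of the account number.
    return "****" + account_number[-4:] if account_number else None

spark.udf.register("mask_account", mask_account, StringType())
spark.sql("SELECT mask_account(account_number) AS masked FROM transactions").show()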
Technologies: Spark, Kafka, Hadoop, Linux Command Line, Data structures, PySpark, Oozie, HDFS,
MapReduce, Cloudera, HBase, Hive, Pig, Docker