
Jahnavi

[email protected]
4707814773

PROFESSIONAL SUMMARY
 I have 9+ years of experience in Data Engineering and Data Analysis/Science, specializing in enterprise-
level data models.
 I possess extensive hands-on experience with AWS, including EC2, S3, EBS, Glue, Lambda, Redshift, SNS, SQS, EMR, EKS, and RDS.
 I am skilled in Azure cloud components such as HDInsight, Databricks, Data Lake, Blob Storage,
Data Factory, Storage Explorer, SQL DB, SQL DWH, and Cosmos DB.
 I am proficient in designing, implementing, and optimizing end-to-end data pipelines for large-scale data
processing.
 I have strong expertise in Extract, Transform, Load (ETL) processes, ensuring efficient data movement
and transformation across diverse systems.
 I have in-depth knowledge of relational databases, including design, optimization, and administration of
schemas, with expertise in SQL for complex queries and performance tuning.
 I have proven experience working with various NoSQL databases such as MongoDB, Cassandra, and
DynamoDB.
 I am proficient in big data technologies such as Apache Hadoop and Apache Spark for distributed
processing and analytics.
 I have experience converting SQL queries to Spark transformations using Spark RDDs, DataFrames, and Scala (a PySpark sketch of this pattern follows this summary).
 I have expertise in data modeling and schema design, including star and snowflake schemas for relational and non-relational databases.
 I am skilled in working with real-time data streaming technologies, including Apache Kafka, Spark, and
Apache Flink.
 I have expertise in designing and maintaining data warehouses, leveraging technologies like Amazon
Redshift and Google BigQuery.
 I am proficient in workflow orchestration tools like Apache Airflow for scheduling and monitoring
complex data workflows.
 I have used Jenkins CI/CD pipelines to streamline deployment procedures, including software quality tests and automated data integrity testing.
 I have strong proficiency in version control systems such as Git and scripting languages such as Python
for building scalable and maintainable data solutions.
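
The following is a minimal PySpark sketch of the SQL-to-DataFrame conversion pattern referenced in the summary above; the table, columns, and S3 paths are illustrative placeholders rather than details from any actual engagement.

    # Minimal PySpark sketch: rewriting a SQL aggregation as DataFrame transformations.
    # Table/column names and S3 paths are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sql_to_dataframe_example").getOrCreate()

    claims = spark.read.parquet("s3://example-bucket/claims/")  # hypothetical input path

    # Equivalent SQL:
    #   SELECT member_id, SUM(paid_amount) AS total_paid
    #   FROM claims
    #   WHERE claim_status = 'APPROVED'
    #   GROUP BY member_id
    #   HAVING SUM(paid_amount) > 1000
    total_paid = (
        claims
        .filter(F.col("claim_status") == "APPROVED")
        .groupBy("member_id")
        .agg(F.sum("paid_amount").alias("total_paid"))
        .filter(F.col("total_paid") > 1000)
    )

    total_paid.write.mode("overwrite").parquet("s3://example-bucket/output/total_paid/")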

TECHNICAL SKILLS

Programming Python, C++, C#, SQL, Scala, R, HiveQL, Shell Scripting

Operating systems Windows, Linux, UNIX, MacOS

Cloud Technologies AWS (Amazon Web Services) [EMR, EC2, S3, Redshift, Glue, Route 53,
Athena, Lambda, DynamoDB]
Azure (Microsoft Azure) [Azure Databricks, Azure SQL Database, Azure
Data Factory, Azure Machine Learning, Azure Data Lake, Azure Functions]

IDEs Visual Studio Code, PyCharm, Eclipse, Jupyter.

Big Data Ecosystem Hadoop, MapReduce, HDFS, Hive, Impala, HBase, Kafka, Spark [Scala,
SQL].

Databases Cassandra, Oracle, Snowflake, MS Access, MySQL, MongoDB.

ETL/DWH Tools Informatica PowerCenter, Informatica PowerExchange.

BI/ Reporting Tools Tableau, Microsoft Power BI.

Project Management & Collaboration Salesforce, ITIL, SDLC, JIRA, Rally, GitHub.

Scheduling & Workflow Automation Apache Airflow, Autosys, Windows Scheduler.

PROFESSIONAL EXPERIENCE

Client: Cigna Healthcare Group, Connecticut Jun 2023 – Present


Role: Sr AWS Data Engineer
Responsibilities:
 Designed and developed Big Data Analytics products on AWS for risk adjustment programs.
 Designed and implemented scalable data solutions on AWS cloud platform to support Cigna Healthcare’s big
data initiatives.
 Leveraged AWS services like Amazon Redshift, EMR, and Kinesis for building scalable data pipelines.
 Implemented data processing frameworks like Spark and Flink running on AWS for distributed computing.
 Designed data storage solutions using AWS S3, DynamoDB, and ElastiCache for efficient data access.
 Developed and maintained ETL (Extract, Transform, Load) processes using AWS services such as Glue,
EMR, and Data Pipeline to ingest and transform large volumes of structured and unstructured data.
 Built data processing pipelines and ETL (Extract, Transform, Load) workflows using Python with AWS S3 to efficiently ingest, process, and store data.
 Set up and managed Kubernetes clusters using Amazon EKS to support scalable and reliable healthcare
applications.
 Planned and executed strategies for scaling Redshift clusters to handle increasing data volumes and user
demands, ensuring efficient resource utilization.
 Implemented security best practices for EKS clusters, including IAM roles, RBAC (Role-Based Access Control), and network policies to ensure compliance with healthcare regulations like HIPAA.
 Integrated Python and AWS S3 with other AWS services such as Lambda, EMR (Elastic MapReduce), and Glue for data processing, providing seamless interoperability.
 Utilized AWS as the cloud platform, leveraging Amazon S3 for scalable and cost-effective storage, Amazon Redshift for data warehousing, and AWS Glue for ETL processes to build robust and flexible data pipelines that ensure seamless integration and analysis of large volumes of data.
 Implemented CI/CD pipelines for EKS deployments using Jenkins, GitLab CI, and AWS CodePipeline, ensuring automated and reliable application updates.
 Deployed and managed containerized applications using Kubernetes on EKS, ensuring efficient resource
utilization and scalability to handle large volumes of healthcare data.
 Enhanced AWS Redshift performance by creating and tuning data distribution keys, sort keys, and
compression techniques, and identifying and resolving performance bottlenecks.
 Utilized AWS Lambda for event-driven architectures, triggering functions in response to changes in data or user activities within healthcare applications (a sketch of this pattern appears at the end of this role).
 Designed and implemented Lambda-based REST APIs using API Gateway, providing scalable and cost-
effective endpoints for healthcare services.
 Optimized data storage and retrieval mechanisms on AWS S3 and other data storage solutions to ensure high
performance and cost-effectiveness.
 Designed and managed complex data workflows, orchestrating tasks, dependencies, and scheduling to
automate data pipelines, ensuring timely and reliable data processing and delivery.
 Automated infrastructure provisioning and management using AWS CloudFormation, CloudWatch, CloudTrail, and Terraform.
 Implemented disaster recovery strategies for Big Data applications using AWS services like S3 Glacier and
Route 53.
 Monitored resource utilization, tracked data growth, and made recommendations for capacity planning and
infrastructure scaling to maintain optimal performance.
 Utilized programming languages like Python, PLSQL, and others for data manipulation and automation
within AWS.
 Developed User Defined Functions (UDFs) and pre-processing scripts in Python to manipulate raw data and optimize data pipelines for efficient analysis and modeling, enhancing overall data quality and reliability.
 Ensured data privacy compliance with regulations like GDPR, HIPAA, and PII using AWS services like
KMS and Macie.
 Implemented continuous integration and deployment (CI/CD) pipelines for automated deployment of data
pipelines and infrastructure as code using AWS CloudFormation and Terraform.
 Collaborated with cross-functional teams, including data scientists, analysts, and software engineers, to
understand data requirements and provide effective data solutions.
 Stayed updated with industry best practices, emerging technologies, and AWS services to continuously improve data engineering processes and drive innovation within Cigna Healthcare’s data ecosystem.
Environment: S3, Spark, Redshift, Lambda, EMR, IAM, AWS serverless architecture, EC2, EKS, PCI DSS,
HIPAA, Glue (data integration), Athena (serverless analytics), and Step Functions (workflow orchestration).
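
As a rough illustration of the event-driven pattern described in this role (S3 object changes triggering Lambda, which hands the data off to Glue for processing), the sketch below assumes a hypothetical Glue job name and a standard S3 put-event trigger.

    # Sketch of an S3-triggered Lambda handler that starts a Glue ETL job.
    # The Glue job name and argument key are hypothetical placeholders.
    import json
    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        run_ids = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Hand the newly arrived object to the Glue job as a job argument.
            response = glue.start_job_run(
                JobName="example-claims-etl-job",  # hypothetical job name
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
            run_ids.append(response["JobRunId"])
        return {"statusCode": 200, "body": json.dumps({"job_runs": run_ids})}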

Client: Tango Analytics, Texas Feb 2022 – May 2023


Role: Sr. Data Engineer
Responsibilities:
 Architected and designed highly scalable and resilient data pipelines on AWS infrastructure.
 Designed and developed data ingestion processes from various sources into AWS data lakes and data
warehouses.
 Designed and optimized Redshift schemas and table structures, maintaining comprehensive
documentation of data models, ETL processes, and system configurations.
 Designed and implemented scalable data solutions on AWS, leveraging Python and PLSQL for efficient
processing.
 Developed User Defined Functions (UDFs) and pre-processing scripts in Python to manipulate raw data.
 Developed and maintained data pipelines for ingestion, processing, and analysis of large datasets.
 Utilized AWS services such as S3, Redshift, Glue, and Athena for storage, processing, and querying of
data.
 Managed container orchestration and microservices architecture using EKS, enhancing the robustness
and scalability of data processing workflows.
 Implemented serverless APIs using Lambda and API Gateway, facilitating seamless data access and
integration for analytics applications.
 Developed and deployed Lambda functions to automate data processing tasks, reducing operational
overhead and improving data pipeline efficiency.
 Conducted performance optimization of Lambda functions by adjusting memory settings, optimizing
code, and utilizing provisioned concurrency.
 Conducted performance tuning and optimization of SQL queries, Spark jobs, and data processing
workflows.
 Specialized in designing and implementing scalable data storage solutions, optimizing data processing
workflows, and ensuring data quality and reliability to support analytical needs and business decision-
making processes.
 Developed custom data processing solutions using programming languages like Python, Scala, or Java.
 Implemented continuous integration and deployment (CI/CD) pipelines for automated deployment of
data pipelines and infrastructure as code using AWS CloudFormation and Terraform.
 Excelled in navigating secured internal network environments, ensuring the confidentiality, integrity,
and availability of sensitive data while implementing robust security measures and protocols to
mitigate potential risks and vulnerabilities.
 Designed and implemented real-time streaming data processing solutions using AWS Kinesis and
Apache Kafka.
 Deployed and managed containerized applications using Kubernetes on EKS, ensuring efficient resource
utilization and scalability to handle large volumes of data.
 Implemented data lifecycle management policies to manage data retention, archival, and deletion.
 Designed and implemented data catalog solutions to facilitate data discovery and metadata management.
 Led efforts to automate data validation, monitoring, and alerting processes using AWS CloudWatch,
CloudTrail, AWS Config, and custom scripts.
 Led efforts to automate data pipeline deployment and configuration management using AWS CodePipeline and similar tools.
 Set up monitoring and alerting systems for proactive issue identification and Redshift cluster stability,
ensuring timely troubleshooting and resolution of data-related issues.
 Enhanced Redshift performance by creating and tuning data distribution keys, sort keys, and compression techniques, and identifying and resolving performance bottlenecks (a tuning sketch appears at the end of this role).
 Monitored resource utilization, tracked data growth, and made recommendations for capacity planning
and infrastructure scaling to maintain optimal performance.
 Communicated effectively with stakeholders on project updates and insights.
 Collaborated with data scientists and analysts to support their data requirements and analytical
workloads.
 Documented data engineering processes, architectures, and best practices for knowledge sharing.
 Designed and implemented multi-region and multi-account data replication and synchronization
solutions.
 Supported data migration, integration, and transformation efforts as needed. Led efforts to refactor
monolithic data applications into microservices-based architectures.
Environment: Amazon Rekognition, Amazon Forecast, Amazon QuickSight, AWS SDKs, Amazon EMR, AWS Lambda, Amazon Personalize, Amazon Comprehend, Amazon Aurora, Amazon Athena, Amazon Redshift, AWS Glue, Amazon S3, Amazon SageMaker, Amazon RDS (Relational Database Service), Amazon EKS.
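
A minimal sketch of the Redshift distribution-key and sort-key tuning described above, driven from Python with psycopg2; the cluster endpoint, credentials, and table and column names are placeholders.

    # Sketch: rebuild a fact table with an explicit DISTKEY and SORTKEY, then refresh stats.
    # Connection details, table, and column names are placeholders.
    import psycopg2

    ddl = """
    CREATE TABLE sales_fact_tuned
        DISTKEY (customer_id)   -- co-locate rows commonly joined on customer_id
        SORTKEY (sale_date)     -- speed up date-range scans
    AS SELECT * FROM sales_fact;
    """

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
        port=5439, dbname="analytics", user="etl_user", password="***",
    )
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute(ddl)
    cur.execute("ANALYZE sales_fact_tuned;")  # refresh statistics for the query planner
    cur.close()
    conn.close()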

Client: TD Bank, New Jersey Aug 2020 – Dec 2021


Role: Data Engineer
Responsibilities:
 Developed Python scripts for ETL automation, ensuring seamless data processing and integration with
Azure SQL Database. Implemented SQL queries and stored procedures to extract, transform, and load data
accurately.
 Monitored and optimized SQL queries and database performance using Azure monitoring tools and SQL profiling techniques, identifying and resolving performance bottlenecks and inefficiencies.
 Used Python libraries like Pandas, NumPy, and Matplotlib for data manipulation, analysis, and visualization, enabling insights-driven decisions (a load sketch appears at the end of this role). Integrated Python scripts with Azure Data Factory and Databricks for scalable data workflows. Provided SQL and Python training for effective Azure SQL reporting.
 Developed and maintained data models using Azure Data Modeler, an industry-standard modeling tool.
 Supported partners in interpreting and analyzing data using Azure Synapse Analytics.
 Supported data transformation needs with serverless functions or logic apps in Azure.
 Ensured privacy, security, and access controls are properly adhered to using Azure Active Directory.
 Worked with ITS/ARE teams to support code packaging & deployment (CI/CD) into Azure environments
using Azure Service Management.
 Supported QA and development teams by monitoring data loads and performing analysis with Azure
Monitor.
 Ensured metadata and data lineage are captured and compatible with Azure Purview for enterprise data
management.
 Participated in designing & architecture reviews of applications using Azure Boards for collaboration.
 Built effective working relationships with peers and partners using Microsoft Teams.
 Supported stakeholders by providing data visualizations with Power BI.
 Understood data engineering initiatives and capabilities within the Azure ecosystem.
 Supported data acquisition and ingestion using Azure Data Analysis Services for data storage and
management.
 Developed and maintained ETL jobs using Azure Data Factory for data transformation and movement.
 Maintained foundational knowledge of upstream data using Azure Data Catalog for metadata management.
 Ensured data is maintained in compliance with standards using Azure Data Quality Services.
 Adhered to standard security coding practices to ensure application security in Azure.
 Kept current on emerging trends in data analysis and Azure technologies through Microsoft certifications.
Environment: Data Acquisition & Ingestion using Azure Data Lake Storage, Azure Data Factory, ETL.
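
A minimal sketch of the Python-to-Azure SQL Database load pattern described above, using pandas with SQLAlchemy over the pyodbc driver; the server, database, credentials, file, and table names are placeholders.

    # Sketch: clean a daily extract with pandas and append it to an Azure SQL staging table.
    # Server, database, credentials, file path, and table name are placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    # Azure SQL Database via the ODBC Driver 17 for SQL Server (assumed to be installed).
    engine = create_engine(
        "mssql+pyodbc://etl_user:***@example-server.database.windows.net:1433/"
        "analytics?driver=ODBC+Driver+17+for+SQL+Server"
    )

    transactions = pd.read_csv("daily_transactions.csv", parse_dates=["posted_date"])
    transactions = transactions.drop_duplicates(subset=["transaction_id"])
    transactions.columns = [c.strip().lower() for c in transactions.columns]

    # Append the cleaned rows into the staging table.
    transactions.to_sql("stg_daily_transactions", engine, if_exists="append", index=False)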

Client: Cyient India Sep 2017 – Dec 2019


Role: ETL Developer/Data analyst
Responsibilities:
 Analyzed business requirements and prepared detailed project specifications.
 Utilized Python for automated data extraction, transformation, and loading, developed SQL queries and
stored procedures.
 Communicated regularly with business and IT leadership to align project goals with organizational objectives.
 Enhanced workflow management and automation using Apache Airflow (a DAG sketch appears at the end of this role).
 Led the creation of ETL pipelines using Spark and Hive, ingesting data into S3.
 Extensively used PySpark for DataFrames, ETL, mapping, transformation, and loading.
 Developed Spark programs for batch processing using AWS EMR clusters and S3.
 Implemented data quality checks and validations for reliable analytical outputs.
 Collaborated with cross-functional teams to gather and analyze data requirements.
 Performed exploratory data analysis to uncover insights and trends.
 Conducted performance tuning and optimization of data processes.
 Performed data cleansing and transformation for analysis preparation.
 Worked with data engineers to design and implement data models.
 Created and maintained documentation for data analysis processes.
 Monitored data pipelines and analytics systems for performance and reliability.
Environment: Agile methodology, PySpark, ETL, EMR, S3 buckets, Hadoop.
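
A minimal Apache Airflow sketch of the kind of daily orchestration described above; the DAG id, schedule, and task callables are illustrative placeholders, and the real extract, transform, and load logic is omitted.

    # Sketch: a daily extract -> Spark transform -> S3 load pipeline expressed as an Airflow DAG.
    # DAG id, schedule, and task bodies are illustrative placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_source_data(**context):
        print("pull source extracts to the landing zone")   # placeholder logic

    def run_spark_transform(**context):
        print("submit the PySpark transformation job")      # placeholder logic

    def load_to_s3(**context):
        print("publish curated output to S3")                # placeholder logic

    with DAG(
        dag_id="daily_ingestion_example",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_source_data)
        transform = PythonOperator(task_id="transform", python_callable=run_spark_transform)
        load = PythonOperator(task_id="load", python_callable=load_to_s3)

        extract >> transform >> load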

Client: Natco Pharma Limited, India Jan 2015 – Aug 2017


Role: Data Analyst
Responsibilities:
 Analyzed pharmaceutical data using Python and SQL for actionable insights.
 Automated data cleaning, transformation, and analysis with Python scripts.
 Collaborated with cross-functional teams to translate data requirements into solutions.
 Identified improvement opportunities through data and process reviews.
 Defined data-driven business requirements with stakeholders.
 Applied business analysis concepts and SDLC methodologies for process improvement.
 Utilized story and task-tracking tools for agile workflows, facilitating efficient collaboration, prioritization,
and progress tracking across multidisciplinary teams, ensuring timely delivery of data engineering projects
and alignment with business objectives.
 Performed statistical analysis and data visualization using Pandas, NumPy, and Matplotlib (a sketch appears at the end of this role).
 Established timelines, milestones, and managed risks for stakeholder communication.
 Documented requirements, process flows, and system configurations accurately.
 Conducted data quality assessments to support business needs.
 Ensured high data quality throughout project lifecycles.
 Utilized SQL queries, data modeling, and analysis tools.
 Analyzed large data sets and recommended process improvements.
 Generated reports, dashboards, and presentations for findings and ad-hoc analysis.
 Provided training on Python and SQL techniques while maintaining data integrity and confidentiality.
Environment: Data modelling, SQL, Excel, system architecture, Python, Confluence.
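
A minimal sketch of the Pandas/Matplotlib analysis and reporting described above; the input file and column names are illustrative placeholders, not actual production data.

    # Sketch: monthly average yield per production line, summarized with pandas and plotted.
    # The input file and column names are illustrative placeholders.
    import pandas as pd
    import matplotlib.pyplot as plt

    batches = pd.read_csv("batch_quality.csv", parse_dates=["produced_on"])

    monthly_yield = (
        batches
        .assign(month=batches["produced_on"].dt.to_period("M").dt.to_timestamp())
        .groupby(["month", "production_line"])["yield_pct"]
        .mean()
        .unstack("production_line")
    )

    monthly_yield.plot(kind="line", marker="o", title="Average monthly yield by line")
    plt.ylabel("Yield (%)")
    plt.tight_layout()
    plt.savefig("monthly_yield.png")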
