
SRIKANT A

Senior Data Engineer


Email ID: [email protected]
Contact Details: 980-224-0623
LinkedIn: www.linkedin.com/in/srikant12

PROFESSIONAL SUMMARY
 Over 10 years of experience as a Data Engineer with a focus on building robust data solutions.
 Proficient in using AWS Redshift and AWS S3 for data warehousing and storage solutions.
 Experienced in designing and managing AWS Data Pipelines and AWS Glue for seamless ETL processes.
 Skilled in deploying and maintaining Hadoop YARN clusters for efficient resource management.
 Strong expertise in SQL Server, Spark, and Spark Streaming for real-time data processing.
 Proficient in Scala and Python for developing scalable data engineering solutions.
 Experienced in integrating and processing data with Kinesis, Hive, and Linux.
 Utilized Sqoop and Informatica for data import/export and data integration tasks.
 Expertise in data visualization tools like Tableau and Power BI for insightful reporting.
 Worked with Cassandra, Oozie, and Control-M for data management and workflow scheduling.
 Proficient in Snowflake, SQL, and data warehousing solutions.
 Implemented data synchronization and ETL processes with Fivetran, EMR, and EC2.
 Managed relational databases such as RDS, DynamoDB, and Oracle 12c.
 Leveraged Google Dataflow, GCP, GCS, and BigQuery for cloud-based data processing.
 Experienced with GCP Dataprep, GCP Dataflow, and GCP Dataproc for data transformation and processing.
 Proficient in Apache Kafka and Azure for building real-time data streaming solutions.
 Skilled in using Azure Event Hubs, Azure Synapse, and Azure Data Factory for comprehensive data engineering
solutions.
Technical Skills:

AWS: AWS Redshift, AWS S3, AWS Data Pipelines, EMR, EC2, RDS, DynamoDB
Google Cloud Platform: Google Dataflow, GCP, GCS, BigQuery, GCP Dataprep, GCP Dataflow, GCP Dataproc, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Data Catalog
Azure: Azure Event Hubs, Azure Synapse, Azure Data Factory, Azure Databricks, Azure Service Bus, Azure SQL
Hadoop Ecosystem: Hadoop YARN, Hive, MapReduce, HBase, Sqoop, Oozie, Pig
Data Warehousing: SQL Server, Oracle 12c, Teradata, Snowflake
Programming Languages: Scala, Python, Java 1.7
Data Integration: Informatica, Talend, Fivetran, SSIS, SSAS, SSRS, DataStage, QualityStage
Data Visualization: Tableau, Power BI
Big Data Technologies: Spark, Spark Streaming, Kinesis, Kafka
Database Management: Cassandra, DynamoDB, Oracle 12c, SQL Server 2017, SQL, T-SQL, Federated Queries
Operating Systems: Linux, Unix, Windows
Scripting: Shell Scripts
Workflow Scheduling: Control-M
Data Tools and Services: Data Catalog, VPC Configuration, VPN Google-Client, Pub/Sub
Other Technologies: Microservices, Agile, JSON, JDBC, SFDC

Professional Experience

Walmart, Bentonville, AR    June 2024 to Present
Senior Data Engineer
Responsibilities:
 Developed and maintained scalable data pipelines using Apache Spark, ensuring efficient data processing and
transformation.
 Designed and implemented complex SQL queries for data extraction, transformation, and analysis in support of
business requirements.
 Leveraged cloud technologies in Google Cloud Platform (GCP), including BigQuery (BQ) and Google Cloud
Storage (GCS), to manage and analyze large datasets.
 Orchestrated and scheduled data pipelines across QA and development environments using Automic, ensuring
timely and reliable execution.
 Improved operational efficiency by automating pipeline monitoring and execution workflows with Automic,
reducing manual intervention and downtime
 Performed root cause analysis for data pipeline failures, implementing robust exception handling and retry
mechanisms in Automic workflows.
 Streamlined CI/CD processes by developing a one-click deployment pipeline, enabling automatic code
deployment to Git upon PR merge, reducing deployment time and errors.
 Developed regression suites using Ensure to validate query accuracy and ensure data integrity after deployment,
enhancing confidence in production rollouts.
 Streamlined post-deployment validation by using Ensure to automate query checks, reducing manual errors and improving efficiency.
 Developed reusable Python libraries for common data engineering tasks, reducing development time and
standardizing pipeline implementations.
 Implemented and validated data storage and retrieval solutions with BQ and GCS, ensuring scalability and
reliability.
 Designed and optimized scalable data pipelines using Google Cloud Dataflow and Apache Beam, enabling
efficient real-time and batch processing of large datasets, integrating with services like BigQuery, Cloud Storage,
and Pub/Sub.
 Migrated legacy ETL workflows to GCP, leveraging Dataflow for streaming data processing, resulting in a 30%
cost reduction and improved system reliability.
 Implemented data partitioning and clustering strategies in BigQuery, reducing query execution time and improving cost efficiency for large-scale analytical workloads (a minimal example appears at the end of this section).
 Integrated CI/CD workflows with Git and Jenkins, automating the deployment of code changes and ensuring
consistent environments across QA and production.
 Designed and maintained Hive tables on GCP for historical data storage, integrating with BigQuery for advanced
querying and analytics, enhancing the accessibility and usability of data for business stakeholders.
 Optimized Spark jobs by implementing techniques such as partitioning, caching, and efficient data serialization,
reducing job runtime by up to 30% for large-scale data processing tasks.
 Automated Spark job submission using custom Python scripts and workflow orchestration tools like Apache
Airflow, ensuring seamless execution and monitoring across distributed environments.

Development Stack: Spark, SQL, Hive, Apache Airflow, Automic, Ensure, BigQuery, GCP, GCS, Python, Git, Jenkins, Data Pipelines, ETL Processes, CI/CD, Regression Suites, Hadoop, YARN.

Mayo Clinic, Rochester, MN    November 2021 to June 2024
Senior Data Engineer
Responsibilities:
 Developed and maintained data pipelines using AWS Data Pipelines and AWS Glue to ensure seamless data
flow and storage solutions.
 Managed AWS S3 buckets for data storage, optimizing data retrieval and cost efficiency. Implemented AWS
Redshift to perform complex data queries that facilitated timely decision-making processes.
 Experienced in utilizing Snowflake, a cloud-based data warehousing platform, for managing and analyzing large
datasets.
 Utilized Hadoop YARN to manage cluster resources and streamline processing tasks across the data platform.
 Designed and executed SQL queries and stored procedures on SQL Server and Oracle 12c to handle data
transactions and reporting.
 Leveraged Spark and Spark Streaming for real-time data processing to support dynamic data analysis and
reporting needs.
 Programmed in Scala and Python to develop and optimize data processing tasks, enhancing performance and
scalability.
 Configured and managed AWS Kinesis for real-time data ingestion and analysis, supporting high-throughput
applications.
 Employed Hive for data summarization, query, and analysis, making large datasets manageable and accessible.
 Implemented Informatica to design, deploy, and manage ETL (Extract, Transform, Load) processes for diverse
data sources.
 Developed interactive dashboards and reports using Tableau to visualize complex datasets and provide
actionable insights.
 Utilized analytics for catalog content, driving data-driven decisions to optimize product listings and improve
customer experience.
 Monitored and managed cloud infrastructure and data pipelines using Datadog, ensuring real-time visibility into
system performance, metrics, and logs across multi-cloud environments.
 Used Talend to integrate data sources and perform data quality checks, ensuring accuracy and consistency
across data stores.
 Collaborated with data architects to maintain data catalogs and metadata in Collibra, improving data discovery
and governance across enterprise systems.
 Deployed and configured Prometheus for collecting and storing time-series data from distributed data systems
and pipelines, enabling comprehensive monitoring of infrastructure.
 Skilled in optimizing databases and ETL processes on AWS, including Oracle, SQL Server, and big data tools like
Amazon Redshift and EMR.
 Implemented Collibra for data governance, ensuring compliance and data stewardship across data engineering
projects while automating metadata management.
 Proven success in Agile project delivery, collaborating with architecture teams to design scalable data solutions
on AWS, while actively testing new technologies to enhance practices.
 Managed Cassandra databases to handle large volumes of data with minimal latency.
 Scheduled and orchestrated workflows using Oozie and Control-M, enhancing automation and operational
efficiency.
 Integrated Fivetran for data integration, simplifying the replication of data into data warehouses.
 Administered AWS EMR clusters to efficiently process big data across distributed frameworks.
 Utilized AWS RDS and DynamoDB for database management, supporting scalable and high-performance
requirements.
 Implemented security measures and maintained infrastructure using AWS Glue and AWS Data Pipelines.
 Conducted data cleansing and transformation using Spark and AWS Glue to ensure data quality and readiness for analysis (see the sketch at the end of this section).
 Automated data processing and reporting with VB and VBA in Access and Excel. Used Power BI and Tableau to
create interactive dashboards and reports for data-driven decision-making.
 Monitored data systems with AWS CloudWatch to ensure performance metrics were within the expected
thresholds.
 Effectively managed Terraform codebase by utilizing Git for version control and collaborating with cross-
functional teams to ensure consistency across environments.
 Performed backup and disaster recovery operations using AWS S3 and AWS RDS, safeguarding data integrity.
Development Stack: AWS Redshift, AWS S3, AWS Data Pipelines, AWS Glue, Hadoop YARN, SQL Server, Spark, Spark Streaming, Scala, Kinesis, Python, Hive, Snowflake, Informatica, Tableau, Talend, Cassandra, Oozie, Control-M, Fivetran, EMR, EC2, RDS, DynamoDB, Oracle 12c.
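
A brief PySpark sketch of the cleansing and transformation step described above; the S3 paths, column names, and rules are hypothetical stand-ins for what an actual Glue or EMR job would use.

# Hypothetical paths and columns, shown only to illustrate the cleansing pattern.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-cleansing-job").getOrCreate()

raw = spark.read.json("s3://example-raw-bucket/incoming/")      # placeholder input location

cleansed = (
    raw.dropDuplicates(["record_id"])                           # drop exact duplicates by key
       .filter(F.col("amount").isNotNull())                     # remove rows missing the key metric
       .withColumn("event_date", F.to_date("event_date"))       # normalize string dates
       .withColumn("amount", F.col("amount").cast("double"))    # enforce a numeric type
)

# Partitioned Parquet output that downstream Redshift or Glue jobs can consume.
cleansed.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated-bucket/cleansed/"
)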

TD Bank, Charlotte, NC    May 2019 to October 2021
Data Engineer
Responsibilities:
 Engineered and optimized data processing pipelines using Google Dataflow and GCP Dataproc to manage large-scale data workflows efficiently (a brief pipeline sketch appears at the end of this section).
 Managed and monitored GCP cloud services, including GCS (Google Cloud Storage) and BigQuery, to support
data storage and complex query execution.
 Developed scalable data solutions in BigQuery for financial analytics and reporting, enhancing business decision-
making processes.
 Utilized GCP Dataprep to clean and prepare data, ensuring high quality and consistency for analysis.
 Orchestrated data flow management across various platforms using Cloud Composer, improving operational
efficiency.
 Implemented real-time data streaming and messaging services using Cloud Pub/Sub, facilitating immediate data
availability and reaction to market changes.
 Programmed automation scripts in Python and Shell Scripts to streamline data operations and deployment
processes.
 Developed ETL pipelines using Apache NiFi and Talend to process and load data from various sources into
Snowflake.
 Implemented RBAC in Snowflake to enhance data security.
 Configured and maintained VPC Configuration and VPN Google-Client setups to ensure secure and private
network communications.
 Managed metadata across all data assets using Data Catalog, improving data discoverability and governance.
 Developed and deployed SSIS packages for data integration and migration tasks, reducing complexities in data
operations.
 Conducted performance tuning of Snowflake queries and schemas to optimize storage and compute costs.
 Designed and implemented data models and warehouses using SSAS to support comprehensive analytical
platforms.
 Integrated Grafana with Prometheus and Datadog to provide a centralized monitoring view of data services and
infrastructure across cloud environments.
 Created dynamic reports and dashboards using SSRS to visualize financial metrics and performance indicators.
 Engineered data integration and transformation solutions with DataStage and QualityStage, ensuring data accuracy and quality.
 Configured Cloud Storage Transfer Service to automate data transfers between cloud storage solutions,
enhancing data availability and recovery strategies.
 Managed and optimized Cloud Spanner instances to handle critical transaction workloads, ensuring robustness
and scalability.
 Administered Cloud SQL databases to support application backends and operational reporting needs.
 Leveraged GCP Databricks for complex data analytics and machine learning operations, boosting predictive
capabilities.
 Maintained rigorous data security and compliance standards to meet the Financial Supervisory Authority (FSA)
regulations.
 Collaborated with cross-functional teams to translate business needs into scalable data solutions on the GCP
platform.
 Designed and implemented customized Jira workflows, dashboards, and automation to enhance project tracking and reporting, driving improved collaboration and productivity across Agile teams.
 Conducted regular system audits and updates to ensure the security and efficiency of data systems.
 Provided 24/7 technical support and troubleshooting for data-related issues, ensuring high availability and
minimal downtime.
 Engaged in capacity planning and performance tuning of data stores and cloud resources to meet growing data
demands.
Development Stack: Google Dataflow, GCP, GCS, BigQuery, GCP Dataprep, GCP Dataflow, GCP Dataproc, Cloud Composer, Cloud Pub/Sub, Python, Shell Scripts, Snowflake, Federated Queries, VPC Configuration, Data Catalog, VPN Google-Client, SSIS, SSAS, SSRS, DataStage, QualityStage, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, GCP Databricks.
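
As a rough sketch of the Dataflow and Pub/Sub pattern described above, the Apache Beam pipeline below reads a Pub/Sub topic and appends parsed records to BigQuery. The project, topic, and table names are placeholders, and a production job would also supply the Dataflow runner options and a table schema.

# Placeholder project, topic, and table names; runner options omitted for brevity.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # a real job would also set runner, project, and region

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/example-proj/topics/trades")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "example-proj:finance.trades",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )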

Duke Energy, Charlotte, NC    September 2017 to April 2019
Data Engineer
Responsibilities:
 Designed and developed data ingestion frameworks using Azure Data Factory to streamline data flow from
various sources into the Azure ecosystem.
 Managed and maintained Azure SQL databases and SQL Server 2017, ensuring high performance and availability
for critical healthcare applications.
 Implemented Azure Databricks to perform complex data processing and analytics, enhancing research
capabilities and patient data analysis.
 Configured and utilized Azure Event Hubs for real-time data streaming, supporting timely updates and
notifications in patient care systems.
 Developed data pipelines using Apache Kafka to handle large-scale data streams efficiently, improving data ingestion and event processing (see the sketch at the end of this section).
 Programmed automation and data handling scripts using Python on Unix systems to support data operations and
analytical processes.
 Utilized Power BI and Tableau to create dynamic data visualizations and dashboards, providing actionable
insights for medical staff and researchers.
 Engineered and executed data storage solutions on Hadoop using Hive and MapReduce, optimizing data
processing for large datasets.
 Managed Teradata database systems for complex query execution and analysis, enhancing data-driven decision-
making in clinical settings.
 Integrated SQL and T-SQL programming for data manipulation and retrieval, supporting various research and
operational needs.
 Orchestrated and monitored data workflows using Azure Service Bus, ensuring reliable data transfer between
different systems and services.
 Leveraged Azure Synapse to combine big data and data warehousing, simplifying access and analysis of large
volumes of data.
 Maintained Azure Databricks GitHub repositories to manage code versions and collaboration on data projects.
 Developed and maintained security protocols and compliance measures within the Azure platform to protect
patient data and comply with healthcare regulations.
 Automated data processes using SFDC to enhance data flow and integration with sales and customer
management systems.
 Conducted performance tuning and optimization of data queries and scripts to improve speed and efficiency
across data platforms.
 Collaborated with healthcare professionals and research teams to understand data requirements and deliver
tailored data solutions.
 Administered routine database backups and disaster recovery operations to ensure data integrity and availability.
 Provided technical support and training for medical staff and researchers on using data applications and tools
effectively.
 Participated in cross-functional teams to drive innovations in data management and analytics for healthcare
applications.
 Monitored data storage and usage to ensure it meets the strategic needs of the clinic and complies with legal
and regulatory requirements.
 Evaluated new technologies and tools within the Azure ecosystem to keep the data infrastructure modern and
efficient.
 Documented data processes and code changes to maintain a clear audit trail for compliance and operational
continuity.
 Assisted in migrating legacy systems to Azure, minimizing disruption and aligning with modern cloud practices.
 Led data governance initiatives to ensure data quality and consistency across all platforms and systems.
Development Stack: Apache Kafka, Azure, Python, Power BI, Unix, SQL Server, Hadoop, Hive, MapReduce, Teradata, SQL, T-SQL, Azure Event Hubs, Azure Synapse, Azure Data Factory, Azure Databricks, GitHub, Azure Service Bus, Azure SQL, SQL Server 2017, Tableau, SFDC.
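
A minimal producer sketch for the Kafka-based ingestion described above, using the kafka-python client; the broker address, topic name, and event fields are assumptions made for illustration.

# Broker, topic, and payload fields below are illustrative assumptions.
import json
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["broker1.example.com:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # serialize dicts as JSON bytes
)

event = {"source_id": "sensor-1042", "value": 3.7, "ts": "2018-06-01T12:00:00Z"}
producer.send("ingest-events", value=event)  # topic name is a placeholder
producer.flush()                             # block until buffered messages are delivered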

Valuefy Solutions, India    July 2014 to April 2017
Hadoop Spark Developer
Responsibilities:
 Implemented and managed Hadoop clusters to support data processing projects, optimizing data storage and
retrieval operations for enhanced system efficiency and reliability. Utilized HBase for database management,
ensuring high availability and scalability.
 Developed and deployed microservices in an Azure cloud environment, facilitating scalable and efficient
application components. Leveraged Java 1.7 for service development, focusing on clean and maintainable code.
 Designed and executed MapReduce jobs for large-scale data processing, significantly improving data analysis and
processing speeds. Applied best practices in Agile environments, ensuring rapid iteration and responsive
development processes.
 Configured and maintained Hadoop ecosystem components, including HBase, Hive, and Spark, to support
complex data analytics projects. Ensured systems were optimized for performance and scalability.
 Integrated Kafka for real-time data processing, enabling efficient data streaming and processing capabilities. This
facilitated the development of responsive data-driven applications.
 Utilized JDBC to connect application layers with databases, ensuring seamless data integration and interaction.
Enhanced application responsiveness and data processing capabilities.
 Experienced in building and improving Hadoop applications for handling big data tasks effectively.
 Proficient in connecting Hadoop backend with user-friendly interfaces using Angular and React, ensuring smooth
interaction with the data.
 Applied JSON for data interchange between services, enhancing the interoperability and flexibility of data exchange.
 Implemented Hive queries for data summarization, query, and analysis, enabling efficient access to large datasets stored in Hadoop ecosystems. Optimized query performance to meet project requirements.
 Ensured data was efficiently parsed and processed in distributed systems.
 Developed robust Spark applications for fast data analytics and processing, significantly reducing data processing times and facilitating more insightful data analysis (see the sketch at the end of this section).
 Employed Pig scripts to perform data transformations and processing tasks, simplifying complex data operations,
and improving the efficiency of data handling processes.
 Participated in Agile sprint planning, reviews, and retrospectives, contributing to the continuous improvement of
development processes and practices. Ensured project alignment with client objectives and timelines.
Development Stack: Hadoop, Azure, Microservices, Java 1.7, MapReduce, Agile, HBase, JSON, Spark, Kafka, JDBC, Hive, Pig.
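
As an illustration of the Spark analytics work described above, the snippet below shows a compact PySpark aggregation written to a Hive-managed table; the input path, columns, and table name are hypothetical, and the original applications may have been written against the Java or Scala APIs instead.

# Hypothetical input, columns, and output table; aggregation pattern only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-usage-summary")
    .enableHiveSupport()          # needed for saveAsTable against the Hive metastore
    .getOrCreate()
)

events = spark.read.parquet("hdfs:///data/events/")          # placeholder HDFS path

daily_summary = (
    events.groupBy("event_date", "client_id")                # summarize per day and client
          .agg(F.count("*").alias("event_count"),
               F.sum("duration_ms").alias("total_duration_ms"))
)

daily_summary.write.mode("overwrite").saveAsTable("analytics.daily_usage")  # Hive-managed table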
