
Data Engineer

Krishna Sai
[email protected]
(669) 306-4480

SUMMARY

Accomplished Senior Software Engineer with over 10 years of experience, specializing in data engineering and
application development. Expertise in designing and implementing scalable data solutions using Big Data technologies
and cloud platforms (AWS, Azure, and GCP). Proficient in Java, Databricks, Java Spark, Python, and PySpark for high-
performance data processing and analytics. Proven ability to drive business growth through strategic data initiatives and
effective cross-functional collaboration. Seeking to leverage extensive data engineering skills in a dedicated data
engineering role to deliver innovative solutions and enhance data-driven decision-making.
 Data Engineering Expertise: Over 6 years of experience in data engineering, specializing in designing and
implementing scalable data architectures and solutions using GCP (Google Cloud Platform), Databricks,
Azure, and AWS.
 Cloud Platform Proficiency: Extensive experience with GCP services such as BigQuery, Dataflow, and Dataproc.
Proven ability to leverage AWS tools including Glue and EMR for efficient data storage, processing, and
analysis.
 Big Data Technologies: Proficient in using Java, Java Spark, and PySpark to handle large-scale data processing
and analytics. Skilled in optimizing Databricks/Spark jobs and Hive SQL queries for enhanced performance.
 Data Pipeline Development: Expertise in developing and managing ETL pipelines to ensure seamless data
integration and transformation. Experienced with Airflow for orchestrating and managing complex data
workflows.
 Scalable Data Solutions: Demonstrated ability to design and implement scalable data solutions that drive business
growth. Experience in building and optimizing data warehouses and data lakes for large datasets.
 Cross-Functional Collaboration: Adept at working with cross-functional teams to align data engineering projects
with business objectives. Proven track record of effective communication and collaboration to achieve project
goals.
 Strategic Data Initiatives: Led strategic data initiatives to enhance data-driven decision-making and business
intelligence. Experienced in implementing data governance and ensuring data quality and accuracy.
 High-Performance Analytics: Utilized advanced analytics and machine learning techniques to derive actionable
insights from complex data sets. Skilled in integrating AI and deep learning models into data solutions for
improved outcomes.
 Worked with Azure Databricks to run Spark/Python notebooks through ADF pipelines.
 Used Databricks utilities (widgets) to pass runtime parameters from ADF to Databricks (a minimal sketch follows this list).
 Cloud Resource Management: Proficient in managing cloud resources and infrastructure using Terraform.
Experienced in automating cloud deployments and optimizing resource utilization for cost-effectiveness.
 Technical Leadership: Provided technical leadership and guidance in data engineering projects. Mentored junior
team members and led efforts to solve complex data challenges.
 Innovation and Problem-Solving: Known for innovative problem-solving and the development of cutting-edge
solutions. Demonstrated ability to tackle complex data engineering challenges and deliver impactful results.
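
A minimal sketch of the ADF-to-Databricks widgets pattern noted above, assuming an ADF Databricks Notebook activity passes run_date and env as base parameters; the parameter names and table names are hypothetical, for illustration only:

    # Databricks notebook cell -- dbutils and spark are predefined in a notebook.
    # Declare widgets; ADF overrides these defaults at runtime via base parameters.
    dbutils.widgets.text("run_date", "2021-01-01")   # hypothetical parameter
    dbutils.widgets.text("env", "dev")               # hypothetical parameter

    run_date = dbutils.widgets.get("run_date")
    env = dbutils.widgets.get("env")

    # Use the runtime parameters to drive the Spark job (hypothetical tables).
    df = spark.table(f"{env}_raw.events").where(f"event_date = '{run_date}'")
    df.write.mode("overwrite").saveAsTable(f"{env}_curated.events_daily")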

Certifications:

[1] Java Certification Course, issued by Simplilearn

[2] MongoDB Certified Developer, Associate (C100DEV), issued by MongoDB University
Technical Skills:

Operating Systems: Windows, macOS, Unix/Linux, Sun Solaris, RHEL, Ubuntu, Fedora
Languages: Python, Java, J2EE, JavaScript, SQL
Java Libraries: Apache Spark, Spring Framework, Hibernate, Log4j, SLF4J, Apache Commons, JUnit, Maven, Ant, Guava
Big Data Technologies: Apache Spark (Java and PySpark), BigQuery
ETL & Data Integration: Sotero (data management and sharing), Databricks
Cloud Platforms: Google Cloud Platform (GCP): BigQuery, Dataflow, Dataproc, Cloud Storage, Compute Engine
Java Web Frameworks: Spring Framework, Hibernate, JavaServer Faces (JSF), Apache Struts, Play Framework, Vaadin
Databases: RDBMS (Oracle, SQL Server, MySQL), NoSQL (MongoDB), HDFS (Hadoop), Cassandra, PostgreSQL, SQLite, AWS
Build & Design Tools: Apache Ant, Apache Maven, Buck, BitBake, Boot, Grunt, Gulp, UML, IBM Rational Rose, Ansible, JIRA, GNU Debugger, Bugzilla
Development Tools: AWS, Docker, Selenium, Eclipse, Jenkins, Coverity, Pylint
Web Technologies: Servlets, JSP, AJAX, HTML5, CSS3, XPath, JavaScript, jQuery, Web Services (SOAP, RESTful)

EDUCATION
Vikrama Simhapuri University, India Oct 2012 – May 2016
Bachelor of Technology in Computer Science

PROFESSIONAL EXPERIENCE

Client: (Exafluence, NJ) | Jul 2021 – Present


Role: Sr. Data Engineer

Description: Together with our colleagues, customers and stakeholders, we impact life and health with science. Before
researchers can make scientific breakthroughs, they must have access to state-of-the-art tools, services and expertise to
perform experiments and engineer new products. That’s where we come in.
Project Focus: Data Engineering
Responsibilities:
 Optimized Query Performance: Implemented advanced partitioning and bucketing strategies in Hive and BigQuery to
enhance query performance and reduce storage costs. Utilized indexing and materialized views to further improve
query efficiency (a partitioning sketch follows this list).
 Led SRE efforts to design and onboard legacy applications to GCP. Designed and implemented GKE clusters with
Istio. Worked with DevOps, architects, and application teams to build fully automated GCP resource creation and
application deployment by leveraging Jenkins, Helm, and Terraform.
 Conducted PoCs and initial assessments of GCP products to create design patterns per customer.
 Experienced in architecting and designing solutions leveraging GCP services like GCE/GKE and Cloud Data to run
enterprise applications.
 Proficient in creating data pipelines using Azure Cloud ADF v2 components such as Move and Transform, Copy, Filter,
and ForEach with Databricks.
 ETL Migration and Optimization: Spearheaded the migration of complex SSIS logic to Java Spark and Databricks,
transforming intricate SQL procedures into streamlined Spark ETL processes. Conducted extensive code optimization and
performance tuning, resulting in notable reductions in processing time and significant cost savings.
 Workflow Automation: Developed and maintained robust ETL pipelines using Apache Airflow (Composer), orchestrating
ETL jobs with improved automation. Enhanced workflow efficiency, enabling more effective resource allocation
and higher returns on investment (a DAG sketch follows this list).
 Data Pipeline Engineering: Engineered and managed data pipelines using Apache Hive, Databricks, SQL, and MongoDB
technologies. Designed and implemented data ingestion, transformation, and storage processes to handle large-
scale datasets efficiently.
 Java Spark: Leveraged Java Spark for distributed data processing and real-time analytics. Optimized Spark jobs for
performance and resource efficiency, handling large-scale data transformations and computations.
 REST API Integration: Designed and implemented secure REST APIs for integration with third-party applications,
ensuring data encryption and decryption. Streamlined data security processes to meet compliance standards and
reduce overhead costs.
 ETL Process Design: Architected ETL processes to load patient data from various source systems into dimension
tables, ensuring accuracy, consistency, and reliability of data representation.
 Cost-Saving Initiatives: Achieved significant reductions in client expenses through strategic optimizations such as
partitioning and bucketing in Hive, BigQuery, and Databricks, as well as storage and indexing improvements in
MongoDB and Oracle.
 Cloud Collaboration: Collaborated with cloud providers to leverage Spark’s parallel computing capabilities for cost-
saving opportunities. Optimized performance and scalability while delivering substantial savings on client projects.
 Data Storage Optimization: Enhanced data storage and indexing strategies in MongoDB and Oracle. Improved query
performance and reduced storage overhead through strategic optimization of data structures.
 Data Validation and Quality Assurance: Utilized PySpark and Databricks for data validation and quality assurance,
automating validation processes to improve efficiency and accuracy. Developed data quality frameworks and
implemented regular data audits.
 Data-Driven Strategies: Collaborated with cross-functional teams to analyze client data and implement data-driven
strategies. Achieved improvements in client profit margins through data analysis and actionable insights.
 Mentorship and Team Collaboration: Provided mentorship and technical guidance to junior team members, fostering
their professional growth. Coordinated with offshore teams to address performance bottlenecks and scalability issues,
offering recommendations for optimization.
 Data Ingestion and Storage Management: Managed data ingestion pipelines and ETL processes using GCP services
such as Cloud Storage, BigQuery, Databricks, Dataflow, and Composer. Ensured seamless data flow and reliable
storage solutions.
 Performance Optimization: Tuned data structures and queries for performance and scalability. Managed BigQuery
deployments, optimized object provisioning, and automated capacity management.
 Monitoring and Backup: Implemented monitoring solutions to track data pipeline health and performance. Conducted
regular data backups and established disaster recovery procedures to mitigate data-related risks.
 Infrastructure Documentation: Created and maintained documentation for data infrastructure and workflows.
Collaborated with teams to ensure infrastructure and deployment practices were well-documented and followed.
 Large Dataset Processing: Leveraged PySpark for processing and analyzing large datasets. Utilized Spark’s
distributed computing capabilities for efficient data handling and real-time analytics.
 NoSQL Database Management: Managed NoSQL databases with MongoDB, including data migration and
optimization. Utilized MongoDB Atlas for secure and scalable cloud-based database solutions.
 Applied experience with Databricks, Grafana, and Power BI.
 Microservices Deployment: Deployed and managed microservices. Ensured efficient deployment and scaling of
services to meet project requirements.
 CI/CD Pipeline Integration: Integrated microservices into CI/CD pipelines. Utilized Jenkins and Git for version
control and automated deployment processes, enhancing continuous delivery practices.
 Sotero Integration: Implemented data storage and management solutions using Sotero, focusing on enhancing data
organization and retrieval. Coordinated with teams to integrate Sotero with existing data systems for improved data
management.
 Bug Tracking and Issue Management: Used JIRA for tracking and managing bugs/issues, maintaining detailed
records of resolutions and project tasks. Managed project progress and communicated status updates effectively.
 Collaboration and Communication: Facilitated regular team meetings, provided progress updates, and ensured
alignment with project goals. Collaborated with cross-functional teams to ensure successful project delivery.
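
A minimal sketch of the Hive partitioning and bucketing strategy referenced in the first bullet above, expressed through Spark SQL; the table and column names are hypothetical, not from the client project:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partitioning-demo")
             .enableHiveSupport()
             .getOrCreate())

    # Partition by date so queries prune whole partitions; bucket by patient_id
    # to speed up joins and sampling (hypothetical schema).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS warehouse.patient_events (
            patient_id BIGINT,
            event_type STRING,
            payload    STRING
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (patient_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # A filter on the partition column reads only the matching partition.
    daily = spark.sql(
        "SELECT * FROM warehouse.patient_events WHERE event_date = '2021-07-01'")

BigQuery applies the same idea declaratively, with PARTITION BY on a date column and CLUSTER BY on high-cardinality join keys in the table DDL.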
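
And a minimal Airflow DAG sketch of the orchestration pattern from the workflow-automation bullet; the DAG id, schedule, and task bodies are hypothetical placeholders:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        pass  # placeholder: pull raw files from Cloud Storage

    def transform():
        pass  # placeholder: run the Spark/BigQuery transformation

    with DAG(
        dag_id="daily_patient_etl",        # hypothetical name
        start_date=datetime(2021, 7, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task

On Cloud Composer, placing a file like this in the environment's dags/ bucket is enough for the scheduler to pick it up.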

Environment: Python 3.7, Java Spark, Apache Airflow, Hive, BigQuery, MongoDB, Oracle, GCP (Cloud Storage,
BigQuery, Dataflow, Composer), PySpark, SQL, JIRA, Sotero.

Client: (FX ABS SOFTWARE SOLUTIONS PVT LTD - India) | Jan 2019 – Jun 2021
Role: Data Engineer

Description: FX ABS Software Solutions Pvt. Ltd. is a Software Services and Data Analytics company focused on solving
complex business problems by leveraging Big Data and Analytics with new technologies and frameworks. Our
accelerators and solutions use advanced technologies in IoT, Big Data, and Machine Learning to provide actionable insights.
We develop expedited solutions built on a microservices architecture using industry-standard data models and KPI
repositories for Life Sciences, Financial Services, and Healthcare.
Project Focus: Data Engineering
Responsibilities
 Designed and implemented Databricks/Spark programs to parse raw data, populate staging tables, and store refined
data in partitioned tables within the Enterprise Data Warehouse, enhancing data processing efficiency and reliability
(a PySpark sketch follows this list).
 Managed Migration Efforts: Led the migration from Hadoop to Airflow (Astronomer), Databricks, and BigQuery,
ensuring a seamless transition with minimal disruption to operations.
 Streamlined ETL Processes: Created and optimized Python code for ETL processes, data transformations, and data
loading mechanisms, resulting in improved data accuracy and operational efficiency.
 Architected GCP Solutions: Designed and deployed end-to-end data solutions on Google Cloud Platform (GCP),
utilizing BigQuery, Dataflow, BigTable, Pub/Sub, and Dataproc to address complex data processing requirements.
 Integrated Data Sources: Effectively integrated diverse data sources such as Oracle 11g, MySQL, Apache Hadoop,
and Cassandra, ensuring consistent data flow and integrity across systems.
 Orchestrated CI/CD Pipelines: Managed CI/CD pipelines using Docker and Kubernetes, optimizing integration and
deployment processes to improve time-to-market and reduce costs.
 Created notebooks in Azure Databricks and integrated them with ADF to automate the same.
 Leveraged BigQuery Features: Utilized advanced features of BigQuery, including nested fields, user-defined functions
(UDFs), and machine learning integration to enhance data analytics and predictive modeling (a UDF sketch follows this list).
 Optimized Data Storage: Applied storage optimization strategies with BigTable to efficiently manage large datasets,
improving data accessibility and query performance.
 Automated Infrastructure Provisioning: Implemented Ansible for automating infrastructure provisioning and
deployment, ensuring consistency and efficiency across environments.
 Enhanced Performance Monitoring: Monitored and fine-tuned data pipelines and applications to optimize
performance and resource utilization, addressing bottlenecks proactively.
 Utilized IaC Tools: Employed Jenkins, GitLab CI/CD, and Terraform for infrastructure as code (IaC) and deployment
automation, improving operational efficiency.
 Implemented Scalable Storage Solutions: Designed and maintained scalable data storage solutions on GCP with
BigQuery, Databricks, Firestore, and BigTable, optimizing data management and retrieval.
 Established Monitoring Systems: Set up comprehensive monitoring and alerting systems using Stackdriver, Cloud
Monitoring, and Prometheus to manage system health and performance.
 Managed Hadoop Clusters: Oversaw Hadoop cluster operations to ensure high availability and performance of big
data platforms.
 Experienced in Databricks, Hive SQL, Azure CI/CD pipelines, Delta Lake, Hadoop file system (HDFS), and Snowflake.
 Executed Spark-based Analytics: Developed Spark-based analytics solutions for real-time data processing and
insights generation.
 Maintained Scala Code: Wrote and maintained Scala code for data processing tasks, contributing to a robust and
maintainable codebase.
 Configured AWS Environments: Managed AWS environment setup and maintenance, including data storage,
compute resources, and security configurations.
 Automated Data Workflows: Created and executed Python automation scripts for data ingestion and processing
workflows.
 Provided Technical Leadership: Offered technical support and mentoring to junior developers, fostering a
collaborative and productive team environment.
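
A minimal PySpark sketch of the raw-to-partitioned-table flow described in the first bullet above; bucket paths and column names are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("raw-to-warehouse").getOrCreate()

    # Parse raw CSV files into a typed staging DataFrame (hypothetical path).
    raw = (spark.read.option("header", True)
           .csv("gs://raw-bucket/events/*.csv")
           .withColumn("event_date", F.to_date("event_ts")))
    raw.createOrReplaceTempView("staging_events")

    # Refine the staged data and store it in a date-partitioned table.
    (spark.table("staging_events")
     .dropDuplicates(["event_id"])
     .write.mode("overwrite")
     .partitionBy("event_date")
     .parquet("gs://warehouse-bucket/events/"))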
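
And a sketch of the BigQuery features mentioned above, combining a SQL UDF with a query over a nested, repeated field via the google-cloud-bigquery Python client; the project, dataset, and field names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials

    sql = """
    CREATE TEMP FUNCTION normalize_code(code STRING) AS (UPPER(TRIM(code)));

    SELECT patient_id, normalize_code(ev.code) AS code
    FROM `my-project.clinical.visits`,   -- hypothetical table
         UNNEST(events) AS ev            -- events is a nested/repeated field
    WHERE ev.occurred_at >= '2020-01-01'
    """

    # Run the UDF definition and query as one job and print the results.
    for row in client.query(sql).result():
        print(row.patient_id, row.code)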

Environment: Python 3.7, Java, Databricks/Spark, Apache Airflow, Hive, BigQuery, MongoDB, Oracle, GCP (Cloud
Storage, BigQuery, Dataflow, Composer), PySpark, SQL, Ansible, Jenkins, GitLab CI/CD, Docker, Kubernetes,
Stackdriver, Cloud Monitoring, Prometheus, Scala, AWS, Terraform.

Client: (Young Minds Technology Solutions Pvt Ltd - India) | Jul 2015 – Dec 2018
Role: Data Engineer

Description: Young Minds Technology Solutions Pvt. Ltd. is a digital and business solutions company based in
Hyderabad, India, providing website design and development, customized web applications, customized software, mobile
application design and development, and digital marketing services such as Search Engine Optimization, Search Engine
Marketing, and Social Media Optimization. We develop software solutions that help our customers outperform the
competition and stay current in today’s competitive business environment.
Responsibilities:
 Developed Web Application: Designed and implemented an online news reporting system that provides services for
posters and readers. All transactions are securely authenticated, allowing posters to submit news, events, or ads for
admin approval and enabling readers to log in, read, and respond to posts.
 Spring Framework Integration: Utilized the Spring Framework to create a robust and scalable backend architecture,
facilitating seamless interaction between different components of the application.
 Hibernate ORM: Employed Hibernate ORM for effective database management, ensuring smooth data persistence
and retrieval.
 Responsible for estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster.
 Hibernate Validation Framework: Implemented the Hibernate Validation Framework for validating persistent classes,
maintaining data integrity and consistency.
 Controller Development: Created Controllers to handle client requests efficiently, ensuring smooth interaction
between the frontend and backend, and improving user experience.
 Apache Tomcat Server: Developed, tested, and deployed the application using the Apache Tomcat server, ensuring a
stable and reliable deployment environment.
 HQL and Criteria API: Employed Hibernate Query Language (HQL) and Criteria API to retrieve persistent objects
within the Data Access Object (DAO) module, optimizing data retrieval and manipulation.
 User Authentication and Authorization: Implemented secure authentication and authorization mechanisms to ensure
that only authorized users can post and view content.
 Frontend Development: Developed a user-friendly frontend using HTML, CSS, and JavaScript, creating an
intuitive interface for posters and readers.
 Testing and Deployment: Conducted thorough testing to ensure application reliability and performance, and managed
the deployment process to ensure a seamless go-live experience.
 RESTful Web Services: Developed RESTful web services to facilitate communication between the client-side and
server-side components, ensuring efficient data exchange.
 Responsive Design: Ensured the web application was responsive and accessible on various devices and screen sizes
by using CSS media queries and Bootstrap.
 Performance Optimization: Optimized application performance by fine-tuning SQL queries, indexing database tables,
and leveraging caching mechanisms.
 Bug Tracking and Issue Resolution: Utilized JIRA for bug tracking and issue resolution, maintaining a detailed
history of issues and their resolutions to improve future development processes.
 Documentation and Training: Created comprehensive documentation for the application, including user guides and
technical manuals, and provided training sessions to ensure smooth user adoption and technical handover.

Environment: Java, Spring Framework, Hibernate ORM, Hibernate Validate Framework, Apache Tomcat, HQL, Criteria
API, HTML, CSS, JavaScript, Bootstrap, MySQL, RESTful Web Services, JIRA.
