
Mourya Kodidela

Email Id: [email protected]


Phone: 469-415-5019

Professional Summary:
 Seasoned data engineer with 10 years of experience crafting data-intensive applications using Big Data
Ecosystems, Cloud Computing Services, Data Warehousing, Visualization, and Reporting Tools.
 Expertly navigated the Hadoop framework, steering analysis, design, development, documentation, deployment,
and SQL integration within big data technologies.
 Showcased mastery across pivotal Hadoop ecosystem components (HDFS, YARN, MapReduce, Apache Spark,
Apache Sqoop, and Apache Hive) critical to fostering robust data engineering.
 Proficiently handled data integration from various sources, encompassing RDBMS, NoSQL databases,
spreadsheets, text files, JSON files, and delimited files.
 Experienced with Apache Spark, improving the performance and optimization of existing jobs using SparkContext, Spark-SQL, and the DataFrame API; worked specifically with PySpark and Scala.
 Successfully designed and implemented a scalable data processing solution on Google Cloud Platform (GCP) by
utilizing Google Cloud Storage for data storage, provisioning Google Dataproc clusters for distributed data
processing, orchestrating tasks with Google Dataproc Workflow Templates, and leveraging Google BigQuery for
high-speed data analytics. This streamlined workflow improved data analysis efficiency and actionable insights,
ultimately enhancing business operations.
 Guided the complete design and implementation of diverse projects, leveraging ETL/Visualization tools and
showcasing proficiency in big data, Cloud Computing, and in-memory applications.
 Implemented and scheduled data pipelines using Apache Airflow to automate data ingestion, transformation,
and loading (ETL) processes, which included designing Directed Acyclic Graphs (DAGs) to define workflows,
utilizing operators for the various data processing tasks, and configuring the Airflow scheduler to ensure timely
execution of pipelines (a minimal DAG sketch follows this summary list).
 Experience in Dimensional Modeling using Snowflake schema methodologies of Data Warehouse and Integration
projects.
 Automated workflow processes using Control-M for scheduling Batch jobs, setting up dependencies, and
generating reports using custom scripts, enhancing efficiency and data accessibility.
 Experience in Data modeling for Data Mart/Data Warehouse development including conceptual, logical and
physical model design, developing Entity Relationship Diagram (ERD), reverse/forward engineer (ERD) with CA
Erwin data modeler.
 Experience in extracting, transforming and loading (ETL) data from spreadsheets, database tables and other
sources using Microsoft SSIS.
 Experience in building and optimizing AWS data pipelines, architectures, and data sets.
 Experience in managing and reviewing Hadoop log files.
 Strong Experience in Data Migration from RDBMS to Snowflake cloud data warehouse.
 Excellent knowledge in designing and developing dashboards using QlikView by extracting the data from multiple
sources.
 Experience in Data Transformation and Data Mapping from source to target database schemas, as well as data cleansing.
 Offers crucial production support, diligently identifying root causes, resolving bugs, and promptly updating
stakeholders on production matters.
 Highly skilled in AWS, Snowflake Database, Python, Oracle, Exadata, Informatica, SQL, PL/SQL, bash scripting,
Hadoop, Hive, Databricks.
 Experience in data stream processing using Kafka (with ZooKeeper) for developing data pipelines with PySpark.
 Experience in developing Logical data modeling, Reverse engineering and physical data modeling of CRM system
using ER-WIN and Infosphere.
 Expertise in all aspects of Agile SDLC from requirement analysis, Design, Development, Coding, Testing,
Implementation, and maintenance.

 Experience with Airflow to schedule ETL jobs, and with Glue and Athena to extract data from the AWS data warehouse.
 Designed NoSQL and Google BigQuery solutions for transforming unstructured data into structured data sets.
 Experience with container-based deployments using Docker, working with Docker images and Docker registries.
 Hands-on experience analyzing SAS ETL and implementing data integration in Informatica using XML, Web Services, SAP ABAP, and SAP IDoc.
 Created various types of reports such as complex drill-down reports, drill through reports, parameterized
reports, matrix reports, Sub reports, non-parameterized reports and charts using reporting services based on
relational and OLAP databases.
 Experience in developing Spark applications using Spark-SQL in Databricks for data extraction, transformation,
and aggregation from multiple file formats.
 Experienced with Teradata utilities (FastLoad, MultiLoad, BTEQ scripting, FastExport, SQL Assistant) and tuning of Teradata queries using Explain plans.
 Established end-to-end CI/CD pipelines for data workflows using tools like Jenkins and GitLab, ensuring version
control, automated testing, validation, and deployment of data pipelines.
 Possess critical communication, analytical, and leadership skills, adeptly navigating independent and
collaborative work settings.
 Dedicated to ensuring data accuracy and integrity through validation frameworks, automated testing, and
anomaly detection. Recognized for enhancing data reliability, which directly improves downstream analytics.
 Exhibited a robust grasp of the Software Development Life Cycle (SDLC), showcasing adeptness in testing
methodologies, task execution, resource management, scheduling, and related disciplines, and implemented
various data warehouse projects in Agile Scrum/Waterfall methodologies.
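As a minimal illustration of the Apache Airflow pattern referenced above, the sketch below shows a two-task ETL DAG with a daily schedule and retry settings. All identifiers (daily_orders_etl, extract_orders, load_to_warehouse) are hypothetical placeholders and not names from any of the projects described in this resume.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    # Placeholder: pull the previous day's records from the source system.
    print("extracting orders for", context["ds"])


def load_to_warehouse(**context):
    # Placeholder: load the transformed records into the warehouse.
    print("loading orders for", context["ds"])


with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 6 * * *",   # run every day at 06:00
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
    extract >> load   # the scheduler runs extract before load on each daily run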

Technical Skill Set:


Technologies/Frameworks: Apache Sqoop, Apache Spark, Apache Hive, Apache HBase, Apache MapReduce, PySpark, Apache Hue, Apache Oozie, Apache Pig, GitLab, Control-M, Apache Airflow, Jenkins, Google Cloud Storage, Google Dataproc clusters, Google BigQuery, Google Dataproc Workflow Templates, Figma, Confluence, Screwdriver, Jira.
Programming Languages: Python, SQL, Bash Scripting, Scala.
Database/Data Lake/Data Warehouse: IBM DB2, IBM Sailfish, Microsoft SQL Server, PostgreSQL, Apache HBase, Apache Hive, Google BigQuery, Snowflake.
Cloud Technologies: Snowflake, SnowSQL, Snowpipe, Google Cloud, AWS.
Operating Systems: Windows, UNIX, Linux.
Development Methodologies: Agile/Scrum, Waterfall.
Visualization/Reporting Tools: Tableau, Power BI, Figma, Matplotlib.
Professional Experience:
Client: Walmart Inc. (Remote)
Feb 2022 – Present
Role: Sr. Data Engineer
Engineered backend data solutions for digital marketing analytics, spotlighting essential KPIs including GMV,
spend, visits, YOY and YTD growth, conversion rates, and market share versus competitors. Integrated diverse
demographic and survey-based preference data, enabling targeted strategies across demographics and ethnicities
for the Marketing Analytics Data Engineering team.
Responsibilities:
 Designed, implemented, and maintained all AWS infrastructure and services within a managed service environment.
 Worked on data governance to provide operational structure to previously ungoverned data environments.
 Involved in ingestion, transformation, manipulation, and computation of data using Kinesis, SQL, AWS Glue, and Spark.
 Participated in the requirement gathering sessions to understand the expectations and worked with system
analysts to understand the format and patterns of the upstream source data.
 Designed and implemented quality engineering (QE) processes and practices within data engineering workflows
to ensure data quality, integrity, and consistency.
 Spearheaded the development and maintenance of mission-critical backend data engineering pipelines
strategically designed to bolster the effectiveness of Tableau and Figma dashboards. Ensured the seamless
integration of data sources to empower data-driven decision-making across the organization.
 Developed PySpark-based pipelines using Spark DataFrame operations to load data to the EDL, using EMR for job execution and AWS S3 as the storage layer (a sketch of this pattern appears after this list).
 The Enterprise Data Lake was designed and set up to enable a variety of use cases, covering analytics, processing, storing, and reporting on large amounts of data.
 Stayed updated with emerging technologies, tools, and trends in data engineering and quality engineering
domains, and recommended adoption of new technologies to improve efficiency and effectiveness.
 Collaborated with the clients and solution architect to maintain quality data points in the source by carrying out activities such as cleansing, transformation, and maintaining integrity in a relational environment.
 Constructed sophisticated data pipelines by harnessing the capabilities of Apache Airflow, Apache Hive, Google
BigQuery, and PostgreSQL tables. Orchestrated the extraction, transformation, and loading (ETL) of disparate
data, delivering harmonized datasets ready for analysis and visualization.
 Collaborated seamlessly with the dashboard design and requirements teams, actively participating in
requirement gathering sessions and fostering a collaborative environment to bridge the gap between technical
intricacies and visualization needs.
 Mentored junior data engineers and quality engineers, providing guidance on best practices, methodologies, and
tools.
 Worked on Dimensional Data modelling in Star and Snowflake schemas and Slowly Changing Dimensions (SCD).
 Leveraged SQL as the cornerstone of pipeline development, harnessing its power to orchestrate complex data
transformations, aggregations, and data enrichments. Employed a variety of Apache Airflow operators to
automate, monitor, and optimize pipeline workflows. Masterfully scheduled DAGs to ensure timely execution.
 Maintained a dynamic knowledge-sharing ecosystem by meticulously updating and maintaining Confluence
pages, enabling team members to access accurate and up-to-date information. This proactive approach
promoted transparency and consistency across the team.
 Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python and SnowSQL.
 Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and developing major regulatory and financial reports using advanced SQL queries in Snowflake.
 Staged API and Kafka data (in JSON format) into the Snowflake database, flattening it with Snowflake's FLATTEN function for different functional services (see the second sketch after this list).
 Orchestrated effective collaboration with the update team (upstream) and downstream teams, ensuring
seamless data flow and efficient dependency management. Executed deployments to the Airflow server with
precision, reinforcing the reliability and performance of data processes.
 Implemented Data pipelines for big data processing using Spark transformations and Python API and clusters in
AWS.
 Demonstrated a commitment to data quality and operational excellence by generating and distributing daily
reports, notifications, and alerts. Proactively implemented monitoring mechanisms to detect and rectify
anomalies, safeguarding data integrity.
 Thrived in a fast-paced environment, actively responding to ad hoc requests, bug fixing, and critical issue
resolution. Played a pivotal role in maintaining the reliability and availability of pipelines, enabling the business
to operate smoothly and make informed decisions.
 Took charge of deployments, swiftly responding to evolving business needs by implementing changes and
enhancements to data pipelines. This agile approach ensured the adaptability and relevance of data solutions.
 Collaborated seamlessly with cross-functional teams, employing a holistic approach to problem-solving and
promoting knowledge sharing. Engaged in constructive communication with team members, stakeholders, and
leadership to drive project success.
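A hedged sketch of the PySpark EDL load pattern referenced above (EMR for execution, S3 as the storage layer): it reads raw JSON events, derives a daily GMV/visits aggregate with DataFrame operations, and writes partitioned Parquet back to S3. Bucket names, paths, and column names are invented for illustration.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("marketing_kpi_load").getOrCreate()

# Raw marketing events previously landed in S3 (placeholder path).
raw = spark.read.json("s3://example-raw-bucket/marketing/events/")

# DataFrame transformations: drop bad rows, derive daily GMV and visits per campaign.
daily_kpis = (
    raw.filter(F.col("order_amount") > 0)
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "campaign_id")
       .agg(F.sum("order_amount").alias("gmv"),
            F.countDistinct("visitor_id").alias("visits"))
)

# Write partitioned Parquet back to the curated zone for downstream dashboards.
(daily_kpis.write
           .mode("overwrite")
           .partitionBy("event_date")
           .parquet("s3://example-curated-bucket/marketing/daily_kpis/"))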
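The second sketch illustrates the JSON-staging approach mentioned above: raw API/Kafka payloads are held in a VARIANT column and exploded with Snowflake's LATERAL FLATTEN, issued here through the Snowflake Python connector. Connection parameters, table names, and JSON fields are hypothetical.

import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account", user="example_user", password="example_password",
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
)
cur = conn.cursor()

# Landing table: one VARIANT column holds each raw JSON payload.
cur.execute("""
    CREATE TABLE IF NOT EXISTS raw_events (
        payload VARIANT,
        load_ts TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
    )
""")

# Flatten the nested items array so each element becomes its own curated row.
cur.execute("""
    INSERT INTO curated_events (event_id, item_sku, item_qty)
    SELECT payload:event_id::STRING,
           f.value:sku::STRING,
           f.value:qty::NUMBER
    FROM raw_events,
         LATERAL FLATTEN(input => payload:items) f
""")

conn.close()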
Skill Set: Apache Hive, Google Cloud Storage, Google Dataproc clusters, Google BigQuery, Apache Airflow, Tableau, Figma, PostgreSQL, SQL, Confluence, GitLab, Jira.

Client: Verizon Communications, Inc. (Remote)
Feb 2020 – Jan 2022
Role: Data Engineer
Spearheaded a comprehensive data migration initiative, orchestrating the seamless transition of on-premises jobs to
the Google Cloud Platform (GCP) ecosystem. Efforts included strategizing, planning, and executing the migration
while ensuring minimal disruptions to business operations.
Responsibilities:
 Skillfully executed migrating existing jobs to Google BigQuery, harnessing the power of advanced BQ commands
to optimize data processing efficiency and storage utilization. Notably, resolved persistent performance issues
arising from small file fragmentation, boosting overall job performance and system responsiveness.
 Engineered and implemented intricate ETL pipelines utilizing Apache Airflow, effectively transitioning jobs from
the legacy Apache Oozie platform. The transition resulted in heightened scheduling flexibility and reduced job
failure rates, streamlining operations and reducing operational downtime.
 Actively participated in cross-functional meetings and discussions to gather requirements, define user stories,
and prioritize tasks for data engineering and quality engineering initiatives.
 Leveraged the efficiency of Google Workflow Templates for meticulous bug fixing, rigorous testing, and rapid
troubleshooting, elevating workflow efficiency. This approach expedited issue resolution and fortified data
integrity throughout the data processing cycle.
 Collaborated synergistically with cross-functional teams encompassing cloud architects, data scientists, and
analysts, offering strategic insights and technical acumen to drive innovative data migration strategies and
solutions.
 Worked on processes to transfer/migrate data from AWS S3, relational databases, and flat files into common staging tables in various formats, and from there into meaningful data in Snowflake.
 Championed converting legacy Apache Pig jobs to the high-performing Apache Spark framework, reducing
processing times and modernizing them.
 Set up a Snowflake stage and Snowpipe for continuous loading of data from S3 buckets into the landing table (see the setup sketch after this list).
 Developed Snowflake procedures to perform transformations, load the data into target tables, and purge the stage tables.
 Created Snowflake Tasks for scheduling the jobs.
 Configured and orchestrated migrated jobs on GCP, ensuring seamless integration and execution within the
cloud environment.
 Ensured data quality, integrity, and security throughout the migration process, implementing best practices and
compliance measures.
 Set up data pipelines in StreamSets to copy data from Oracle to Snowflake.
 Created dashboards on the Snowflake cost model and usage in QlikView.
 Worked closely with data scientists and analysts to provide clean and structured data for analysis, enabling data-
driven decision-making.
 Harnessing the capabilities of the Google Cloud Platform, orchestrated the deployment and management of
dynamic data solutions. This approach effectively harnessed cloud-native services, optimizing resource
allocation, scalability, and cost-efficiency.
 Wrote SQL scripts to apply the transformation logic.
 Transformed data processing by replacing legacy jobs with Google BigQuery, utilizing BQ commands for enhanced performance (a client-library sketch follows this list).
 Presented the meticulously crafted findings to key stakeholders, resulting in the organizational adoption of
Apache Airflow as the preferred workflow orchestration platform.
 Proficiently managed Google DataProc ephemeral clusters, tailoring cluster configurations to workload
requirements and optimizing resource utilization.
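A sketch of the continuous-load setup referenced above, issued through the Snowflake Python connector: an external stage over the S3 landing bucket, a Snowpipe that auto-ingests new files, and a task that calls a transform-and-purge procedure. All object names, credentials, and the procedure are placeholders, not actual project values.

import snowflake.connector

conn = snowflake.connector.connect(account="example_account", user="example_user",
                                    password="example_password", role="SYSADMIN",
                                    warehouse="ETL_WH", database="ANALYTICS", schema="RAW")
cur = conn.cursor()

# External stage pointing at the S3 landing bucket.
cur.execute("""
    CREATE STAGE IF NOT EXISTS landing_stage
    URL = 's3://example-landing-bucket/events/'
    CREDENTIALS = (AWS_KEY_ID = 'example_key' AWS_SECRET_KEY = 'example_secret')
    FILE_FORMAT = (TYPE = JSON)
""")

# Snowpipe keeps the landing table loaded as new files arrive in S3.
cur.execute("""
    CREATE PIPE IF NOT EXISTS landing_pipe AUTO_INGEST = TRUE AS
    COPY INTO landing_events FROM @landing_stage
""")

# Scheduled task that transforms into the target table and purges the stage tables.
cur.execute("""
    CREATE TASK IF NOT EXISTS load_target_task
    WAREHOUSE = ETL_WH
    SCHEDULE = '60 MINUTE'
    AS CALL load_target_and_purge()
""")

conn.close()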
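The next sketch shows the BigQuery side of such a migrated job, using the google-cloud-bigquery client rather than the bq CLI mentioned above; project, dataset, and bucket names are invented. Consolidating many small files into one partitioned load is the kind of change that addresses the small-file performance issue noted earlier.

from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Consolidate small Parquet files from GCS into a single date-partitioned table.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    time_partitioning=bigquery.TimePartitioning(field="event_date"),
)
load_job = client.load_table_from_uri(
    "gs://example-curated-bucket/events/*.parquet",
    "example-project.analytics.events",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes

# Downstream aggregation as a standard SQL query job.
query = """
    SELECT event_date, COUNT(*) AS event_count
    FROM `example-project.analytics.events`
    GROUP BY event_date
    ORDER BY event_date
"""
for row in client.query(query).result():
    print(row.event_date, row.event_count)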

Skill Set: PySpark, Shell Scripting, Spark-SQL, Apache Hive, Apache Pig, Google BigQuery, Apache Airflow, Apache Oozie, Google Cloud Storage, Google Dataproc clusters, GitLab, Google Workflow Templates, Screwdriver.

Client: Florida Blue, Jacksonville, FL
Jan 2016 – Jan 2020
Role: Data Engineer
Data-driven applications for the Commercial Analytics team. The product involves Spark-Scala components for data ingestion (inputs from HDFS, S3, Apache Hive, DB2, Microsoft SQL Server), data refinery (transformations such as filter, merge, join, and enrichment), and data store.
Responsibilities:
 Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
 Created impactful Big Data projects utilizing Spark in conjunction with Scala, seamlessly integrating with essential
tools from the Hadoop Ecosystem, including YARN, MapReduce, Apache Hive, Apache Sqoop, and Control-M.
 Constructed efficient dataflow pipelines to migrate HEDIS medical data from diverse origins like SQL Server, DB2, and files. Employed Spark-Scala and an ingestion framework to enforce transformation rules and validation, optimizing data movement to the target platform.
 Operated within dynamic environments encompassing Apache Hive, S3 buckets, and frameworks related to data
ingestion. Facilitated seamless integration with Dashboards and orchestrated workflows using Netflix conductor.

 Developed ETL (Extract, Transform, Load) jobs to automate data integration processes, improving data accuracy
and reducing processing time.
 Performed ad-hoc SQL queries on data stored in Cloud Storage buckets, enabling rapid data analysis and
providing actionable insights to stakeholders, resulting in data-driven decision-making and improved business
processes.
 Orchestrated and managed data processing clusters, leveraging Apache Spark for data analysis, leading to
actionable insights for business stakeholders.
 Optimized database performance by implementing best practices, resulting in a reduction in query execution
time and enhanced data retrieval capabilities.
 Managed the scheduling of numerous scalable, independent end-to-end data migration jobs, expertly employing
scheduling tools such as Control-M.
 Managed data migration from SQL Server and Teradata to Amazon S3 and structured a data service layer in Hive.
Worked with databases like Oracle, MySQL, DB2, and Postgres.
 Spearheaded CD/CI initiatives with Jenkins and Shell Scripting, streamlining deployments and optimizing code
reuse.
 Identified patterns in fraudulent claims using text mining in R and Hive.
 Exported the required information to an RDBMS using Sqoop, making the data available to the claims processing team to assist in processing claims.
 Developed Map Reduce programs to parse the raw data, populate staging tables and store the refined data in
partitioned tables in the EDW.
 Managed seamless data migrations, especially Oracle to S3 and Teradata to Snowflake, ensuring smooth on-
prem to cloud shifts.
 Developed PySpark-based Spark applications, extracting vital customer usage patterns through data analysis.
 Performed advanced SQL queries to pull data from AWS Redshift, track KPIs, and automate monthly report refreshes using Power BI dashboards, along with ad-hoc analysis and reporting.
 Applied diverse transformations and actions to Spark DataFrames, aligning them with specific business
requisites.
 Deployed Spark-SQL to load Parquet data efficiently, created case class-defined Datasets, and effectively managed structured data, ultimately storing it in Hive tables for downstream utilization (a PySpark sketch of this pattern follows this list).
 Collaborated with cross-functional teams, including data scientists and analysts, to understand data
requirements, ensuring the provisioning of clean, accurate, and structured data for analysis and reporting.
 Expertly managed extensive datasets through strategic partitioning, harnessed Spark's in-memory capabilities,
optimized performance with efficient joins, and adeptly employed transformations during the data ingestion.
 Utilized version control systems and collaborative platforms to streamline code management, fostering
enhanced team productivity, seamless code review, and adequate knowledge sharing.
 Created Scala scripts and UDFs utilizing DataFrames in Apache Spark to aggregate and process data.
 Ensured smooth operations by monitoring production jobs, identifying and resolving errors, reprocessing failed
batch jobs, and communicating issues to stakeholders.
 Enhanced product performance through judicious selection of file formats (Avro, ORC), resource allocation,
optimized joins, and efficient transformations.
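As a hedged illustration of the Parquet-to-Hive pattern above: the original work used Scala case classes, so this sketch shows an equivalent flow in PySpark; paths, columns, and table names are hypothetical.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("claims_parquet_to_hive")
         .enableHiveSupport()
         .getOrCreate())

# Load Parquet claim data and expose it to Spark-SQL.
claims = spark.read.parquet("s3://example-bucket/claims/parquet/")
claims.createOrReplaceTempView("claims_raw")

# Keep only approved claims; downstream teams consume the result from Hive.
approved = spark.sql("""
    SELECT claim_id, member_id, claim_amount, service_date
    FROM claims_raw
    WHERE status = 'APPROVED'
""")

# Persist as a partitioned Hive table for reporting.
(approved.write
         .mode("overwrite")
         .partitionBy("service_date")
         .saveAsTable("edw.approved_claims"))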

Skill Set: HDFS, S3, Apache Hive, Apache Hue, DB2, Microsoft SQL Server, YARN, HBase, MapReduce, Scala, Apache
Sqoop, Control-M, Spark-SQL, Netflix conductor, Shell Script, UDFs, Ingestion Framework, GitLab, Jenkins, S3
storage.

Client: Star Network, India
Jun 2013 - Dec 2014
Role: Data Engineer
Horizon and Bloom, internal products within the organization, provide revenue and viewership KPIs to diverse
teams. Collaborating with the backend team, I contributed to the development of scalable Data Lake solutions and
applications supporting the functionalities of Horizon and Bloom.
Responsibilities:
 Contributed to seamlessly transferring and transforming extensive volumes of structured, semi-structured, and
unstructured data from relational databases into HDFS using Apache Sqoop imports.
 Constructed robust distributed data solutions on the Hadoop framework, ensuring scalability and optimal data
handling.
 Formulated Apache Sqoop Jobs and Hive Scripts to extract data from relational databases, comparing it against
historical data for insightful analysis. Utilized reporting and visualization tools like Matplotlib to present data
insights effectively.
 Built ETL data pipelines for data movement to S3 and then to Redshift.
 Designed and implemented ETL pipelines from various relational databases to the Data Warehouse using Apache Airflow.
 Worked on Data Extraction, aggregations and consolidation of Adobe data within AWS Glue using PySpark.
 Developed SSIS packages to extract, transform, and load (ETL) data into the SQL Server database from the legacy mainframe data sources.
 Worked on Design, Development and Documentation of the ETL strategy to populate the data from the various
source systems using Talend ETL tool into Data Warehouse.
 Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation
and Materialized views to optimize query performance.
 Developed logistic regression models (using R programming and Python) to predict subscription response rate
based on customer’s variables like past transactions, response to prior mailings, promotions, demographics,
interests and hobbies, etc.
 Created Tableau dashboards/reports for data visualization, Reporting and Analysis and presented it to Business.
 Assumed a pivotal role in implementing dynamic partitioning and bucketing techniques to enhance data organization within Hive metadata.
 Converted extensive structured and semi-structured data, harnessing state-of-the-art methodologies.
 Created and managed multiple Apache Hive tables, populating them with data and crafting Hive Queries to drive
internal processes.
 Spearheaded the maintenance and monitoring of reporting jobs, employing Jenkins for continuous integration
and deployment.
 Facilitated smooth data flow to downstream consumption teams through proactive meetings, ensuring
consistent and reliable access for end users.
 Created custom T-SQL procedures to read data from flat files and load it into the SQL Server database using the SQL Server Import and Export Data wizard.
 Designed and architected the various layers of the Data Lake.
 Developed Python ETL scripts for ingestion pipelines running on an AWS infrastructure of EMR, S3, Redshift, and Lambda (see the sketch after this list).
 Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments.
 Configured EC2 instances, set up IAM users and roles, and created an S3 data pipeline using the Boto API to load data from internal data sources.
 Developed data stage jobs to cleanse, transform, and load data to the data warehouse, complemented by
sequencers to encapsulate the job flow.
 Conducted comprehensive Data Analysis and Profiling, generating both scheduled and ad-hoc reports for users.
 Engineered Bash scripts to enable seamless integration of big data tools and to manage error handling and
notifications.
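A minimal sketch, assuming a boto3 upload to S3 followed by a Redshift COPY issued through psycopg2, of the ingestion step referenced above; the bucket, table, cluster endpoint, and IAM role are all placeholders rather than actual project values.

import boto3
import psycopg2

# Push the extracted file into the S3 landing bucket.
s3 = boto3.client("s3")
s3.upload_file("/tmp/viewership_daily.csv",
               "example-landing-bucket",
               "viewership/viewership_daily.csv")

# Load the file into Redshift with a COPY command.
conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="etl_user", password="example_password")
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY analytics.viewership
        FROM 's3://example-landing-bucket/viewership/viewership_daily.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
        FORMAT AS CSV
        IGNOREHEADER 1
    """)
conn.close()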

Skill Set: Apache Hadoop, Apache Sqoop, RDBMS, HDFS, Python, Matplotlib, Bash, GitLab, Jenkins.

Education:
 Master of Science in Computer Science, Florida State University, Tallahassee, FL. 2017
 Bachelor of Technology in Computer Science & Engineering, Jawaharlal Nehru Technological University,
Anantapur, India. 2013
