
Swetha G

Senior Data Engineer

Senior Data Engineer | AWS, GCP, and Azure Analytics

About Me: Over 10 years of experience in Data Engineering across the AWS, GCP, and Azure cloud platforms. I excel in designing and optimizing ETL processes, data ingestion, and transformation pipelines using Python, SQL, PySpark, Scala, and Java. With a strong background in data migration, modeling, governance, and orchestration, I leverage both relational (MySQL, MS SQL) and NoSQL (Cassandra, etc.) databases to build scalable, data-driven solutions that empower businesses with actionable insights.

PROFESSIONAL PROFILE:
Big Data & Hadoop:
 Experience in implementing the best practices and design patterns for the Data Lake, enterprise data warehouse,
and domain-specific data marts.
 Integrating Oozie workflows with data ingestion tools and frameworks such as Sqoop, Flume, or Kafka to
ingest data from external sources into Hadoop data lakes or warehouses.
 Expertise in Big Data architectures such as Hadoop distributions (MS Azure, Hortonworks, and Cloudera) and distributed systems.
 Expertise in working with Big Data/Hadoop and YARN architecture along with various Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, Resource/Cluster Manager, and Kafka (distributed stream processing).
 Experience in configuring ZooKeeper to coordinate the servers in clusters and to maintain the data consistency that downstream decision-making depends on.
AWS:
 Expertise in the Amazon Web Services (AWS) Cloud Platform, including services such as VPC, DynamoDB, Route 53, Elastic Container Service (ECS), Security Groups, CloudWatch, EC2, S3, Kinesis, Redshift, IAM, CloudFormation, ELB, CloudFront, and Elastic Beanstalk.
 Experience in using Python, including Boto3, for automation and in scheduling Lambda functions for routine AWS tasks.
 Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB and deployed AWS Lambda code from Amazon S3 buckets. Created a Lambda deployment function and configured it to receive events from an S3 bucket (a minimal sketch follows this list).
 Developed ETL pipelines to move on-prem data (from sources including flat files, mainframe files, and databases) to AWS S3 using Talend and PySpark. Created and embedded Python modules in the ETL pipeline to automatically migrate data from S3 to Redshift using AWS Glue.
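A minimal, illustrative sketch of the S3-triggered Lambda pattern referenced above, not the production code: an S3 "ObjectCreated" event invokes a handler that records object metadata in DynamoDB via Boto3. The bucket wiring and the file_audit table (and its key schema) are assumptions.

```python
# Hypothetical sketch: AWS Lambda handler reacting to S3 events and auditing
# uploaded objects in a DynamoDB table named "file_audit" (assumed name/schema).
import urllib.parse

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("file_audit")  # assumed table, partition key "object_key"


def lambda_handler(event, context):
    """Triggered by S3 event notifications configured on the landing bucket."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Persist one audit row per uploaded object so downstream ETL can pick it up.
        table.put_item(
            Item={
                "object_key": key,
                "bucket": bucket,
                "size_bytes": record["s3"]["object"].get("size", 0),
                "event_time": record["eventTime"],
            }
        )
    return {"statusCode": 200, "processed": len(records)}
```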
GCP:
 Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Dataproc, Stackdriver, and data warehousing.
 Developed and maintained data pipelines on Google Cloud Platform (GCP) using tools such as Dataflow and Apache Beam, with data storage and retrieval primarily handled through Google Cloud Storage (GCS); a minimal sketch follows this list.
 Developed Scala-based ETL pipelines to extract, transform, and load data from various sources into Google Cloud Platform (GCP) BigQuery.
 Designed and implemented a scalable data lake infrastructure on Google Cloud Platform (GCP) utilizing services such as Google Cloud Storage, BigQuery, and Dataflow.
 Extensive experience in IT data analytics projects, with hands-on experience in migrating on-premise ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Dataproc, Google Cloud Storage, and Composer.
 Implemented data processing algorithms in Scala utilizing Apache Spark on GCP Dataproc for efficient batch
processing of large datasets.
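As referenced above, a minimal sketch of a Dataflow pipeline built with the Apache Beam Python SDK that reads delimited files from GCS and writes rows to BigQuery. The project, bucket, table, and schema names are illustrative assumptions, not the actual pipelines.

```python
# Hypothetical GCS -> BigQuery batch pipeline sketch using Apache Beam.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv(line):
    """Turn 'id,amount,region' rows into BigQuery-ready dicts."""
    record_id, amount, region = line.split(",")
    return {"id": record_id, "amount": float(amount), "region": region}


def run():
    options = PipelineOptions(
        runner="DataflowRunner",            # or DirectRunner for local testing
        project="example-project",          # placeholder project id
        region="us-central1",
        temp_location="gs://example-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromGCS" >> beam.io.ReadFromText(
                "gs://example-bucket/raw/*.csv", skip_header_lines=1)
            | "Parse" >> beam.Map(parse_csv)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "example-project:analytics.sales",
                schema="id:STRING,amount:FLOAT,region:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```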
Azure:
 Hands-on experience in setting up Azure Data Factory and creating ingestion pipelines to pull data into Azure Data Lake Store and Azure Blob Storage. Migrated on-premises databases to the Microsoft Azure environment (Blobs, Azure Data Warehouse, Azure SQL Server, PowerShell Azure components, SSIS Azure components).
 Skilled in Azure, Azure Data Factory (ADF), Databricks, ETL, SQL databases, data warehousing, and Power BI.
 Worked on Azure Data Lake Storage (ADLS) and Data Lake Analytics and integrated them with other Azure services.
 Developed high performant data ingestion pipelines from multiple sources using Azure Data Factory
and Azure Databricks.
 Extensively worked on creating pipelines in Azure Cloud ADF v2 using different activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
 Proficient in working with Delta Lake (Delta format) and Parquet files in Azure Databricks. Tasks include ingesting data from raw sources into bronze tables, cleaning and transforming data into silver tables, and creating gold tables tailored to specific business needs (a minimal sketch follows this list).
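A minimal PySpark sketch of the bronze/silver/gold (medallion) flow referenced above, assuming a Databricks workspace with Delta Lake. The storage paths, table names, and columns are illustrative assumptions.

```python
# Hypothetical medallion flow: raw Parquet -> bronze -> silver -> gold Delta tables.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Bronze: land raw files as-is, with ingestion metadata.
raw = spark.read.parquet("abfss://raw@examplestorage.dfs.core.windows.net/orders/")
raw.withColumn("_ingested_at", F.current_timestamp()) \
   .write.format("delta").mode("append").saveAsTable("bronze.orders")

# Silver: de-duplicate and clean.
bronze = spark.read.table("bronze.orders")
silver = (
    bronze.dropDuplicates(["order_id"])
          .filter(F.col("order_amount") > 0)
          .withColumn("order_date", F.to_date("order_ts"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: business-ready aggregate for reporting.
gold = silver.groupBy("order_date", "region").agg(F.sum("order_amount").alias("daily_revenue"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.daily_revenue")
```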
Data Warehousing / Business Intelligence (BI):
 Experience in data warehousing and business intelligence across various domains.
 Created Power BI, Tableau dashboards with large data volumes from data source SQL servers.
 Active involvement in all scrum ceremonies – Sprint Planning, Daily Scrum, Sprint Review and Retrospective
meetings and assisted Product owner in creating and prioritizing user stories.
Data Science: Proficient in managing the entire data science project life cycle and actively involved in all its phases, including data acquisition, data cleaning, feature scaling, dimension-reduction techniques, feature engineering, statistical modeling, and ensemble learning.
 Implemented several statistical methodologies, including classification (k-nearest neighbors, support vector machines, decision trees, Naïve Bayes) and regression models (multiple regression, regression trees, SVR), as well as k-means clustering, in Python, R, and SAS (a minimal scikit-learn sketch follows this list).
 Designing, implementing, and maintaining data lakes using Iceberg for storing large volumes of structured
and semi-structured data.
 Extract, Transform, and Load (ETL) source data into target tables to build data marts.
 Conducted Gap Analysis, created Use Cases, workflows, screen shots and Power Point presentations for various Data
Applications.
 Worked on complex SQL queries and PL/SQL procedures and converted them into ETL tasks.
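As referenced above, a minimal sketch comparing those classifiers with scikit-learn after feature scaling. The dataset, file name, and target column are illustrative assumptions.

```python
# Hypothetical classifier comparison on a scaled dataset (file/columns assumed).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("customers.csv")                    # placeholder dataset
X, y = df.drop(columns=["churned"]), df["churned"]   # placeholder target column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)               # feature scaling, as noted above
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf"),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "naive_bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```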
Data Integration & Transformation Code Pipelines:
 Hands-on experience in developing data integration and transformation code pipelines in PySpark.
 Experience in developing Spark applications using Spark SQL, PySpark, and Data Lake in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch follows this list).
 Constructed data pipelines to process streamed and chunked data on AWS by ingesting from 10+ data sources.
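A minimal sketch of the kind of PySpark/Spark SQL transformation described above: ingest more than one file format, register a temporary view, and aggregate usage patterns. The paths and column names are illustrative assumptions.

```python
# Hypothetical multi-format ingestion and Spark SQL aggregation sketch.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

events_json = spark.read.json("s3a://example-bucket/events/json/")
events_parquet = spark.read.parquet("s3a://example-bucket/events/parquet/")

# Combine sources that share a schema (allowing for missing columns).
events = events_json.unionByName(events_parquet, allowMissingColumns=True)
events.createOrReplaceTempView("events")

usage = spark.sql("""
    SELECT customer_id,
           date_trunc('day', event_ts) AS event_day,
           count(*)                    AS event_count
    FROM events
    GROUP BY customer_id, date_trunc('day', event_ts)
""")
usage.write.mode("overwrite").parquet("s3a://example-bucket/curated/usage_daily/")
```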
Scripting: Wrote production-level code for querying and retrieving data from various RDBMS sources such as MySQL and from cloud storage such as GCS and S3 buckets.
Data Pipelines: Designed and developed complex data pipelines that maintain data quality, efficiently processing both batch and streaming data from Google Cloud Pub/Sub and ingesting it into Google BigQuery.
Power BI / Tableau:
 Experienced in developing Power BI reports and dashboards from multiple data sources using data blending.
 Experience in writing complex SQL queries using stored procedures, common table expressions (CTEs), and temporary tables to support Power BI.
Databases / Data Modelling / Data Warehousing: In-depth comprehension of and familiarity with NoSQL and relational databases, including MongoDB, HBase, Cassandra, and PostgreSQL. Expertise in T-SQL (DDL, DML, TCL) and in developing SQL Server programmability objects such as stored procedures, functions, triggers, views, and subqueries for various business requirements. Dimensional modelling and solid hands-on experience with Star Schema and Snowflake Schema for the fact and dimension tables in a data warehouse. Experience in data warehousing, data integration, cloud architecture design, and data modelling. Proficient in Oracle backend programming.

TECHNICAL SKILLS:
SDLC Agile, Scrum, Waterfall.
Big Data Ecosystem Hadoop, MapReduce, Kafka, Airflow, NiFi.
Hadoop Distributions Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapReduce, Amazon EMR.
Cloud Platforms Amazon Web Services (AWS), MS Azure, GCP.
AWS Services EMR, S3, EC2, VPC, Redshift, Lambda, DynamoDB, RDS, SNS, SQS, Glue.
GCP Services BigQuery, Dataproc, GCS (Cloud Storage), Dataflow, Vertex AI, Cloud Composer, Cloud Shell, Cloud Functions.
MS Azure Services Azure SQL Database, Azure Data Lake (ADL), Azure Data Factory (ADF), Azure SQL Data
Warehouse, Azure ServiceBus, Azure Key Vault, Azure Analysis Service (AAS), Azure Blob Storage.
ETL/ BI Tools Informatica, SSIS, Tableau, PowerBI, SSRS.
ETL Tools Informatica Power Center 10.x/9.x/8.x, Informatica Data Transformation B2B (Parser,
Mapper & Serializer).
CI/ CD Azure DevOps, Jenkins, Ant, Maven.
Ticketing Tools JIRA
Operating Systems Linux, Windows, Unix.
Databases (RDBMS/ NoSQL) Oracle, SQL Server, Cassandra, Teradata, PostgreSQL, HBase, MongoDB.
Programming Languages/Scripting Python (Pandas, NumPy), SQL, Scala, PL/SQL, R, Java, Shell Scripting.
DWH Schemas Star Schema, Snowflake Schema.
Data Modeling Tools Erwin, MS Visio.
Web/ Application Server Apache Tomcat, WebLogic, WebSphere, JBoss.
Version Control Git, GitHub, Subversion.
Reporting / BI Tools MS Excel, Tableau, Power BI, QlikView, SSRS, Splunk.

Professional Experience:
Walmart August 2024 – Present
Data Engineer
Responsibilities:
 Transform SparkSQL queries into BigQuery SQL, enhancing query performance and data migration efficiency.
 Migrate Azure Databricks PySpark notebooks to modular Python code, streamlining maintenance and scalability.
 Develop and maintain DAG scripts for daily and weekly data processing jobs.
 Create Python-based Airflow scheduling scripts to automate data workflows.
 Orchestrate complex data workflows on GCP using Astronomer Airflow, with intermediate data stored in GCS.
 Integrated Astronomer Airflow with GCP services (Cloud Storage, Dataproc, BigQuery) to enable seamless data
pipeline operations.
 Implemented constants management across pipelines and DAGs to ensure configuration consistency.
 Leveraged Airflow variables for managing environment-specific configurations.
 Design a backtesting framework in Airflow to validate machine learning model accuracy.
 Executed bi-directional data migration between Azure Blob Storage and GCP BigQuery.
 Restructured DAG folder architecture to align with bucket-based deployment best practices.
 Utilized various Airflow operators (PythonOperator, TriggerDagRunOperator, BashOperator) to effectively orchestrate end-to-end workflows (a minimal DAG sketch follows this list).
 Architected and optimized robust CI/CD pipelines using GitHub and LooperPro, automating testing and deployment
to enhance software delivery.
 Tracked Airflow deployments to GCS buckets using Concord.
 Integrate Vertex AI to build, train, and deploy machine learning models, streamlining the end-to-end ML lifecycle on
GCP.

 Conducted code reviews and provided peer mentoring to uphold high coding standards.
 Developed Power BI reports for replenishment and allocation pipelines, facilitating data-driven decision-making.
 Migrated Power BI reports from Azure Databricks to GCP BigQuery data sources.
 Collaborated with Data Scientists and Product Owners during SCRUM ceremonies (Sprint Planning, Retrospectives,
Backlog Refinement) to align technical deliverables with business objectives.
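A minimal sketch of the Airflow orchestration referenced above: a daily DAG combining a BashOperator and a PythonOperator, with an environment-specific setting pulled from an Airflow Variable. The DAG id, variable name, and task logic are illustrative assumptions, not the Walmart pipelines.

```python
# Hypothetical daily DAG sketch (Airflow 2.x style imports).
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# Environment-specific configuration via an Airflow Variable (assumed name).
GCS_BUCKET = Variable.get("intermediate_bucket", default_var="example-intermediate-bucket")


def load_to_bigquery(**context):
    # Placeholder: a real task would stage data from GCS and load it into BigQuery.
    print(f"Loading gs://{GCS_BUCKET} partition for {context['ds']}")


with DAG(
    dag_id="replenishment_daily",
    start_date=datetime(2024, 8, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_to_gcs",
        bash_command=f"echo extracting source data to gs://{GCS_BUCKET}/{{{{ ds }}}}/",
    )
    load = PythonOperator(task_id="load_to_bigquery", python_callable=load_to_bigquery)

    extract >> load
```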
Environment:
GCP, GCS (Cloud Storage), Dataproc, Vertex AI, Python, SparkSQL, PySpark, BigQuery, LooperPro, Concord, Azure Databricks, Azure Blob Storage, Astronomer Airflow, SQL, Power BI, GitHub, VS Code, Agile.

Apple September 2023 – July 2024


Senior Data Engineer
Responsibilities:
 Implemented reporting data marts for the Sales and Marketing teams on AWS Redshift. Handled data schema design and development and built ETL pipelines in Python and MySQL stored procedures, with automation using Jenkins.
 Worked on AWS to aggregate and clean files in Amazon S3 and used Amazon EC2 clusters to deploy files into buckets.
 Designed and architected solutions to load multipart files that cannot rely on a scheduled run and must be event-driven, leveraging AWS SNS.
 Used several ETL technologies, such as Informatica PowerCenter, for data migration, data profiling, ingestion, data cleaning, transformation, import, and export.
 Optimizing query performance and resource utilization for Iceberg tables by partitioning data, indexing columns, and
tuning storage configurations.
 Involved in Data Modeling using Star Schema, Snowflake Schema.
 Used AWS EMR to create Hadoop and Spark clusters, which are used for submitting and executing Python applications in production.
 Developed data pipelines on AWS to extract data from weblogs and store it in HDFS.
 Migrated data from AWS S3 to HDFS using Kafka.
 Developed PySpark code for AWS Glue jobs and EMR clusters (a minimal Glue job sketch follows this list).
 Transformed and loaded large sets of structured, semi-structured, and unstructured data.
 Worked on designing AWS EC2 instance architecture to meet high availability application architecture and security
parameters.
 Created AWS S3 buckets and managed policies for S3 buckets and Utilized S3 buckets and Glacier for storage and
backup.
 Worked on Hadoop cluster and data querying tools to store and retrieve data from the stored databases.
 Designed and deployed automated ETL workflows using AWS lambda, organized and cleansed the data in S3 buckets
using AWS Glue and processed the data using Amazon Redshift.
 Wrote AWS Lambda code in Python for nested JSON files: converting, comparing, sorting, etc.
 Managed the AWS Glue Data Catalog to create and maintain metadata for datasets, enabling efficient data discovery and governance.
 Installed and configured Apache Airflow for workflow management and created workflows in Python.
 Wrote UDFs in Hadoop PySpark to perform transformations and loads.
 Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 to the Hadoop cluster.
 Worked with ORC, Avro, JSON, and Parquet file formats, and created external tables and queried on top of these files using BigQuery.
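As referenced above, a minimal sketch of an AWS Glue PySpark job: read a crawled catalog table as a DynamicFrame, drop bad records with DataFrame APIs, and write Parquet back to S3. The database, table, and bucket names are illustrative assumptions.

```python
# Hypothetical AWS Glue job sketch (catalog/table/bucket names assumed).
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw dataset registered by a Glue Crawler.
raw = glue_context.create_dynamic_frame.from_catalog(database="sales_db", table_name="raw_orders")

# Basic cleansing with Spark DataFrame APIs, then convert back to a DynamicFrame.
cleaned = raw.toDF().filter(F.col("order_amount").isNotNull())
cleaned_dyf = DynamicFrame.fromDF(cleaned, glue_context, "cleaned_orders")

glue_context.write_dynamic_frame.from_options(
    frame=cleaned_dyf,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/orders/"},
    format="parquet",
)
job.commit()
```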
Environment:
Data Warehousing: Redshift, Snowflake. ETL Tools: Talend, Apache NiFi, Apache Airflow. Database Systems: MySQL, Oracle, MongoDB, Snowflake. Data Visualization Tools: Power BI, MS Excel VBA. Programming Languages: Python, Java, SQL, Spark. Cloud Infrastructure: AWS, Mainframes, Teradata, DB2.

ALDI, Batavia, IL March 2023 – September 2023
AWS Data Engineer
Responsibilities:
Data Warehouse project:
 Exposure to Full Lifecycle (SDLC) of Data Warehouse projects including Dimensional Data Modelling.
 Developed predictive models using Python & R to predict customer churn and classification of customers.
 Developed Spark jobs using Python in test environment for faster data processing and used Spark SQL for querying.
AWS:
 Created an AWS API Gateway that invokes a Lambda function and returns the status code as per the validations. When a user hits the endpoint with a valid request body, it invokes the Lambda function, where the code fetches data from the database. Validations are performed using JSON schema-level validations.
 Worked on Lambda functions triggered by an S3 bucket and wrote code to insert data dynamically into DynamoDB, checking CloudWatch logs in case of any issues.
 Developed backend code using Python and AWS serverless services to convert XML data coming from the S3 bucket into JSON format, automated the workflow of raising tasks and inserting the required data into DynamoDB, and used historical data from DynamoDB to check whether errors were occurring consistently.
 Worked on the AWS Glue ETL tool, using AWS Glue Crawlers, the AWS Glue Data Catalog, and Connections.
 Worked on AWS Athena, a query service that makes it easy to analyze data in Amazon S3 using standard SQL.
 Designed infrastructure for the AWS application and workflow using Terraform and implemented continuous delivery of AWS infrastructure with Terraform.
 Transformed data using AWS Glue dynamic frames with PySpark, cataloged the transformed data using Crawlers, and scheduled jobs and crawlers using the workflow feature.
 Developed and tested environments of different applications by provisioning Kubernetes clusters on AWS using
Docker, Ansible, and Terraform.
Big Data:
 Perform validation and verify software at all testing phases which include Functional Testing, System Integration
Testing, End to End Testing, Regression Testing, Sanity Testing, User Acceptance Testing, Smoke Testing, Disaster
Recovery Testing, Production Acceptance Testing, and Pre-prod Testing phases.
 Worked extensively on MongoDB, Hadoop, Hive, Spark, SQL, and PySpark.
 Developed Spark applications for data quality assessment using Scala and Python.
 Oversaw the migration process from Control-M to Airflow, creating a comprehensive plan.
 Worked with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling with Python.
 Created ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake (a minimal sketch follows this list).
 Implement a data lake solution using Spark and Databricks, ingesting data from various sources including streaming
data from Kafka.
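A minimal sketch, assuming the snowflake-connector-python package, of the Python-plus-SnowSQL pattern referenced above: stage a CSV from an external stage into Snowflake and run a simple query. The connection parameters, stage, and table names are illustrative assumptions.

```python
# Hypothetical Snowflake load-and-query sketch (credentials/objects assumed).
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",      # placeholder credentials
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
try:
    cur = conn.cursor()
    # COPY INTO from an external stage that points at the landing bucket (assumed stage).
    cur.execute("""
        COPY INTO sales_orders
        FROM @landing_stage/orders/
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    """)
    cur.execute("SELECT region, SUM(order_amount) FROM sales_orders GROUP BY region")
    for region, total in cur.fetchall():
        print(region, total)
finally:
    conn.close()
```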
Environment: SDLC, R, Python, NumPy, SciPy, Matplotlib, Pandas, AWS, HDInsight, U-SQL, Big Data, Snowflake, SnowSQL, Snowpipe, Spark, Spark Streaming, Spark RDD, PySpark, Scala, Hive, Oozie, HBase, Pig, Sqoop, Linux, Power BI, Informatica, SQL Server, MongoDB, EC2, S3, DynamoDB, SSIS, Auto Scaling, Spark YARN, Lambda, Apache NiFi, Machine Learning, Kafka.

Hindustan Unilever, India February 2019 – January 2022


Azure Data Engineer
Key Contributions:
 Developed a data set process for data mining and data modelling and recommended ways to improve data quality, efficiency, and reliability. Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Performed data ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
 Performed provisioning of IaaS and PaaS virtual machines and Web/Worker roles on Microsoft Azure Classic and Azure Resource Manager, and deployed web applications on Azure using PowerShell workflows.

 Used Azure PaaS solutions such as Azure Web Apps, Web Roles, Worker Roles, SQL Azure, and Azure Storage, and worked on configuring and deploying the Operations Management Suite (OMS) to monitor and track changes.
Responsibilities:
 Worked on Microsoft Azure services such as HDInsight clusters, Blob storage, ADLS, Data Factory, and Logic Apps, and also did a POC on Azure Databricks.
 Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files. Mastered different columnar file formats such as RC, ORC, and Parquet.
 Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics), with data ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing of the data in Azure Databricks.
 Created pipelines in ADF using Linked Services/Datasets/Pipelines to Extract, Transform, and Load data from different sources such as Azure SQL, Blob storage, Azure SQL Data Warehouse, the write-back tool, and backwards.
 Involved in data warehouse implementations using Azure SQL Data warehouse, SQL Database, Azure Data Lake
Storage (ADLS), Azure Data Factory v2.
 Involved in creating specifications for ETL processes, finalized requirements and prepared specification documents.
 Migrated data from on-premises SQL Database to Azure Synapse Analytics using Azure Data Factory, designed
optimized database architecture.
 Created Azure Data Factory for copying data from Azure BLOB storage to SQL Server.
 Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks.
 Worked with similar Microsoft on-prem data platforms, specifically SQL Server, SSIS, SSRS, and SSAS.
 Created reusable ADF pipelines to call REST APIs and consume Kafka events.
 Worked on designing and developing SSIS packages to import and export data from MS Excel, SQL Server, and flat files.
 Implemented Copy activity, Custom Azure Data Factory Pipeline Activities.
 Worked on creating Azure Data Factory for moving and transforming the data.
 Automated and scheduled business processes and workflows using Azure Logic Apps.
 Migrated SQL databases to Azure Data Lake, Azure SQL Database, Data Bricks, and Azure SQL Data Warehouse.
 Employed various SSIS transformations such as aggregate, conditional split, data conversion, derived columns, union, and sort. Developed Python Spark scripts on Azure HDInsight for data aggregation and validation (a minimal sketch follows this list).
 Automated jobs in Azure Data Factory using different triggers (Event, Scheduled, Tumbling).
 Utilized Cosmos DB (Graph database) for storing catalog data and event sourcing in order processing pipelines.
 Designed and developed user-defined functions, stored procedures, and triggers for Cosmos DB (Graph database).
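A minimal sketch of the PySpark aggregation-and-validation scripts referenced above, assuming access to an ADLS Gen2 container. The storage account, container, and column names are illustrative assumptions.

```python
# Hypothetical Spark validation + aggregation script for data in ADLS Gen2.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-validation").getOrCreate()

sales = spark.read.parquet("abfss://curated@examplestorage.dfs.core.windows.net/sales/")

# Validation: fail fast if key columns contain nulls or duplicates.
null_keys = sales.filter(F.col("invoice_id").isNull()).count()
dup_keys = sales.count() - sales.dropDuplicates(["invoice_id"]).count()
if null_keys or dup_keys:
    raise ValueError(f"Validation failed: {null_keys} null keys, {dup_keys} duplicate keys")

# Aggregation: daily revenue per market, written back to the lake for reporting.
daily = sales.groupBy("market", F.to_date("invoice_ts").alias("invoice_date")) \
             .agg(F.sum("net_amount").alias("daily_revenue"))
daily.write.mode("overwrite").parquet(
    "abfss://reporting@examplestorage.dfs.core.windows.net/daily_revenue/")
```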
Environment: Microsoft Azure services (HDInsight, Blob storage, ADLS, Logic Apps), Snowflake, Apache Spark 2.3, Spark SQL, ETL, Maven, Oozie, Java 8, Python 3, Unix, PowerShell scripting.

Amtech Enterprises (Goldman Sachs), India September 2016 – January 2019


Data Engineer
Responsibilities:
 Migrated an on-prem database to Google Cloud BigQuery.
 The client wanted to migrate their enterprise data warehouse from SQL Server to GCP using data pipelines that pull data from transactional databases (SQL Server, POS, applications), stage it within Cloud Storage, and perform transformations using views within BigQuery.
 Built data pipelines using the Apache Beam framework in GCP for ETL-related jobs with different Airflow operators.
 Performed an initial POC by migrating data using the BigQuery Data Transfer Service, then subsequently moving incremental data into Cloud Storage and triggering a Cloud Function.
 Used Cloud products such as GCS, Cloud Functions, Dataflow, BigQuery, and Cloud Composer.
 Loaded data transferred from on-prem databases into GCS buckets.
 Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
 Exposure to IAM roles in GCP; applied project-wide IAM policies and firewall rules on GCP to restrict unwanted access from the public internet and from individuals.

 Used BigQuery command-line utilities to load data into BigQuery tables on arrival of CSV/text files in the GCS bucket (a minimal Cloud Function sketch follows this list).
 Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Google Cloud Storage, and BigQuery.
 Created Airflow scheduling scripts in Python. Optimized AI models (compression, quantization, etc.) for deployment on the cloud, created Dockerized versions of the optimized models, deployed the Dockerized models on scalable distributed infrastructure, and handled lifecycle management of the deployed models using MLOps frameworks such as Kubeflow and MLflow.
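As referenced above, a minimal sketch of a GCS-triggered Cloud Function that loads newly arrived CSV/text files into BigQuery using the google-cloud-bigquery client. The project, dataset, and table names are illustrative assumptions.

```python
# Hypothetical Cloud Function (google.storage.object.finalize trigger) sketch.
from google.cloud import bigquery

BQ_TABLE = "example-project.staging.sales_raw"  # placeholder destination table


def gcs_to_bigquery(event, context):
    """Entry point: `event` carries the metadata of the newly arrived GCS object."""
    if not event["name"].endswith((".csv", ".txt")):
        return  # ignore non-delimited files

    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, BQ_TABLE, job_config=job_config)
    load_job.result()  # wait for completion so failures surface in the function logs
    print(f"Loaded {uri} into {BQ_TABLE}")
```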
Environment: GCP, BigQuery, GCS buckets, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil, bq command-line utilities, Dataproc, EDI, VM instances, Cloud SQL, MySQL, Postgres, SQL Server, Salesforce SOQL, Python, Scala.

MDigital Tech, India August 2013 – September 2016


Data Engineer / Data Analyst
Key Contributions:
 Worked on the development of tools that automate AWS server provisioning and application deployments, and implemented basic failover among regions through the AWS SDKs.
 Responsible for importing data into HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop back to the RDBMS servers after aggregations for other ETL operations.
Responsibilities:
 Monitored containers on AWS EC2 machines using the Datadog API and ingested and enriched data into the internal cache system.
 Designed and developed Security Framework to provide fine grained access to objects in AWS S3 using AWS
Lambda, DynamoDB.
 Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases,
such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
 Set up data governance touch points with key teams to ensure data issues were addressed promptly.
 Responsible for facilitating UAT (User Acceptance Testing), PPV (Postproduction Validation) and maintaining
Metadata and Data dictionary.
 Responsible for source data cleansing, analysis, and reporting using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation in Excel.
 Actively involved in data modeling for Application migration to Teradata and developed the dimensional model.
 Created Power BI reports using the tabular SSAS models as source data in Power BI Desktop and published the reports to the service. Designed different types of KPIs for percentage growth in different fields, compared them over time, and designed aggregations and pre-calculations in SSAS.
 Created Data Flow Diagrams (DFDs), ER diagrams for data modeling and Web-page mock-ups using MS Visio for
acceptance from end users.
 Responsible for creating Use case diagrams, Context diagrams, Swim Lane diagrams, Data Flow (DFD) diagrams.
 Followed Scrum and SAFe Agile methodology in the SDLC.
 Created AWS Lambda functions using Python for deployment management in AWS, and designed, investigated, and implemented public-facing websites on Amazon Web Services, integrating them with other application infrastructure (a minimal provisioning sketch follows this list).
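As referenced above, a minimal Boto3 sketch of the kind of provisioning automation described in this section: launch a tagged EC2 instance and wait until it is running. The AMI id, instance type, region, and tags are illustrative placeholders.

```python
# Hypothetical EC2 provisioning automation sketch using boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder AMI
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Environment", "Value": "dev"},
                 {"Key": "Provisioner", "Value": "automation-tool"}],
    }],
)
instance_id = response["Instances"][0]["InstanceId"]

# Block until the instance is reachable, then report its public DNS name.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
described = ec2.describe_instances(InstanceIds=[instance_id])
print(described["Reservations"][0]["Instances"][0].get("PublicDnsName", instance_id))
```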
Environment: Cloudera (CDH3), AWS, Snowflake, HDFS, Kafka, Sqoop, Shell Scripting, Python 3.

Education:
Master’s Major (Computer Science / Data Science / Data Engineering)
 MS (ITM) from Trident University, California, 2023
