SWATHI

Email Id: [email protected]


Number: 469-415-5019

PROFESSIONAL SUMMARY
8+ years of experience in Data Analysis and Data Engineering across the Commercial, Financial, Retail, and Healthcare domains, covering data wrangling and scrubbing, ETL implementation and orchestration, data visualization to draw meaningful insights, Big Data technologies, Data Warehousing, and Cloud platforms. Committed to data accuracy and skilled in communicating complex findings effectively to diverse stakeholders.
EXPERTISE
 In-depth knowledge of the Software Development Life Cycle (SDLC) in Agile and Waterfall models, with a thorough understanding of the Requirements, Analysis, Design, Development, and Testing phases.
 Developed and implemented modular and reusable ETL processes, along with associated data structures, catering to Data
Staging, Change Data Capture (CDC), Data Balancing & Reconciliation, Data Lineage Tracking, and ETL Auditing.
 Proficient in crafting SQL queries to retrieve filtered data from a variety of databases, including RDBMS like SQL Server,
MySQL, PostgreSQL, Oracle, and NoSQL databases such as MongoDB, HBase, and Cassandra, with a focus on handling
unstructured data.
 Demonstrated expertise in configuring and sustaining Amazon Web Services (AWS) infrastructure, encompassing services
like Amazon EC2, ELB, Auto-Scaling, S3, Route53, IAM, VPC, RDS, Amazon Redshift, Step Functions, Security Groups, Load
Balancers, Target Groups, CloudWatch, SNS, Lambda, ECS, CloudFormation, and EMR.
 Led the migration of data center operations to Amazon Web Services (AWS), providing initial support to Applications and
Database teams during the transition.
 Owned the design, development, validation, and maintenance of ongoing metrics, reports, analyses, experiments, and related roadmaps to drive key business decisions.
 Experience in developing, testing, and deploying Tableau reports using Tableau Server.
 Experience in creating metrics, attributes, charts, filters, hierarchies, trend lines, sets and groups, data blending, parameters, and complex calculations to manipulate data in Tableau.
 Proficient in designing and developing dashboards and reports using Tableau visualizations such as bar graphs, scatter plots, pie charts, and geographic maps.
 Developed SSIS packages using a Foreach Loop container in the Control Flow to process all Excel files within a folder, a File System Task to move each file to an archive after processing, and an Execute SQL Task to insert transaction-log data into a SQL table.
 Highly accomplished Power BI Analyst with a decade of experience in data analysis, visualization, and reporting.
 Proficient in Microsoft Power BI for transforming raw data into actionable insights and compelling dashboards.
 Expertise in data modeling, facilitating efficient data retrieval and supporting complex reporting requirements.
 Proven track record of generating insightful reports and presentations for key stakeholders.
 Ensures data quality, accuracy, and compliance with industry standards and regulations.
 Effective in training and supporting end-users in utilizing Power BI reports and dashboards.
 Possesses advanced problem-solving and analytical skills.
 Excellent communication and presentation abilities to convey complex information clearly.
 Experience in working with Hadoop and Spark distributions – Cloudera, Hortonworks.
 Experienced at performing read and write operations on HDFS filesystem.
 Experience in Implementing Spark with the integration of Hadoop Ecosystem.
 Experienced in working with data architecture, including data ingestion pipeline design, Hadoop cluster architecture, data modelling, machine learning, and advanced data processing.
 Experience in data cleansing using Spark Map and Filter Functions.
 Experience in designing and developing Applications in Spark using Scala.
 Experience scheduling jobs and pipelines in Airflow using Python (a brief sketch follows this list).
 Experience migrating MapReduce programs to Spark to improve performance.
 Worked with Spark RDD for parallel processing of datasets in HDFS, MySQL and other sources.
 Used Airflow to schedule, manage and monitor Spark Jobs on a Cluster.
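A minimal sketch of the Airflow-plus-Spark scheduling pattern referenced above, assuming Airflow 2.x; the DAG id, schedule, and application path are hypothetical placeholders rather than details of any specific engagement.

```python
# Minimal Airflow 2.x DAG sketch: schedule and monitor a daily Spark batch job.
# The DAG id, schedule, and spark-submit paths are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,                          # rerun a failed spark-submit twice
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_spark_batch",            # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # Submit the PySpark application to the cluster; Airflow tracks the exit code,
    # so a non-zero spark-submit status marks the task failed and triggers retries.
    run_spark_job = BashOperator(
        task_id="spark_submit_batch",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/jobs/daily_batch.py --run-date {{ ds }}"
        ),
    )
```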
TECHNICAL SKILLSET
Databases MS SQL Server 2017,2016,2014, 2012, 2008 R2, 2008/2005/2000, DB2 10

Operating Systems Windows 98/ 2000/XP/Vista/2003, 2007, 2008, 2010, Mac OS, Linux, Cloudera and Ubuntu

OLAP/Reporting Tools SQL Server Analysis Service (SSAS), SQL Server Reporting Service (SSRS), Power BI

Data Modeling Tools Microsoft Visio 2000/2003

SQL Server Tools SQL server Management Studio, SQL server Query Analyzer, SQL server mail service, DBCC,
BCP, SQL server profiler

Programming languages SQL, PL/SQL, T-SQL, C#, ASP.NET, HTML, Shell, Bash, Python, PySpark, Scala, Java, Pig

Data Analysis & Visualization Microsoft Power BI Desktop, Power BI Report Server, Power BI Service, QlikView, Qlik Sense, Tableau

Data Warehousing & BI SQL Server, Business Intelligence Studio, SSIS, SSAS, Access Manager, SQL Server 2000
Analysis Services and SQL Reporting services, DTS

Cloud Platform Amazon Web Services and Azure

Big Data Technologies Apache Spark, Apache Hadoop, Hue, Map Reduce, Apache Hive, Apache Sqoop, Apache
Kafka, Apache Flume, Apache Airflow, Apache Zookeeper, HDFS, Cassandra, Amazon S3, EC2,
EMR

PROFESSIONAL EXPERIENCE
Discover Financials – Remote Sep 2022 - Present
Senior Data Engineer
Responsibilities:

 Responsible for building the data lake on AWS, ingesting structured shipment and master data from Azure Service Bus through AWS API Gateway, Lambda, and Kinesis Firehose into S3 buckets.
 Implemented data pipelines for big data processing using Spark transformations with the Python API (PySpark) on clusters in AWS (see the sketch after this list).
 Created complex SQL queries in the Teradata Data Warehouse environment to test the data flow across all stages.
 Acted as a subject matter expert on data quality best practices, tools, and technologies, staying abreast of emerging trends
and innovations in the field of data quality engineering.
 Integrated data sources from Kafka (Producer and Consumer APIs) for data stream processing in Spark within the AWS network.
 Designed the rules engine in Spark SQL to process millions of records on a Spark cluster on the Azure Data Lake; extensively involved in designing the SSIS packages that load data into the Data Warehouse.
 Built customer insights on customer/service utilization, bookings & CRM data using Gainsight.
 Executed process improvements in data workflows using the Alteryx processing engine and SQL.
 Collaborated with business owners of products for understanding business needs and automated business processes and
data storytelling in Tableau.
 Implemented Agile Methodology for building data applications and framework development.
 Presented findings, recommendations, and progress updates to senior leadership and stakeholders, advocating for
investments in data quality initiatives and demonstrating the business value of quality engineering efforts.
 Implemented business processing models using predictive & prescriptive analytics on transactional data with regression.
 Implemented Logistic Regression and Random Forest ML models with Python packages to predict insurance purchase by a Confidential member.
 Applied machine learning and natural language processing techniques, including topic modeling, to extract additional
analytics value from free-text fields.
 Performed data joins across tables from multiple sources, providing meaningful insights based on diverse datasets.
 Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
 Developed RDDs/DataFrames in Spark and applied several transformations to load data from Hadoop data lakes.
 Filtered and cleaned data using Scala code and SQL queries.
 Involved as the primary on-site ETL developer during the analysis, planning, design, development, and implementation stages of projects using IBM WebSphere software (QualityStage v9.1, Web Service, Information Analyzer, ProfileStage). Prepared Data Mapping Documents and designed the ETL jobs based on the DMD with the required tables in the Dev environment.
 Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
 Designed and developed the architecture for a data services ecosystem spanning Relational, NoSQL, and Big Data technologies. Extracted large volumes of data from Amazon Redshift, AWS, and the Elasticsearch engine using SQL queries to create reports.
 Actively participated in decision-making and QA meetings and regularly interacted with the Business Analysts and development team to gain a better understanding of the business process, requirements, and design.
 Used DataStage as an ETL tool to extract data from source systems and load it into the Oracle database.
 Designed and developed DataStage jobs to extract data from heterogeneous sources, applied transformation logic to the extracted data, and loaded it into Data Warehouse databases.
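A brief PySpark sketch of the S3 ingestion and transformation pattern described in the bullets above; the bucket names, paths, and column names (shipment_id, event_ts) are hypothetical placeholders, not details of the actual pipelines.

```python
# PySpark sketch of the S3 ingestion/transformation pattern described above.
# Bucket names, paths, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("shipment_ingest").getOrCreate()

# Read raw JSON records that Kinesis Firehose landed in the raw-zone bucket.
raw = spark.read.json("s3://example-raw-zone/shipments/2024/")

# Basic cleansing: drop records missing the key, normalize timestamps, dedupe.
clean = (
    raw.filter(F.col("shipment_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .dropDuplicates(["shipment_id", "event_ts"])
)

# Write curated data back to the lake, partitioned by event date for downstream queries.
(clean.withColumn("event_date", F.to_date("event_ts"))
      .write.mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-curated-zone/shipments/"))
```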

Environment:
Python, R, Spark, AWS, Amazon EMR, Amazon S3, ETL (Informatica), Hadoop, Tableau, Cloudera, Postgres

Humana Healthcare – Chicago, IL Jan 2021 - Aug 2022


Senior Data Engineer
Responsibilities:

 Directed a customer clustering project, utilizing DataStage Director to schedule and run ETL jobs and to perform testing and debugging while monitoring performance statistics.
 Installed Hadoop, MapReduce, HDFS, and AWS components, creating multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
 Architected, designed, and developed business applications and Data Marts for reporting throughout different phases of
the development life cycle, including analysis, design, coding, unit testing, integration testing, and release.
 Implemented a Spark GraphX application to analyze guest behavior for data science segments and worked on batch
processing of data sources using Apache Spark and Elasticsearch.
 Developed Big Data solutions with a focus on pattern matching and predictive modeling.
 Collaborated with the EDW team on high-level design documents for ETL processes, data dictionaries, metadata
descriptions, file layouts, and flow diagrams.
 Designed an Estimator model for various bundled product and service offerings to optimize and predict gross margin.
 Designed OLTP system environments, maintained metadata documentation, and utilized a forward engineering approach
for creating databases for OLAP models.
 Explored Spark for improving performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark YARN.
 Participated in a comprehensive Data Science program covering data manipulation and visualization, web scraping, machine
learning, Python programming, SQL, GIT, Unix Commands, NoSQL, MongoDB, and Hadoop.
 Migrated Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to enhance performance (a brief sketch follows this list).
 Developed UNIX shell scripts for database connectivity, executing queries in parallel job execution.
 Collaborated closely with ETL Developers in designing and planning ETL requirements for reporting, providing progress
updates, and addressing risks and issues.
 Conducted scoring and financial forecasting for collection priorities using Python and SAS.
 Imported data from various sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
 Worked in an AWS environment for the development and deployment of custom Hadoop applications.
 Managed existing team members and led the recruitment and onboarding of a larger Data Science team to
address analytical knowledge requirements.
 Developed predictive causal models using annual failure rate and standard cost basis for new bundled services.
 Designed and developed analytics, machine learning models, and visualizations, covering prototyping to production
deployment and product recommendation and allocation planning.
 Participated in Normalization/De-normalization, Normal Form, and database design methodology, demonstrating
proficiency in data modeling tools such as MS Visio and Erwin Tool for logical and physical database design.
 Conducted prototyping and experimentation of machine learning algorithms, successfully integrating them into production
systems to meet diverse business needs.
 Implemented pipeline and partitioning parallelism techniques, ensuring load balancing of data. Deployed various
partitioning methods, including Hash by column, Round Robin, Entire, Modulus, and Range for bulk data loading and
performance optimization.
 Worked with multiple datasets containing billions of structured and unstructured data values related to web applications
usage and online customer surveys.
 Designed, built, and deployed a set of Python modeling APIs for customer analytics, incorporating multiple machine
learning techniques for user behavior prediction and supporting various marketing segmentation programs.
 Developed and maintained data integration programs in Hadoop and RDBMS environments, utilizing both RDBMS and
NoSQL data stores for data access and analysis.
 Employed major ETL transformations in Informatica mappings and created Hive queries and tables to identify trends in
historical data before promoting them to production.
 Worked on data modeling and Advanced SQL with Columnar Databases using AWS.
 Utilized Apache Sqoop for efficient data transfer between Apache Hadoop and relational databases (Oracle) for product-
level forecasting. Extracted data from Teradata into HDFS using Sqoop.
 Applied classification techniques, including Random Forest and Logistic Regression, to quantify the likelihood of each user making a referral.
 Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools
using R, Tableau, and Power BI.
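A brief PySpark sketch of the Pig/MapReduce-to-DataFrame migration pattern noted above; the Hive table and column names are hypothetical placeholders used only to illustrate the approach.

```python
# PySpark sketch of the Pig/MapReduce-to-DataFrame migration pattern noted above.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("claims_aggregation")
         .enableHiveSupport()
         .getOrCreate())

# The legacy Pig script grouped claim records by member and month;
# the same logic expressed with the DataFrame API runs on Spark.
claims = spark.table("edw.claims")  # Hive table loaded upstream via Sqoop

monthly = (
    claims.withColumn("claim_month", F.date_format("claim_date", "yyyy-MM"))
          .groupBy("member_id", "claim_month")
          .agg(
              F.count("*").alias("claim_count"),
              F.sum("claim_amount").alias("total_amount"),
          )
)

# Persist the aggregate back to Hive for reporting.
monthly.write.mode("overwrite").saveAsTable("edw.claims_monthly_summary")
```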

Environment:
IBM DataStage, Python, Spark framework, AWS, Redshift, MS Excel, NoSQL, Tableau, T-SQL, ETL, RNN, LSTM, MS Access, XML, MS
Office 2007, Outlook, MS SQL Server.

Wells Fargo – Charlotte, NC Feb 2018 - Dec 2020


Senior Data Engineer
Responsibilities:

 Worked on analysis of extensive datasets, developing multiple custom models and algorithms to drive innovative
business solutions.
 Contributed to designing data warehouses and data lakes on both regular (Oracle, SQL Server) and high-performance big
data (Hadoop - Hive and HBase) databases. Involved in data modeling, designing, implementing, and deploying high-
performance, custom applications at scale on Hadoop/Spark.
 Generated ad-hoc SQL queries, utilizing joins, database connections, and transformation rules to extract data from legacy
DB2 and SQL Server database systems.
 Translated business requirements into working logical and physical data models for OLTP and OLAP systems.
 Created BTEQ, FastExport, MultiLoad, TPump, and FastLoad scripts for extracting data from various production systems.
 Reviewed Stored Procedures for reports, wrote test queries against the source system (SQL Server-SSRS), and validated
results against the Datamart (Oracle).
 Conducted data profiling, preliminary data analysis, and addressed anomalies such as missing, duplicates, outliers, and
irrelevant data, utilizing Proximity Distance and Density-based techniques for outlier removal.
 Participated in the analysis, design, and implementation/translation of Business User requirements.
 Employed supervised, unsupervised, and regression techniques in building models.
 Conducted Market Basket Analysis to identify groups of assets moving together and recommended risk mitigation
strategies to clients.
 Developed ETL procedures and Data Conversion Scripts using Pre-Stage, Stage, Pre-Target, and Target tables.
 Created data pipelines using state-of-the-art Big Data frameworks/tools.
 Extracted appropriate features from datasets to handle bad, null, and partial records using Spark SQL.
 Stored data frames into Hive as tables using Python (PySpark).
 Ingested data into HDFS from various Relational databases like Teradata using Sqoop and exported data back to Teradata
for storage.
 Developed Apache Spark applications using Spark tools like RDD transformations, Spark Core, Spark MLlib, Spark Streaming,
and Spark SQL.
 Implemented partitioning, dynamic partitions, and buckets in Hive.
 Executed Hive queries on ORC tables stored in Hive for data analysis to meet business requirements.
 Utilized Spark to get data from HDFS, process it, and store it back into HDFS.
 Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, managed logs and objects within each bucket, and performed folder management (a brief sketch follows this list).
 Created Airflow scheduling scripts in Python to automate the process of Sqooping a wide range of datasets.
 Involved in file movements between HDFS and AWS S3, extensively working with S3 buckets in AWS.
 Developed Spark SQL scripts using Python for efficient data processing.
 Utilized Sqoop to extract data from the warehouse and SQL Server, loading it into Hive.
 Employed the Spark framework to transform data for the final consumption of analytical applications.
 Scheduled Oozie workflow engine to run multiple Hive jobs, automating tasks like loading data into HDFS and pre-
processing with Spark.
 Conducted Exploratory Data Analysis using R, generating various graphs and charts for data analysis using Python libraries.
 Implemented dynamic SQL Server functionality for the website using the SQL Developer tool.
 Experienced with continuous integration and automation using Jenkins. Implemented Service Oriented Architecture (SOA)
using JMS for sending and receiving messages while creating web services.
 Executed multiple business plans and projects, ensuring business needs are met. Interpreted data to identify trends across
future datasets.
 Simultaneously worked on a pilot project to transition the environment to Amazon EMR, a cloud-based Hadoop distribution, and other available Amazon cloud solutions.
 Developed interactive dashboards and created various Ad Hoc reports for users in Tableau by connecting various data
sources.
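A brief boto3 sketch of the CSV-to-S3 load script described above; the bucket name and local directory are hypothetical placeholders.

```python
# boto3 sketch of the CSV-to-S3 load script described above.
# The bucket name and local path are hypothetical placeholders.
import logging
from pathlib import Path

import boto3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("csv_loader")

s3 = boto3.client("s3")
BUCKET = "example-analytics-landing"        # hypothetical bucket name


def upload_csv_folder(local_dir: str, prefix: str) -> None:
    """Upload every CSV in local_dir to s3://BUCKET/prefix/, logging each object key."""
    for path in Path(local_dir).glob("*.csv"):
        key = f"{prefix}/{path.name}"
        s3.upload_file(str(path), BUCKET, key)
        log.info("uploaded %s to s3://%s/%s", path.name, BUCKET, key)


if __name__ == "__main__":
    upload_csv_folder("/data/exports", "daily_csv")
```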

Environment:
Python, SQL Server, Hadoop, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, Mahout, LSTM, RNN, Spark MLlib, MongoDB,
AWS, Tableau, Unix/Linux.

Neeyamo Enterprise Solutions, Pune, India July 2015 – July 2017

Data Analyst/Engineer

Responsibilities:

 Managed different data flow and control flow tasks, including For Loop containers, Sequence containers, Script Tasks, Execute SQL Tasks, and package configurations.
 Developed new procedures to handle complex business logic and modified existing stored procedures, functions,
views, and tables for project enhancements and defect resolution.
 Loaded data from various sources, such as OLE DB and flat files, into a SQL Server 2012 database using SSIS packages, and created data mappings for seamless data transfer from source to destination.
 Implemented batch jobs and configuration files for an automated process using SSIS.
 Designed SSIS packages to retrieve data from SQL Server and export it to Excel Spreadsheets and vice versa.
 Built SSIS packages to fetch files from remote locations such as FTP and SFTP, decrypt and transform them, and load them into the data warehouse, with proper error handling and alerting (an illustrative sketch follows this list).
 Utilized Expressions, Variables, and Row Count extensively in SSIS packages.
 Conducted data validation and cleansing of staged input records before loading into the Data Warehouse.
 Automated the extraction process for various files, including flat/excel files, from sources like FTP and SFTP (Secure FTP).
 Deployed and scheduled reports using SSRS to generate daily, weekly, monthly, and quarterly reports.
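For illustration only, a Python (paramiko) sketch of the fetch-and-archive pattern that the SSIS packages above implement; the host, credentials, and paths are hypothetical, and the production implementation was SSIS rather than Python.

```python
# Illustrative Python sketch of the fetch-and-archive pattern the SSIS packages above
# implement; host, credentials, and paths are hypothetical (the production work was SSIS).
import shutil
from pathlib import Path

import paramiko

HOST, USER, PASSWORD = "sftp.example.com", "etl_user", "change-me"
REMOTE_DIR, LANDING, ARCHIVE = "/outbound", Path("./landing"), Path("./archive")

transport = paramiko.Transport((HOST, 22))
transport.connect(username=USER, password=PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)

LANDING.mkdir(exist_ok=True)
ARCHIVE.mkdir(exist_ok=True)

# Download each remote file into the landing folder, then archive the local copy
# after it has been processed (decrypt/transform/load steps are omitted here).
for name in sftp.listdir(REMOTE_DIR):
    local_path = LANDING / name
    sftp.get(f"{REMOTE_DIR}/{name}", str(local_path))
    # ... decrypt / transform / load into the warehouse here ...
    shutil.move(str(local_path), str(ARCHIVE / name))

sftp.close()
transport.close()
```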

Environment:
MS SQL Server 2005 & 2008, SQL Server Business Intelligence Development Studio, SSIS-2008, SSRS-2008, Report Builder, Office,
Excel, Flat Files, .NET, T-SQL.
