

Amand Singh

Big Data Engineer/Cloud Architect


Email: [email protected]
SUMMARY:

 12+ years of experience leading the architecture and design of Data Processing, Data Warehousing, Data
Quality, Data Analytics & Business Intelligence development projects through the complete end-to-end SDLC process.
 Experience in Architecture, Design and Development of large Enterprise Data Warehouse (EDW) and Data-
marts for target user-base consumption.
 Experienced in designing many key system architectures and integrating modules and systems including
Big Data Hadoop systems, Java and AWS, covering hardware sizing, estimates, benchmarking
and data architecture.
 Expert in writing SQL queries and optimizing the queries in Oracle 10g/11g/12c, DB2, Netezza, SQL Server
2008/2012/2016 and Teradata 13/14.
 Performed data analysis and data profiling using complex SQL on various source systems including Oracle and
Teradata, and worked with Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad and
FastExport.
 Hands-on experience with the Hadoop framework and its ecosystem, including the Distributed File System (HDFS), MapReduce,
Pig, Hive, Sqoop, Flume and Spark; proficient in Hive Query Language and experienced in Hive performance
optimization using static partitioning, dynamic partitioning, bucketing and parallel execution (a brief sketch of the partitioning pattern follows this list).
 Good experience in Data Profiling, Data Mapping, Data Cleansing, Data Integration, Data Analysis, Data Quality,
Data Architecture, Data Modelling, Data governance, Metadata Management & Master Data Management.
 Extensively worked on Spark with Scala on cluster for computational analytics; installed Spark on top of Hadoop
and built advanced analytical applications using Spark and PySpark with Hive and
SQL/Oracle/Snowflake.
 Expertise in Data Modeling: created various Conceptual, Logical and Physical Data Models for DWH projects,
including a first-of-its-kind data model for an intelligence domain.
 Excellent knowledge of Perl & UNIX, with expertise in data modeling, database design and implementation
of Oracle and AWS Redshift databases, as well as administration and performance tuning.
 Experienced in analyzing data using Hadoop Ecosystem including HDFS, Hive, Spark, Spark Streaming, Elastic
Search, Kibana, Kafka, HBase, Zookeeper, PIG, Sqoop, and Flume.
 Experienced with Excel Pivot Tables and VBA macros for various business scenarios, and involved in data
transformation using Pig scripts on AWS EMR, AWS RDS and AWS Glue.
 Experienced in designing & implementing many projects using a variety of ETL/BI tools and the latest
features & product trends, with experience in technologies such as Big Data, Cloud Computing (AWS) & in-memory
applications.
 Experience in importing data using Sqoop from various heterogeneous systems such as RDBMS (MySQL,
Oracle, DB2, etc.), Mainframe and XML sources to HDFS and vice versa.
 Experienced in continuous performance tuning, system optimization & improvements for BI/OLAP
systems & traditional databases such as Oracle, SQL Server and DB2, as well as other high-performance databases.
 Well versed in Normalization / De-normalization techniques for optimum performance in relational and
dimensional database environments and implemented various data warehouse projects in Agile
Scrum/Waterfall methodologies.
 Expertise in writing and optimizing SQL queries in Oracle, SQL Server 2008/12/16 and Teradata,
and involved in developing and managing SQL, Java and Python code bases for data cleansing and data analysis
using Git version control.
 Excellent Software Development Life Cycle (SDLC) experience with good working knowledge of testing methodologies,
disciplines, tasks, resources and scheduling.
 Extensive ETL testing experience using Informatica PowerCenter/PowerMart (Designer, Workflow
Manager, Workflow Monitor and Server Manager).
 Expertise in Excel macros, Pivot Tables, VLOOKUPs and other advanced functions; expert Python user
with knowledge of the statistical programming language SAS.
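
The Hive optimization techniques mentioned above (static and dynamic partitioning) can be illustrated with a minimal PySpark sketch; the table and column names (sales_raw, sales_part, region) are hypothetical placeholders, not taken from any project listed here.

    # Minimal PySpark sketch of Hive static and dynamic partitioning.
    # All table and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()            # needed so spark.sql can manage Hive tables
             .getOrCreate())

    # Allow dynamic partitions so partition values are taken from the data itself.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    # Partitioned target table. (In Hive proper the table could also be bucketed
    # with CLUSTERED BY (order_id) INTO 8 BUCKETS; bucketed writes from Spark
    # depend on the Spark/Hive versions, so bucketing is omitted here.)
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_part (
            order_id BIGINT,
            amount   DOUBLE
        )
        PARTITIONED BY (region STRING)
        STORED AS ORC
    """)

    # Dynamic-partition insert: each row lands in its region partition.
    spark.sql("""
        INSERT OVERWRITE TABLE sales_part PARTITION (region)
        SELECT order_id, amount, region FROM sales_raw
    """)

    # Static-partition insert: the partition value is fixed in the statement.
    spark.sql("""
        INSERT OVERWRITE TABLE sales_part PARTITION (region = 'US')
        SELECT order_id, amount FROM sales_raw WHERE region = 'US'
    """)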

Technical Skills:

 Analysis and Modeling Tools: Erwin 9.6/9.5/9.1, Oracle Designer, ER/Studio.


 Languages: SQL, Python, PySpark, Scala, Java, T-SQL and Perl.
 Database Tools: Microsoft SQL Server 2016/2014/2012, Teradata 15/14, Oracle 12c/11g/10g, MS Access,
PostgreSQL, Netezza, DB2, Snowflake, HBase, MongoDB and Cassandra.
 ETL Tools: SSIS, Informatica PowerCenter 9.6/9.5 and SAP Business Objects.
 Cloud: AWS S3, AWS EC2, AWS EMR, AWS Airflow, SQS, AWS RDS and AWS Glue.
 Operating Systems: Windows, DOS and UNIX.
 Reporting Tools: SSRS, Business Objects, Crystal Reports.
 Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant and Netezza Aginity.
 Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, HBase, Sqoop, Flume, Oozie and NoSQL databases.

Work Experience
Sr. Data Engineer
Fannie Mae, Washington DC
May 2019 to Present
Responsibilities:

 Designed architecture collaboratively to develop methods of synchronizing data coming in from multiple
source systems, and led the strategy, architecture and process improvements for data architecture and data
management, balancing long- and short-term needs of the business.
 Performed System Analysis & Requirements Gathering related to Architecture, ETL, Data Quality, Cloudera,
MDM, Dashboards and Reports. Captured enhancements from various entities and provided Impact analysis.
 Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
 Designed a real-time stream processing application using Spark, Kafka, Scala, Oozie and Hive to perform
streaming ETL and apply machine learning.
 Designed the Logical Data Model using Erwin 9.64 with the entities and attributes for each subject area;
involved in dimensional modeling (star schema) of the data warehouse, used Erwin to design the business
process, dimensions and measured facts, and developed and maintained an Enterprise Data Model (EDM) to
serve as both a strategic and tactical planning vehicle for managing the enterprise data warehouse.
 Developed and maintained mostly Python and some Perl ETL scripts to scrape data from external web sites
and load cleansed data into a MySQL DB.
 Enhanced the traditional data warehouse based on a star schema, updated data models,
performed data analytics and reporting using Tableau, extracted data from MySQL and AWS into HDFS using
Sqoop, and created Airflow scheduling scripts in Python.
 Used AWS Lambda to perform data validation, filtering, sorting and other transformations for every data change
in a DynamoDB table and load the transformed data into another data store (a brief sketch of this pattern follows this list).
 Used HiveContext, which provides a superset of the functionality provided by SQLContext, and preferred
writing queries using the HiveQL parser to read data from Hive tables (fact, syndicate).
 Worked on AWS, architecting a solution to load data, create data models and run BI on them, and developed
Shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
 Involved in data modeling sessions, developed technical design documents and used the ETL DataStage
Designer to develop processes for extracting, cleansing, transforming, integrating and loading data into the data
warehouse database.
 Created Spark clusters and configured high-concurrency clusters using Databricks to speed up the
preparation of high-quality data, created Databricks notebooks using SQL and Python, and automated
notebooks using jobs.
 Involved in creating Hive tables and loading and analyzing data using Hive queries; developed Hive queries to
process the data and generate data cubes for visualization.
 Selected the appropriate AWS services based on data, compute, database and security requirements, and defined
and deployed monitoring, metrics and logging systems on AWS.
 Designed and developed a Data Lake using Hadoop for processing raw and processed claims via Hive and
DataStage, and designed both 3NF data models for ODS/OLTP systems and dimensional data models using star
and snowflake schemas.
 Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3
(ORC/Parquet/text files) into AWS Redshift, performed data extraction, aggregation and consolidation of Adobe data
within AWS Glue using PySpark (a brief sketch follows this list), and created external tables with partitions using Hive, AWS Athena and Redshift.
 Created dimensional data models based on hierarchical source data and implemented them on Teradata, achieving
high performance without special tuning.
 Involved in designing logical and physical data models for different database applications using Erwin, and
in data modeling, designing, implementing and deploying high-performance custom applications at scale on
Hadoop/Spark.
 Involved in building predictive models to identify high-risk cases using regression and machine learning
techniques in SAS and Python, and performed data analysis and statistical analysis and generated reports,
listings and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect and
SAS/Access.
 Involved in creating, debugging, scheduling and monitoring jobs using Airflow.
 Developed automated data pipelines in Python from various external data sources (web pages, APIs, etc.) to the internal data
warehouse (SQL Server, AWS), then exported to reporting tools such as Datorama.
 Worked on AWS utilities such as EMR, S3 and CloudWatch to run and monitor jobs on AWS, and was involved in
converting SQL Server table DDL, views and SQL queries to Snowflake.
 Involved in loading data from the Linux file system to HDFS, imported and exported data between HDFS and Hive
using Sqoop, and implemented partitioning, dynamic partitions and buckets in Hive.
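
A minimal sketch of the DynamoDB change-capture pattern mentioned above (a Lambda that validates each change from a DynamoDB stream and forwards it to another store, here S3); the bucket name, key layout and validation rule are hypothetical, not the actual implementation.

    # Minimal Lambda handler sketch triggered by a DynamoDB stream:
    # validate/filter each change and forward it to another store (here, S3).
    # Bucket name and validation rule are hypothetical placeholders.
    import json
    import boto3

    s3 = boto3.client("s3")
    TARGET_BUCKET = "example-curated-bucket"   # hypothetical

    def handler(event, context):
        records = event.get("Records", [])
        for record in records:
            if record.get("eventName") not in ("INSERT", "MODIFY"):
                continue                        # ignore deletes in this sketch
            new_image = record["dynamodb"].get("NewImage", {})

            # Simple validation/filter step before forwarding.
            if "order_id" not in new_image:
                continue

            order_id = new_image["order_id"]["S"]
            s3.put_object(
                Bucket=TARGET_BUCKET,
                Key=f"orders/{order_id}.json",
                Body=json.dumps(new_image),
            )
        return {"processed": len(records)}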

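A minimal sketch of the kind of AWS Glue PySpark job described above (Parquet landed in S3 loaded into Redshift); the bucket, Glue connection and table names are hypothetical placeholders and the actual jobs would differ. It only runs inside a Glue job environment.

    # Minimal AWS Glue (PySpark) job sketch: read Parquet from S3, write to Redshift.
    # Bucket, connection and table names below are hypothetical placeholders.
    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Source: campaign extracts landed in S3 as Parquet.
    campaigns = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-campaign-bucket/adobe/"]},
        format="parquet",
    )

    # Light cleanup before loading (drop a field not needed downstream).
    campaigns = campaigns.drop_fields(["_corrupt_record"])

    # Target: Redshift, via a pre-defined Glue JDBC connection and a staging dir.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=campaigns,
        catalog_connection="example-redshift-connection",
        connection_options={"dbtable": "stage.campaigns", "database": "analytics"},
        redshift_tmp_dir="s3://example-campaign-bucket/tmp/",
    )

    job.commit()
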
Technology: Erwin 9.64, Python, Oracle 12c, Sqoop, Kafka, Hive, HBase, PySpark, Oozie, Scala, Databricks,
Teradata, MS SQL, Apache Cassandra, Impala, Cloudera, AWS, AWS Glue, SQS, AWS EMR, Redshift, Lambda, Airflow,
Apache Hadoop, DataStage, Snowflake, Informatica Metadata Manager, MapReduce, XML Files, SAS, Zookeeper,
MySQL, DynamoDB, SAP BO, Data Vault 2.0, LINUX, Tableau, PL/SQL.

Sr. Data Engineer


Chase - Chicago IL
September 2016 to April 2019
Responsibilities:

 Responsible for defining Data Architecture standards on Teradata/Hadoop platforms and defined
process-specific zones for standardizing data and loading it into target tables; also responsible for defining audit
columns for each zone in the Hadoop environment.
 Participated in requirement-gathering meetings and translated business requirements into technical design
documents/report visualization specification documents, and synthesized and translated business data
needs into creative visualizations in Tableau.
 Managed logical and physical data models in the ER Studio Repository based on the different subject area
requests for the integrated model, and developed data mapping, data governance, and transformation and cleansing
rules involving OLTP and ODS.
 Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau
and SAS Visual Analytics.
 Understood the business requirements and designed the ETL flow in DataStage as per the mapping sheet,
along with unit testing and review activities.
 Wrote Python scripts and mappers to run on the Hadoop Distributed File System (HDFS), and performed
troubleshooting and deployed many Python bug fixes for the two main applications that were the main
source of data for both customers and the internal customer service team.
 Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing
Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, and Hive.
 Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient
database design.
 Wrote AWS Lambda code in Python for nested JSON files (converting, comparing, sorting, etc.; a brief sketch
follows this list) and managed metadata alongside the data for visibility into where data came from and its
lineage, so that data for customer projects can be found quickly and efficiently using the AWS data lake and
services such as AWS Lambda and AWS Glue.
 Developed executive dashboards by collecting requirements from department directors and stakeholders,
profiling the data with Informatica Developer, mapping data columns from source to target, performing further
analysis by querying Hadoop Hive and Impala, and working closely with big data engineers to customize the data
structure so Tableau visualizations meet the special business requirements.
 Worked on the full life cycle of the Data Lake and Data Warehouse with big data technologies such as Spark and
Hadoop, and designed and deployed scalable, highly available and fault-tolerant systems on AWS.
 Loaded data into Amazon Redshift, used AWS CloudWatch to collect and monitor AWS RDS instances, and
designed and developed ETL/ELT processes to handle data migration from multiple business units and sources
including Oracle, Postgres, Informix, MSSQL, Access and others.
 Integrated the NoSQL database HBase with MapReduce to move bulk data into HBase, and was involved
in loading and transforming large data sets and analyzing them by running Hive queries.
 Designed Star and Snowflake data models for the Enterprise Data Warehouse using ER Studio, and developed
Conceptual/Logical/Physical data models for the ODS & dimensional delivery layer in the SQL Data Warehouse.
 Performed database health checks and tuned the databases using Teradata Manager, and did MapReduce and
big data work on Hadoop and other NoSQL platforms.
 Developed, managed and validated existing data models including logical and physical models of the Data
Warehouse and source systems utilizing a 3NF model.
 Implemented logical and physical relational databases and maintained database objects in the data model using
ER Studio, and used star schema and snowflake schema methodologies to translate the Logical
Data Model into dimensional models.
 Migrated data into the RV data pipeline using Databricks, Spark SQL and Scala, and migrated Confidential call-center
data into the RV data pipeline from Oracle into HDFS using Hive and Sqoop.
 Developed several behavioral reports and data points, creating complex SQL queries and stored procedures
using SSRS and Excel.
 Spun up HDInsight clusters and used Hadoop ecosystem tools such as Kafka, Spark and Databricks for real-time
streaming analytics, and Sqoop, Pig, Hive and Cosmos DB for batch jobs.
 Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services
(SSRS) and generated reports using Global Variables, Expressions and Functions using SSRS.
 Used Base SAS, SAS/Macro, SAS/SQL and Excel extensively to develop code and generate various analytical
reports.
 Implemented Spark using Python/Scala, utilizing Spark Core, Spark Streaming and Spark SQL for faster
processing of data instead of MapReduce in Java (a brief streaming sketch follows this list).
 Validated report data by writing SQL queries in PL/SQL Developer against the ODS, and was involved in user
training sessions and assisted with UAT (User Acceptance Testing).
 Worked on AWS CloudFormation templates and configured the SQS service through the Java API and
microservices to send and receive information.
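
A minimal sketch of the nested-JSON handling described above: a Python Lambda that flattens a nested JSON document read from S3 into sorted, dot-delimited keys. The bucket layout and the event shape (an S3 put notification) are assumptions, not the actual implementation.

    # Minimal sketch: flatten a nested JSON document from S3 into dot-delimited
    # keys, sort them for comparability, and write the result back to S3.
    # Bucket/key layout is a hypothetical placeholder.
    import json
    import boto3

    s3 = boto3.client("s3")

    def flatten(obj, prefix=""):
        """Recursively flatten nested dicts/lists into a single-level dict."""
        out = {}
        if isinstance(obj, dict):
            for key, value in obj.items():
                out.update(flatten(value, f"{prefix}{key}."))
        elif isinstance(obj, list):
            for i, value in enumerate(obj):
                out.update(flatten(value, f"{prefix}{i}."))
        else:
            out[prefix.rstrip(".")] = obj
        return out

    def handler(event, context):
        # Assumes the triggering event is an S3 put notification naming the raw object.
        record = event["Records"][0]["s3"]
        raw = s3.get_object(Bucket=record["bucket"]["name"], Key=record["object"]["key"])
        document = json.loads(raw["Body"].read())

        flat = flatten(document)
        # Sort keys so repeated runs produce directly comparable output.
        body = json.dumps(dict(sorted(flat.items())), indent=2)

        s3.put_object(
            Bucket=record["bucket"]["name"],
            Key="flattened/" + record["object"]["key"],
            Body=body,
        )
        return {"fields": len(flat)}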

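A minimal PySpark Structured Streaming sketch of the Kafka-based real-time processing mentioned above; the broker address, topic name, schema and output paths are hypothetical, and the spark-sql-kafka package is assumed to be available to the Spark session.

    # Minimal PySpark Structured Streaming sketch: read JSON events from Kafka,
    # parse them, and write them out as Parquet for downstream analytics.
    # Broker, topic, schema and paths are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

    schema = StructType([
        StructField("account_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    events = (spark.readStream
              .format("kafka")                                      # needs spark-sql-kafka package
              .option("kafka.bootstrap.servers", "broker1:9092")    # hypothetical broker
              .option("subscribe", "transactions")                  # hypothetical topic
              .load())

    parsed = (events
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/data/streams/transactions")           # hypothetical paths
             .option("checkpointLocation", "/data/checkpoints/transactions")
             .start())

    query.awaitTermination()
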
Technology: ER Studio 16.5, Oracle 12c, Cloudera, Microservices, Python, AWS S3, Lambda, RDS, SQS, Shell
Scripting, UNIX, PySpark, Hadoop, Spark, Hive, Pig, MongoDB, Cassandra, MapReduce, LINUX, Windows 7, XML,
SQL, PL/SQL, T-SQL, Databricks, DataStage, Agile, SSAS, Informatica, MDM, Teradata, MS Excel, MS Access,
Metadata, SAS, SQL Server, Tableau, Netezza, ERP, SSRS, Teradata SQL Assistant, DB2.

Sr. Data Modeler/ Data Analyst


US Bank – Minneapolis, MN
Jan 2010 to August 2016
Responsibilities:
 Gathered and analyzed business data requirements and modeled these needs, working closely with the
users of the information, the application developers and architects, to ensure the information models are
capable of meeting their needs.
 Coordinated with Data Architects on provisioning AWS EC2 infrastructure and deploying applications behind
Elastic Load Balancing.
 Performed business area analysis and logical and physical data modeling for a Data Warehouse using the
Bill Inmon methodology, and designed a Data Mart application using the Ralph Kimball star schema
dimensional methodology.
 Worked on AWS utilities such as EMR, S3 and CloudWatch to run and monitor jobs on AWS.
 Designed and developed logical & physical data models and metadata to support the requirements using
Erwin.
 Developed, maintained and tested UNIX shell and Perl DBI/DBD ETL scripts, and developed Perl ETL scripts to
scrape data from external marketing websites and populate a MySQL DB.
 SAP Data Services Integrator ETL developer with strong ability to write procedures to ETL data into a Data
Warehouse from a variety of data sources including flat files and database links (Postgres, MySQL, and Oracle).
 Designed the ER diagrams, logical model (relationships, cardinality, attributes and candidate keys) and
physical database (capacity planning, object creation and aggregation strategies) for Oracle and Teradata
per business requirements using Erwin.
 Used Python scripts to connect to Oracle, pull the data or create data files, and load the data into Snowflake (a brief sketch follows this list).
 Worked on multiple Data Marts in the Enterprise Data Warehouse (EDW) project and was involved in designing
OLAP data models that made extensive use of slowly changing dimensions (SCD).
 Designed a third normal form (3NF) target data model, mapped it to the logical model, and was involved in
extensive data validation using ANSI SQL queries and back-end testing.
 Generated DDL statements for the creation of new Erwin objects such as tables, views, indexes, packages and
stored procedures.
 Designed MOLAP/ROLAP cubes on the Teradata database using SSAS, used SQL for querying the database in a
UNIX environment, and created BTEQ, FastExport, MultiLoad, TPump and FastLoad scripts for extracting data
from various production systems.
 Developed automated procedures to produce data files using Microsoft Integration Services (SSIS) and
performed data analysis and data profiling using complex SQL on various source systems including Oracle and
Netezza.
 Worked on AWS RDS, implementing models and data on RDS.
 Developed mapping spreadsheets for the ETL team with source-to-target data mapping, including physical naming
standards, data types, volumetrics, domain definitions, and corporate metadata definitions.
 Used CA Erwin Data Modeler (Erwin) for Data Modeling (data requirements analysis, database design etc.) of
custom developed information systems, including databases of transactional systems and data marts.
 Identified and tracked the slowly changing dimensions (SCD I, II, III & Hybrid/6) and determined the
hierarchies in dimensions.
 Worked on data integration and workflow applications on the SSIS platform and was responsible for testing all new and
existing ETL data warehouse components.
 Designed star schema and snowflake schema dimensions and fact tables, worked with the Data Vault
methodology, and developed normalized logical and physical database models.
 Transformed Logical Data Model to Physical Data Model ensuring the Primary Key and Foreign key
relationships in PDM, Consistency of definitions of Data Attributes and Primary Index considerations.
 Generated various reports using SQL Server Reporting Services (SSRS) for business analysts and the management
team, and wrote and ran SQL, BI and other reports, analyzing data and creating
metrics/dashboards/pivots, etc.
 Worked with the ETL team to document transformation rules for data migration from OLTP to the
warehouse for reporting purposes.
 Involved in writing T-SQL and working on SSIS, SSRS, SSAS, data cleansing, data scrubbing and data migration.
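
A minimal sketch of the Oracle-to-Snowflake load described above, using the cx_Oracle and snowflake-connector-python libraries; all connection details, table and stage names are hypothetical placeholders, not the actual job.

    # Minimal sketch: pull rows from Oracle into a CSV data file, stage it,
    # and COPY it into a Snowflake table. All names/credentials are hypothetical.
    import csv
    import cx_Oracle
    import snowflake.connector

    # 1. Extract from Oracle into a local data file.
    ora = cx_Oracle.connect("app_user", "app_password", "ora-host/ORCLPDB1")
    with ora.cursor() as cur, open("/tmp/accounts.csv", "w", newline="") as fh:
        cur.execute("SELECT account_id, balance, updated_at FROM accounts")
        writer = csv.writer(fh)
        writer.writerow([d[0] for d in cur.description])   # header row
        writer.writerows(cur)                               # cursor yields row tuples
    ora.close()

    # 2. Stage the file in the table stage and COPY it into Snowflake.
    sf = snowflake.connector.connect(
        user="loader", password="secret", account="example_account",
        warehouse="LOAD_WH", database="ANALYTICS", schema="STAGE",
    )
    with sf.cursor() as cur:
        cur.execute("PUT file:///tmp/accounts.csv @%accounts OVERWRITE = TRUE")
        cur.execute("""
            COPY INTO accounts
            FROM @%accounts
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        """)
    sf.close()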

Technology: SQL Server 2012, Erwin 9.1, Oracle, AWS EC2, AWS RDS, Informatica, Perl, JDBC, LINUX, NoSQL,
Spark, Scala, Python, MySQL, PostgreSQL, Teradata, SSRS, SSIS, SQL, DB2, Shell Scripting, Tableau, Excel, MDM,
Agile.
Data Analyst
iLabs, India
June 2006 to November 2009
Responsibilities:

 Attended and participated in information and requirements gathering sessions and translated business
requirements into working logical and physical data models for Data Warehouse, Data marts and OLAP
applications.
 Performed extensive Data Analysis and Data Validation on Teradata and designed Star and Snowflake Data
Models for Enterprise Data Warehouse using ERWIN.
 Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities,
attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules,
glossary terms, etc.
 Integrated data from various data sources such as MS SQL Server, DB2, Oracle, Netezza and Teradata using
Informatica to perform extraction, transformation and loading (ETL), and worked on ETL development
and data migration using SSIS, SQL*Loader and PL/SQL.
 Created Entity/Relationship Diagrams, grouped and created the tables, validated the data, identified PKs for
lookup tables.
 Created components to extract application messages stored in XML files.
 Involved in designing and developing logical & physical data models and metadata to support the requirements
using ERWIN.
 Involved in using the ETL tool Informatica to populate the database and transform data from the old database to the
new Oracle database.
 Involved in modeling (star schema methodologies), building and translating the logical data model into
dimensional models, and in query performance tuning and index maintenance to improve
performance.
 Involved in the creation and maintenance of the Data Warehouse and repositories containing metadata.
 Wrote and executed unit, system, integration and UAT scripts in Data Warehouse projects.
 Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, Data
Warehouse and data mart reporting systems in accordance with requirements (a brief verification sketch follows this list).
 Responsible for Creating and Modifying T-SQL stored procedures/triggers for validating the integrity of
the data.
 Worked on Data Warehouse concepts and dimensional data modelling using Ralph Kimball methodology.
 Created a number of standard reports and complex reports to analyze data using slice & dice, drill-down and
drill-through in SSRS.
 Developed separate test cases for ETL processes (inbound & outbound) and reporting.
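
A minimal sketch of the kind of post-load verification described above: comparing row counts between a source (OLTP) table and its warehouse target over ODBC using Python. The DSNs and table names are hypothetical placeholders; real verification would also compare checksums and business keys.

    # Minimal sketch: reconcile row counts between a source table and its
    # warehouse target via ODBC. DSNs and table names are hypothetical.
    import pyodbc

    def row_count(conn_str, table):
        """Return COUNT(*) for a table reachable through the given ODBC connection string."""
        conn = pyodbc.connect(conn_str)
        try:
            return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        finally:
            conn.close()

    source = row_count("DSN=oltp_source;UID=reader;PWD=secret", "dbo.orders")     # hypothetical
    target = row_count("DSN=warehouse;UID=reader;PWD=secret", "dw.fact_orders")   # hypothetical

    status = "PASS" if source == target else "FAIL"
    print(f"{status}: source={source} rows, warehouse={target} rows")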

Technology: Oracle 9i/10g, MS Visio, PL/SQL, Microsoft SQL Server, SSRS, T-SQL, Rational Rose, Data Warehouse, OLTP,
OLAP, ERWIN, Informatica 9.x, Windows, SQL, XML files, Talend Data Quality, Flat Files, SVN.

Education: Bachelor’s in Information Technology, Chaudhary Charan Singh University, Meerut, 2005
