VENKATA SAI TEJA

[email protected]
(985) 402-1710
Sr. GCP DEVELOPER / DATA ENGINEER / AZURE / AWS / BIG DATA / ETL / HADOOP
PROFESSIONAL SUMMARY:
 Results-oriented IT professional with 10+ years of expertise spanning Business Intelligence/Data Analytics
Solutions Architecture, Data Warehouse Development, ETL Design & Development, Migration and Cloud Data
Analytics. Proven track record in Business & System Analysis, Data Quality and Governance, and Test Case
Development, showcasing strong Team Management capabilities.
 Extensive experience in Big Data Engineering and Cloud Data Solutions, particularly Hadoop with hands-on skills in
HDFS, Hive, Pig, HBase, Sqoop, and Kafka. Proficient in Machine Learning and Data Mining for both Structured and
Unstructured Data, with a deep understanding of Predictive Modeling and Data Acquisition. Known for delivering
projects within budget and timeline, effectively managing large-scale initiatives with a focus on Data Quality and
Governance.
 Hands-on expertise in Data Warehouse Architecture including Star and Snowflake Schema Design and OLTP/OLAP
Analysis. Skilled in developing and implementing complex Hadoop Ecosystems, leveraging tools like MapReduce,
Impala, MongoDB, Oozie, and Spark Streaming. Proficient in Spark SQL, Spark Core, DataFrame API, and RDD
architectures for real-time data processing. Experienced with Airflow for scheduling complex workflows, Data
Pipelines, and ETL orchestration.
 Advanced capabilities in Data Integration, Data Transformation, Data Mapping, and Data Cleansing with an in-depth
command over SQL and Python, including libraries like NumPy, SciPy, Pandas, and Scikit-Learn. Skilled in Azure Data
Lake, Databricks, and AWS (EC2, S3, DynamoDB, Redshift) for big data solutions, with strong experience in Data
Migration to cloud ecosystems, including Snowflake.
 Proficient with Machine Learning Models and Statistical Analysis, as well as Text Analytics and Data Visualization tools
such as Tableau and Power BI. Adept at building and automating data workflows using Kafka, Apache Spark, and
Streaming Analytics. Expertise in Big Data Ingestion Tools like Flume and Sqoop.
 Highly skilled in ETL Tools (Informatica PowerCenter, AWS Glue, Talend, SQL Server Integration Services), with a solid grounding in Data Warehousing, Data Marts, and Data Modeling. Extensive hands-on experience in Hadoop
Architecture, MapReduce Programs, Distributed Data Storage Solutions, and Software Development Lifecycle (SDLC)
methodologies, including Agile and Scrum.
 Known for strong problem-solving, analytical, and organizational skills, with a deep understanding of Business Logic and
Workflow Implementation in Distributed Application Environments. Collaborative, self-motivated, and adaptable to
evolving technologies, bringing a robust knowledge of Data Engineering, Big Data Technologies, ETL Development,
and Machine Learning for comprehensive data solutions.

TECHNICAL SKILLS:
Project & Team Leadership: Project Management, Strategic Planning, Team Leadership, and Agile Methodologies
Data Management & Analytics: Business Analysis, Data Architecture, Data Modeling, Business Intelligence, Data
Warehouse Development, Data Governance, Master Data Management, Data Profiling, Data Migration, and Data
Standardization
Process Optimization: ETL/ELT Processes, Process Re-engineering, DevOps (CI/CD), Technical Documentation, and User
Documentation
Data Engineering & Cloud Computing: Big Data Analysis, Cloud Platforms (Azure, AWS, Google Cloud Platform), Data
Lake, and Data Pipeline
Programming & Scripting: SQL, PL/SQL, T-SQL, Unix Shell Scripting, Python (Pandas), Scala, and Perl
Big Data Tools & Frameworks: Hadoop, HDFS, Hive, HiveQL, MapReduce, Spark, Sqoop, Kafka, and Impala
ETL & Integration Tools: Informatica PowerCenter, Talend, AWS Glue, Azure Data Factory, SSIS, and Jupyter Notebooks
Database Platforms: SQL Server, Oracle, DB2, Teradata, Netezza, AWS RDS, AWS Redshift, AWS Snowflake, and Azure
HDInsight
Business Intelligence & Reporting: Tableau, Power BI, QlikView, Crystal Reports, and SSRS
Data Modeling & Warehousing: Star Schema, Snowflake Schema, OLAP, Cubes, Facts and Dimensions, SAS, SSAS, and
Splunk
Version Control & Collaboration: Git, Bitbucket, TFS, JIRA, Confluence, and Microsoft DevOps
Additional Tools & Platforms: EBX-TIBCO, NAS Server, Jenkins, AWS CLI, Erwin, Visual Studio, SharePoint, and Microsoft
Visio

PROFESSIONAL EXPERIENCE:
Shift4 Payments, MD June 2023 to Present
Role: Sr. GCP Developer / Lead Data Engineer
Responsibilities:
 Worked as Data Engineer to review business requirements and create source-to-target data mapping documents.
 Participated actively in agile development methodology as a scrum team member.
 Engaged in Data Profiling and merged data from multiple sources.
 Performed Big Data requirement analysis and developed solutions for ETL and Business Intelligence platforms.
 Designed 3NF data models for ODS and OLTP systems, as well as dimensional models using Star and Snowflake
Schemas.
 Worked on the Snowflake environment, managing real-time data loading into HDFS via Kafka.
 Developed a data warehouse model in Snowflake for over 100 datasets.
 Designed and implemented large-scale data solutions on Snowflake Data Warehouse.
 Managed structured and semi-structured data ingestion and processing on AWS using S3 and Python; migrated on-premises Big Data workloads to AWS.
 Designed data aggregations on Hive for ETL processing on Amazon EMR.
 Migrated data from RDBMS to Hadoop using Sqoop for performance evaluations.
 Implemented Data Validation using MapReduce for data quality checks before loading into Hive tables.
 Developed Hive tables and queries for data processing, generating data cubes for visualization.
 Extracted data from HDFS using Hive and Presto, analyzed data using Spark with Scala and PySpark, and created
nonparametric models in Spark.
 Handled data import and transformation using Hive, MapReduce, and loading into HDFS.
 Configured and used Kafka clusters for real-time data processing with Spark Streaming, persisting RDDs to Parquet format in HDFS.
 Implemented Kafka-Spark pipelines, including the use of Kafka brokers for high-throughput message processing.
 Developed Spark Streaming solutions for reading data from Kafka and applying Change Data Capture (CDC) before loading into Hive (a PySpark sketch follows this list).
 Integrated AWS Kinesis with Kafka clusters for event log data aggregation and analysis.
 Created and managed StreamSets pipelines for event log processing using Spark Streaming.
 Automated ETL processes with Python scripts using Apache Airflow and CRON.
 Utilized Apache Airflow and Genie for job automation on EMR and AWS S3.
 Developed Databricks notebooks using SQL and Python, configuring high concurrency clusters on Azure.
 Managed Azure data product migration from Oracle to Azure Databricks.
 Utilized Apache Spark, MapReduce, and Hadoop ecosystem tools on HDInsight for analytics.
 Processed data on Azure with Data Factory, Spark SQL, and U-SQL in Data Lake and SQL DW.
 Coordinated with Data Governance, Data Quality, and Data Architecture teams.
 Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
 Built machine learning models in Python with Spark ML, MLlib, Scikit-learn, NLTK, and Pandas.
 Created Oozie workflows and maintained effective client and business communications.
 Developed ETL processes in AWS Glue for data ingestion into Redshift.
 Built data validation frameworks in Google Cloud Dataflow with Python (a Beam sketch follows this list).
 Configured AWS EC2, IAM, and S3 data pipelines for internal data sources using Boto API.
 Performed data warehousing and ETL with tools like Informatica, AWS Glue, and Azure Data Factory.
 Designed RESTful APIs for web traffic analysis, utilizing Flask, Pandas, and NumPy.
 Developed SQL and T-SQL procedures for data extraction and transformation.
 Managed AWS services like EC2, VPC, CloudTrail, CloudWatch, CloudFormation, SNS, and RDS.
 Worked extensively on Informatica PowerCenter and IDQ mappings for batch and real-time processing.
 Automated workflows and ETL processes in GCP using Apache Airflow.
 Developed SSIS packages and SQL Server imports for legacy data sources.
 Created dashboards with Tableau for summarizing e-commerce data.
 Extensively used AWS Redshift for ETL processes and Python with Apache Beam on Cloud Dataflow for data validation.
 Documented best practices for Docker, Jenkins, Puppet, and GIT.
 Installed and configured Splunk and developed Shell scripts for data processing.
 Managed BigQuery, Dataproc, and Cloud Dataflow jobs, using Stackdriver for monitoring.
 Delivered data solutions within PaaS, IaaS, SaaS environments using AWS, GCP, and Kubernetes.
 Extensively used REST API for JSON data handling and API integration with SQL databases.
 Prepared and developed Informatica workflows and ETL processes for data integration with RDBMS.
 Designed ETL pipelines using Talend, Pig, Hive, and AWS Glue for comprehensive data processing.
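
A minimal PySpark Structured Streaming sketch of the Kafka-to-Hive CDC flow referenced in this list; the broker address, topic name, event schema, and HDFS paths are illustrative assumptions rather than the actual project values.

# Hypothetical sketch: Kafka -> Spark Structured Streaming -> Parquet landing zone for a Hive table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (SparkSession.builder
         .appName("kafka-cdc-to-hive")
         .enableHiveSupport()
         .getOrCreate())

# Assumed change-event schema: an operation flag plus the business columns.
schema = StructType([
    StructField("op", StringType()),          # 'I', 'U', or 'D'
    StructField("id", StringType()),
    StructField("payload", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # assumed broker address
       .option("subscribe", "orders_cdc")                   # assumed topic
       .load())

changes = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("c"))
              .select("c.*")
              .filter(col("op").isin("I", "U")))             # drop deletes before the Hive load

# Land micro-batches as Parquet in HDFS; a downstream job merges them into the Hive table.
query = (changes.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/landing/orders_cdc")          # assumed landing path
         .option("checkpointLocation", "hdfs:///chk/orders_cdc")
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()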
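
Similarly, a rough Apache Beam (Python) sketch of the Dataflow data-validation step mentioned in this list; the bucket paths and the validation rule are assumptions for illustration.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def is_valid(row):
    # Assumed rule: keep rows with a non-empty id and a non-negative numeric amount.
    parts = row.split(",")
    try:
        return len(parts) >= 2 and parts[0] != "" and float(parts[1]) >= 0
    except ValueError:
        return False

# Pass --runner=DataflowRunner, --project, --region, etc. when submitting to Dataflow.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    rows = p | "Read" >> beam.io.ReadFromText("gs://example-bucket/input/*.csv")   # assumed path
    valid = rows | "KeepValid" >> beam.Filter(is_valid)
    valid | "Write" >> beam.io.WriteToText("gs://example-bucket/validated/part")
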
Environment: Data Analysis, MySQL, HBase, Hive, Impala, Flume, NIFI, Agile, Neo4j, KeyLines, Cypher, Shell Scripting,
Python, SQL, XML, Oracle, JSON, Cassandra, Tableau, Git, Jenkins, AWS Redshift, PostgreSQL, Google Cloud Platform
(GCP), MS SQL Server, BigQuery, Salesforce SQL, Postman, Unix Shell Scripting, EMR, GitHub.

Giant Eagle, Pittsburgh, PA September 2020 to May 2023
Role: Azure Data Engineer
Responsibilities:
 Served as a Data Engineer with extensive expertise in Hadoop technologies, planning and executing big data analytics, predictive analytics, and machine learning initiatives.
 Advanced hands-on experience with Spark Ecosystem (Spark SQL, MLlib, SparkR, Spark Streaming), Kafka, and
predictive analytics using MLlib and R ML packages, including H2O's ML library.
 Designed and developed Spark jobs for ETL processes on large medical membership and claims data.
 Created Airflow scheduling scripts using Python (a minimal DAG sketch follows this list).
 Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily processing.
 Developed machine learning applications, statistical analysis, and data visualizations to tackle complex data processing
challenges.
 Aggregated data from diverse public and private databases for actionable insights through complex analyses.
 Designed Natural Language Processing (NLP) models for sentiment analysis and used NLTK in Python for customer
response automation.
 Extensive experience in designing and implementing statistical models, predictive models, and enterprise data models
in both RDBMS and Big Data environments.
 Proficient in predictive modeling with SAS, SPSS, and Python.
 Conducted in-depth statistical analyses using techniques like T-test, F-test, R-squared, P-value, linear regression,
logistic regression, Bayesian methods, and Poisson distribution using Scikit-Learn, SciPy, NumPy, and Pandas.
 Applied clustering algorithms (e.g., Hierarchical, K-means) via Scikit-Learn and SciPy.
 Developed visualizations and dashboards using ggplot2 and Tableau.
 Worked on data warehouse, Data Lake, and ETL systems utilizing SQL and NoSQL.
 Proficient with R, SAS, MATLAB, and Python for building and analyzing datasets.
 Implemented ARMA and ARIMA models for complex financial time series analysis.
 Experienced in Cloudera Hadoop YARN for data analytics and Hive for SQL queries.
 Skilled in business intelligence and data visualization with Tableau.
 Deep understanding of Agile and Scrum processes.
 Validated macro-economic data for predictive analytics on world markets, employing regression, Bootstrap
Aggregation, and Random Forest.
 Set up AWS EMR clusters for processing monthly workloads and developed PySpark UDFs for ETL tasks.
 Utilized Spark SQL and PySpark in Databricks for validating customer data stored in S3.
 Extensive experience with Hadoop, MapReduce, HDFS, and interfacing with ETL servers.
 Identified data patterns, quality issues, and provided actionable insights to business partners.
 Conducted source system analysis, database design, and data modeling using MLDM and Dimensional modeling.
 Developed ecosystem models to support services within enterprise data architecture.
 Worked with MapReduce, FLUME, HIVE, HBase, Pig, and Sqoop for data ingestion and transformations.
 Skilled in MapReduce troubleshooting and data analysis of XML, JSON, and Relational files.
 Developed Kafka and Java MapReduce pipelines to ingest customer data into HDFS.
 Applied Partitioning, Dynamic Partitions, and Buckets in HIVE.
 Knowledgeable in Neo4j and Cypher queries for graph database analysis.
 Proficient in Apache NIFI for data automation in Hadoop.
 Extensive experience in Google Cloud Platform (GCP) and AWS Infrastructure for ETL pipeline development.
 Created Informatica PowerCenter mappings and complex ETL processes from various sources (e.g., Oracle, Teradata,
Sybase).
 Experience in Greenplum and Alteryx for large-scale data loading and audit tracking.
 Skilled in AWS Glue, PySpark, Airflow, GCP services (e.g., BigQuery, DataProc), and AWS Redshift.
 Performed query optimization using explain plans and statistics collection.
 Developed shell scripts for job automation, SQL tuning, and Informatica performance tuning.
 Created and maintained Splunk applications for search queries and dashboards.
 Experience with data profiling and data quality rules using Informatica IDQ.
 Integrated Collibra with Data Lake and managed data ingestion with Apache Kafka, Sqoop, and AWS services.
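
A minimal sketch of the kind of Airflow scheduling script referenced in this list; the DAG id, schedule, and spark-submit commands are hypothetical placeholders, not the actual jobs.

# Hypothetical daily ETL DAG: ingest a batch, then validate it.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_claims_etl",              # assumed DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 3 * * *",          # once a day at 03:00
    default_args=default_args,
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_claims",
        bash_command="spark-submit /jobs/ingest_claims.py {{ ds }}",    # assumed job path
    )
    validate = BashOperator(
        task_id="validate_claims",
        bash_command="spark-submit /jobs/validate_claims.py {{ ds }}",
    )
    ingest >> validate   # validation runs only after ingestion succeeds
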
Environment: Informatica PowerCenter 9.5, AWS Glue, Talend, Google Cloud Platform (GCP), PostgreSQL,
Python, Oracle, Teradata, CRON, JavaScript, KeyLines, Cypher, MS Azure, Cassandra, Avro, HDFS, Pig, Linux, Python
(Scikit Learn/SciPy/NumPy/Pandas), SAS, SPSS, Bitbucket, Eclipse, XML, PL/SQL, SQL Connector, JSON, Tableau, Jenkins.

Ergon, Flowood, Mississippi October 2017 to August 2020
Role: AWS Data Engineer
Responsibilities:
 Worked with the Hadoop ecosystem, including HDFS, HBase, YARN, and MapReduce.
 Utilized Oozie Workflow Engine to run workflow jobs with actions for Hadoop MapReduce, Hive, and Spark.
 Performed Data Mapping and Data Modeling to integrate data across multiple databases into the Enterprise Data
Warehouse (EDW).
 Designed and developed advanced Python programs to prepare, transform, and harmonize datasets for modeling.
 Extensive experience with Hadoop/Big Data technologies for Storage, Querying, Processing, and Data Analysis.
 Developed Spark/Scala and Python scripts with RegEx for big data resources in Hadoop/Hive environments.
 Automated monthly data validation processes to check for nulls and duplicates, generating reports and metrics for
business teams.
 Employed K-means clustering for identifying outliers and classifying unlabeled data.
 Executed data gathering, data cleaning, and data wrangling using Python.
 Generated actionable insights from raw data with statistical techniques, data mining, data cleaning, data quality
assessments using Python (Scikit-Learn, NumPy, Pandas, Matplotlib), and SQL.
 Built regression models using algorithms such as Linear Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, KNN, Decision Tree Regressor, SVM, Bagged Decision Trees, Random Forest, AdaBoost, and XGBoost; compared their prediction errors and selected the best model based on MAE (a comparison sketch follows this list).
 Improved model accuracy with Ensemble methods through Bagging and Boosting techniques.
 Conducted Segmentation analysis with K-means clustering to identify target groups.
 Optimized models with cross-validation and compared models using stepwise selection based on AIC values.
 Collaborated with cross-functional business teams (e.g., operations, HR, accounting) to analyze financial metrics and
perform ad-hoc analysis.
 Explored and analyzed customer data features using Matplotlib, Seaborn in Python, and Tableau dashboards.
 Applied Kibana for analytics and Data Visualization in Elasticsearch.
 Developed MapReduce and Spark Python modules for machine learning and predictive analytics on AWS Hadoop.
 Experimented with classification algorithms like Logistic Regression, SVM, Random Forest, AdaBoost, and Gradient
Boosting using Scikit-Learn for customer discount optimization.
 Built models with Python and PySpark to predict attendance probabilities for campaigns.
 Predicted customer churn with classification algorithms like Logistic Regression, KNN, and Random Forest.
 Designed data visualizations and dashboards with Tableau for presenting findings to stakeholders.
 Followed Agile methodology, participating in SCRUM meetings, sprint planning, and retrospectives.
 Analyzed requirements and engaged in discussions with Business Analysts.
 Created Technical Design and Integration Solution Design documents.
 Built scalable distributed data solutions using the Hadoop ecosystem.
 Analyzed Hadoop clusters and Big Data tools, including MapReduce and Hive.
 Involved in loading data from Linux file systems and servers using Kafka Producers.
 Enhanced DAT/IDW Framework for real-time and batch data integration into the Hadoop Data Lake.
 Imported/exported data to/from HDFS and Hive using Sqoop and Flume.
 Developed MapReduce programs for data pre-processing in HDFS.
 Experienced in managing Hadoop log files and NoSQL databases (HBase, Cassandra).
 Suggested solutions for module issues and designed High-Level and Low-Level specifications.
 Implemented workflows with Apache Oozie and used SQL for data analysis.
 Developed data access components using JDBC, DAOs, and Beans for data manipulation.
 Validated SQL queries, stored procedures, and embedded SQL for database interactions.
 Built a responsive Single Page Application (SPA) for investors using Bootstrap, RESTful API, and SQL.
 Managed version control with GIT and used Maven for build configurations.
 Conducted unit, integration, and regression testing; reviewed code.
 Defined data models and implemented data transformation with SSIS.
 Created complex reports (e.g., drill-down, parameterized, matrix) using SSRS.
 Developed Spark applications with Spark SQL on Databricks for data transformation.
 Migrated data from Oracle to AWS Redshift, managing S3 buckets for storage and backup.
 Wrote Spark scripts in Python on AWS EMR for data aggregation and validation.
 Created functional requirements for business systems, engaging in the database design process.
 Documented the Power BI architecture for a proof of concept.
 Implemented MSBI solutions with SSIS, SSRS, and SSAS for data reporting and dashboarding.
 Designed mappings using SSIS transformations like OLEDB command, Lookup, and Aggregator.
 Scheduled SSIS Packages with SQL Server Agent for automated maintenance.
 Migrated reports from SSRS to Power BI and developed strategies for data transfer to Amazon Redshift.
 Proficient in SQL, PL/SQL, and UNIX Shell Scripting.
 Created PL/SQL Procedures, Functions, Triggers, and Cursors.
 Loaded data into NoSQL databases (e.g., HBase, Cassandra).
 Expert with Teradata utilities for data loading (FastLoad, MultiLoad, TPump).
 Developed batch jobs with UNIX Shell Scripts and Autosys for production loads.
 Recommended SQL optimizations and indexing for performance improvements.
 Deployed EC2 Instances for Oracle Database management.
 Used Power Query in Power BI for data model cleansing and transformations.
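
A hedged scikit-learn sketch of the MAE-based model comparison described in this list; the synthetic dataset and the model shortlist stand in for the actual data and the full algorithm set.

# Hypothetical example: fit several regressors and keep the one with the lowest MAE.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

# Pick the model with the lowest MAE on the held-out split.
best = min(scores, key=scores.get)
print(scores, "->", best)
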
Environment: MS SQL Server 2016, ETL, SSIS, SSRS, SSMS, Cassandra, AWS Redshift, AWS S3, Oracle 12c, Oracle
Enterprise Linux, Teradata, Databricks, Jenkins, JavaScript, XML, HTML, CSS, JSP, JDBC, Eclipse, Maven, Hadoop, HDFS,
HBase, Oozie, Spark, Machine Learning, Big Data, Python, PySpark, DB2, MongoDB, Elasticsearch, Web Services

IBing Software Solutions Private Limited, Hyderabad, India September 2015 to July 2017
Role: Big Data Engineer / ETL
Responsibilities:
 Utilized Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for machine learning development, applying algorithms like linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
 Extensive experience in designing and implementing statistical models, predictive models, enterprise data models,
metadata solutions, and data lifecycle management across RDBMS and Big Data environments.
 Applied domain knowledge and application portfolio knowledge to shape the future of large-scale business technology
programs.
 Created and modified SQL and PL/SQL database objects, including Tables, Views, Indexes, Constraints, Stored
Procedures, Packages, Functions, and Triggers.
 Created and manipulated large datasets through SQL joins, dataset sorting, and merging.
 Designed ecosystem models (conceptual, logical, physical, canonical) to support enterprise data architecture for services
across the ecosystem.
 Developed Linux Shell scripts using NZSQL/NZLOAD utilities for data loading into Netezza. Designed a system
architecture for an Amazon EC2-based cloud solution.
 Tested complex ETL mappings and sessions to meet business requirements, loading data from flat files and RDBMS
tables to target tables.
 Hands-on experience with database design, relational integrity constraints, OLAP, OLTP, cubes, and normalization
(3NF) and denormalization.
 Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
 Implemented customer segmentation using unsupervised learning techniques like clustering.
 Used Teradata 15 tools and utilities, including Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
 Followed J2EE standards for module architecture, covering Presentation-tier, Business-tier, and Persistence-tier.
 Wrote refined SQL queries for extracting attacker records and used Agile/SCRUM for project workflow.
 Leveraged Spring Inversion of Control and Transaction Management, and designed the front-end/user interface (UI) using HTML 4.0/5.0, CSS3, JavaScript, jQuery, Bootstrap, and AJAX.
 Managed JavaScript events and functions, implemented AJAX/jQuery for asynchronous data retrieval, and updated CSS
for new component layouts.
 Conducted web service testing with SoapUI and logging with Log4j; performed test-driven development with JUnit and
used Maven for code builds.
 Deployed applications on WebSphere Application Server and managed source code with MKS.
 Conducted data analysis, data migration, data cleansing, data integration, and ETL design using Talend for Data
Warehouse population.
 Developed PL/SQL stored procedures, functions, triggers, views, and packages, using indexing, aggregation, and
materialized views to optimize query performance.
 Created logistic regression models in R and Python for predicting subscription response rates based on customer variables (a Python sketch follows this list).
 Developed Tableau dashboards for data visualization, reporting, and analysis, and presented insights to business
stakeholders.
 Conducted FTP operations using Talend Studio for file transfers with tFileCopy, tFileArchive, tFileDelete, and other components.
 Designed and developed Spark jobs with Scala for batch processing data pipelines.
 Managed Tableau Server for configuration, user management, license administration, and data connections, embedding
views for operational dashboards.
 Collaborated with senior management on dashboard goals and communicated project status daily to management and
internal teams.
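
A small illustrative sketch of the subscription-response logistic regression mentioned in this list; the input file and feature columns are hypothetical placeholders, not the actual customer variables.

# Hypothetical example: score customers' probability of responding to a subscription offer.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")                        # assumed input extract
features = ["age", "tenure_months", "prior_purchases"]   # assumed customer variables
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["responded"], test_size=0.25, random_state=7)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predicted probabilities on the hold-out set feed the campaign response-rate estimates.
probs = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, probs))
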
Environment: Hadoop Ecosystem (HDFS), Talend, SQL, Tableau, Hive, Sqoop, Kafka, Impala, Spark, Unix Shell Scripting, Java, J2EE, DB2, JavaScript, XML, Eclipse, AJAX/jQuery, MKS, SoapUI, Erwin, Python, SQL Server, Informatica, SSRS, PL/SQL, T-SQL, MLlib, MongoDB, logistic regression, OLAP, Azure, MariaDB, SAP CRM, SVM, JSON, AWS.

Yana Software Private Limited, Hyderabad, India June 2014 to August 2015
Role: Hadoop Developer
Responsibilities:
 As a Senior Data Engineer, delivered expertise in Hadoop technologies to support analytics development. Implemented
data pipelines with Python and adhered to SDLC methodologies.
 Participated in JAD sessions for optimizing data structures and ETL processes. Loaded and transformed extensive
structured, semi-structured, and unstructured datasets using Hadoop/Big Data principles.
 Leveraged Windows Azure SQL Reporting to create dynamic reports with tables, charts, and maps.
 Designed a data model (star schema) for the sales data mart using Erwin and extracted data with Sqoop into Hive
tables.
 Developed SQL scripts for table creation, sequences, triggers, and views, and conducted ad-hoc analysis on Azure Databricks using a Kanban approach.
 Utilized Azure Reporting Services for report management, and debugged production issues in SSIS packages, loading
real-time data from various sources into HDFS using Kafka.
 Created MapReduce jobs for data cleanup, defined ETL/ELT processes, and integrated MDM with data warehouses.
 Created Pig scripts for data movement into MongoDB and developed MapReduce tasks with Hive and Pig.
 Set up Data Marts with star and snowflake schemas and worked with multiple data formats in HDFS using Python.
 Built Oracle PL/SQL functions, procedures, and workflows, and managed Hadoop jobs with Oozie.
 Prepared Tableau dashboards and reports, and translated business requirements into SAS code.
 Migrated ETL processes from RDBMS to Hive and set up an Enterprise Data Lake for storing, processing, and analytics
using AWS.
 Leveraged AWS S3, EC2, Glue, Athena, Redshift, EMR, SNS, SQS, DMS, and Kinesis for diverse data management and
processing tasks.
 Created Glue Crawlers and ETL jobs, performed PySpark transformations, and used CloudWatch for monitoring.
 Employed AWS Athena for data queries and QuickSight for BI reports.
 Used DMS to migrate databases and Kinesis Data Streams for real-time data processing.
 Built Lambda functions to automate processes (a sketch follows this list), used Agile methods for project management, and conducted complex SQL data analysis.
 Collaborated with MDM teams, created HBase tables, and worked on normalization for OLTP and OLAP systems.
 Developed SSIS packages and SQL scripts, managed Hive tables, and used Informatica for ETL workflows. Designed Data
Marts and applied data governance and cleansing rules.
 Built Hive queries for visualization, repopulated data warehouse tables using PL/SQL, and designed XML schemas.
 Delivered customized reports and interfaces in Tableau and created the data model for the Enterprise Data Warehouse
(EDW).
 Utilized SQL Server Reporting Services (SSRS) to create reports and used Pivot tables for business insights.
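
A hypothetical sketch of the Lambda-based automation described in this list, starting a Glue ETL job when a new object lands in S3; the Glue job name and argument names are assumptions for illustration.

# Hypothetical Lambda handler triggered by S3 put events.
import json
import boto3

glue = boto3.client("glue")

GLUE_JOB_NAME = "sales_landing_to_parquet"   # assumed Glue job name

def handler(event, context):
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Pass the newly arrived object to the Glue job as a job argument.
        resp = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )
        runs.append(resp["JobRunId"])
    return {"statusCode": 200, "body": json.dumps({"job_runs": runs})}
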
Environment: Erwin 9.7/9.8, Redshift, Agile, MDM, Oracle 12c, SQL, HBase 1.1/1.2, UNIX, NoSQL, OLAP, OLTP, SSIS, Informatica, HDFS, Hive 2.3, XML, PL/SQL, RDS, Apache Spark, Kinesis, Athena, Sqoop 1.4, Python, Big Data 3.0, Hadoop 3.0, Azure, ETL, Kafka 1.1, MapReduce, MOM, Pig 0.17, MongoDB, Oozie 4.3, SAS.
