Anusha
Irving, Texas, USA - 75038 | Mobile: +1 (312) 248 1255 | Email: [email protected] | LinkedIn
SUMMARY
• 3+ years of experience as a Data Engineer spanning data engineering, data analysis, ETL, data warehousing, and business intelligence, covering the design, development, analysis, implementation, and post-implementation support of DW/BI applications, with a proven track record of designing and implementing robust, scalable data solutions.
• Expertise in ETL processes, data modeling, and database management, ensuring optimal data flow, organization, and retrieval.
• Proficient in utilizing cloud platforms, including AWS and Azure, to architect and implement end-to-end data pipelines for diverse business needs.
• Skilled in leveraging big data technologies such as Apache Spark and Hadoop to process and analyze large datasets efficiently.
• Experienced in implementing data quality and governance measures, ensuring data accuracy, compliance, and security.
• Demonstrated ability to collaborate with cross-functional teams to understand business requirements and translate them into effective data solutions.
• Strong background in optimizing database performance, conducting query tuning, and implementing best practices for efficient data storage and
retrieval.
• Engineered data pipelines in Azure Data Factory (ADF), orchestrating ETL processes from diverse sources to Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
• Hands-on experience with SQL, NoSQL databases, and data warehouse solutions, contributing to seamless integration and accessibility of data.
• Proven expertise in version control systems (e.g., Git) and agile methodologies, ensuring a collaborative and efficient development process.
• Passionate about staying current with emerging data technologies and trends, continuously seeking opportunities to enhance data engineering
capabilities.
• Worked on the architecture, design, and implementation of large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.
• Efficient in data preprocessing, including data cleaning, correlation analysis, imputation, visualization, feature scaling, and dimensionality reduction, using Python data science packages (scikit-learn, Pandas, NumPy).
• Well-versed in analyzing data using Python, R, and SQL.
• Experience in building reliable ETL processes and data pipelines for batch and real-time streaming using SQL, Python, Databricks, Spark Streaming, Sqoop, Hive, AWS, Azure, NiFi, Oozie, and Kafka; a brief illustrative sketch follows this summary.
• Responsible for designing and building new data models and schemas using Python and SQL.
• Involved in developing Python scripts and using SSIS, Informatica, and other ETL tools for the extraction, transformation, and loading of data into the data warehouse.
• Built Tableau dashboards to show the effectiveness of weekly campaigns and customer acquisition.
• Utilized AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as S3 and DynamoDB. Built data warehousing systems utilizing Amazon S3, DynamoDB, EC2, and Snowflake to support business intelligence objectives.
• Implemented best practices to create cloud functions, applications, and databases.
• Responsible for loading tables from Azure Data Lake to Azure Blob Storage and pushing them to Snowflake.
• Worked in all stages of Software Development Life Cycle (SDLC).
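A minimal sketch of the batch ETL pattern referenced above, written in PySpark; the source table, connection details, and storage path below are hypothetical placeholders, not details of any engagement on this resume.

```python
# Minimal batch ETL sketch with PySpark: extract from a JDBC source,
# apply a simple transformation, and load partitioned Parquet into a
# data lake. All names, credentials, and URIs are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

# Extract: read a source table over JDBC (placeholder URL and credentials)
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://source-host:3306/sales")
    .option("dbtable", "orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Transform: drop incomplete rows and compute a daily revenue aggregate
daily_revenue = (
    orders.dropna(subset=["order_date", "amount"])
    .withColumn("order_date", F.to_date("order_date"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Load: write partitioned Parquet to a data-lake location (placeholder URI)
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-bucket/curated/daily_revenue/"
)
```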
EXPERIENCE
PNC Financial – Data Engineer | USA | Jul 2023 - Present
• Worked on data integration projects using ETL tools like SSIS, Informatica, and Talend Studio to extract data from various sources such as Oracle, MySQL, and SQL Server and load it into the Snowflake cloud data warehouse.
• Assemble large, complex data sets that meet functional / non-functional business requirements.
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing
infrastructure for greater scalability, etc.
• Worked on the development and maintenance of data lakes, utilizing the Hadoop Distributed File System (HDFS) for scalable storage and retrieval of structured and unstructured data.
• Collaborated with data engineering teams to implement and enhance Hadoop-based data solutions, leveraging technologies such as
MapReduce and Apache Spark.
• Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL.
• Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key
business performance metrics.
• Collaborated with vendor partners to stay informed about the latest advancements in Azure and cloud technologies.
• Collaborated with data engineering teams to implement Azure-based data solutions, leveraging services such as Azure Data Factory and Azure
Databricks.
• Implemented data lakes on Azure Storage, enabling efficient storage and retrieval of structured and unstructured data.
• Conducted performance tuning for Azure SQL Data Warehouse, optimizing query performance for analytical workloads.
• Implemented Azure Active Directory (AAD) for identity and access management, strengthening security measures and achieving compliance
with industry standards.
• Orchestrated the adoption of Azure DevOps for continuous integration and continuous deployment (CI/CD) pipelines, resulting in a 30%
reduction in deployment time.
• Spearheaded the migration of on-premises applications to Microsoft Azure, achieving a 20% reduction in operational costs and improving
overall scalability.
• Set up a continuous integration (CI) process: created a Git repository, configured a Jenkins server and installed the Git plugin, created a new Jenkins job configured to build the project from the Git repository, configured the build settings, saved the job, ran a test build, and, after a successful build, published the artifact to a remote Git repository or deployed it to the production environment (an illustrative scripted version follows this list).
• Implemented a new ETL solution based on business requirements.
• Design and implement strategies to build new data quality frameworks to replace the legacy systems in place.
• Work in an Agile environment, using the Rally tool to maintain user stories and tasks; participated in a collaborative team designing and developing a Snowflake data warehouse.
• Created data visualization dashboards in Tableau to communicate findings to stakeholders.
• Implemented security best practices and compliance measures in accordance with AWS Well-Architected Framework.
• Implemented Infrastructure as Code using AWS CloudFormation, automating the provisioning and management of AWS resources.
• Designed and implemented serverless applications, leveraging AWS Lambda, API Gateway, and DynamoDB.
• Spearheaded the adoption of serverless architecture using AWS Lambda, reducing development time and infrastructure costs by 25%.
• Conducted training sessions for development teams to promote serverless best practices and coding standards.
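An illustrative scripted version of the CI setup described in the bullet above, using the python-jenkins client; the server URL, credentials, job name, repository URL, and job configuration XML are all hypothetical placeholders.

```python
# Sketch: create a Jenkins job bound to a Git repository and trigger a
# test build via the python-jenkins client. Every URL, credential, and
# name below is a placeholder.
import jenkins

JOB_NAME = "example-ci-build"  # placeholder job name

# Connect to the Jenkins server (placeholder URL and API token)
server = jenkins.Jenkins(
    "http://jenkins.example.com:8080",
    username="admin",
    password="api-token",
)

# Minimal freestyle job config: pull from a Git repo, run a build command
CONFIG_XML = """<?xml version='1.1' encoding='UTF-8'?>
<project>
  <scm class="hudson.plugins.git.GitSCM">
    <userRemoteConfigs>
      <hudson.plugins.git.UserRemoteConfig>
        <url>https://github.com/example/project.git</url>
      </hudson.plugins.git.UserRemoteConfig>
    </userRemoteConfigs>
    <branches>
      <hudson.plugins.git.BranchSpec><name>*/main</name></hudson.plugins.git.BranchSpec>
    </branches>
  </scm>
  <builders>
    <hudson.tasks.Shell><command>make build</command></hudson.tasks.Shell>
  </builders>
</project>
"""

# Create the job if missing, run a test build, and print the job URL
if not server.job_exists(JOB_NAME):
    server.create_job(JOB_NAME, CONFIG_XML)
server.build_job(JOB_NAME)
print(server.get_job_info(JOB_NAME)["url"])
```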
EDUCATION
Master's in Computer Information Systems | University of Central Missouri, Lee's Summit, MO, USA
(Aug 2021 - Dec 2022)
Bachelor's in Computer Science | Usha Rama College of Engineering and Technology, Andhra Pradesh, India
(Jul 2016 - Sep 2020)
SKILLS
Methodologies: SDLC, Agile, Scrum
Programming Languages and Platforms: Python, Java, C, Scala, SQL, Unix Shell Script
Big Data: Hadoop, HDFS, Sqoop, Hive, HBase, Spark, Kafka, Impala, NiFi, Cassandra, Apache Airflow, Databricks
ETL/ELT Tools: SSRS, SSIS, SSAS, Informatica, Azure Data Factory, dbt
Databases (SQL and NoSQL): MySQL, SQL Server, DB2, PostgreSQL, Oracle, Snowflake, MongoDB
Tools/IDE/Build Tools: Power BI, Tableau, Talend Studio, Git, Git Bash, Eclipse, IntelliJ, Maven, Jenkins, GitHub, Jira, Snowflake, Bitbucket, Data pipelines, QlikView
Cloud Computing: AWS (S3, CloudWatch, Athena, Redshift, EMR, EC2, DynamoDB, IAM, Secrets Manager, Lambda, SNS & SQS), Azure (Azure Data Factory, Azure Blob Storage, Azure Databricks)
Data Analytics Skills: Data Cleaning, Data Masking, Data Manipulation, Data Visualization
BI & CRM Tools: Tableau, Microsoft Business Intelligence (Power BI), Sigma Computing
Packages: NumPy, Pandas, Matplotlib, SciPy, scikit-learn, TensorFlow
File Formats: Parquet, Avro, ORC, JSON
Operating Systems: Windows, Linux, Unix, macOS