Data Resume Snowflake
Email: [email protected]
Mobile: 8135563078
Certifications: Google Certified Professional Data Engineer
https://www.credly.com/badges/e9f99bfc-b03d-463a-902a-94a00b7033d8/public_url
TECHNICAL SKILLS:
Big Data Ecosystem : Spark, MapReduce, Hive, HDFS, YARN, Impala, HBase, Sqoop, Oozie, Kafka
Hadoop Distributions : Hortonworks, Cloudera, AWS EMR
NoSQL Databases : HBase, Cassandra, MongoDB
Cloud Services : AWS S3, EMR, Redshift, Athena, Glue Metastore
Programming Languages : Java, Scala, Python, R
Databases : Oracle, MySQL, PostgreSQL, Teradata
Build and DevOps Tools : Jenkins, Maven, ANT
Development Methodologies : Agile/Scrum
Visualization and Analytics Tools : Tableau, QlikView, Qlik Sense, Power BI, Amazon QuickSight, Keras
PROFESSIONAL SUMMARY:
Designed and implemented scalable ETL pipelines using Apache Airflow to process and transform
terabytes of data daily.
Developed custom ETL workflows to extract data from Lawson ERP, transform it into analytics-ready
formats, and load it into BigQuery for advanced business intelligence.
Successfully integrated Lawson ERP data systems with Google Cloud Platform (GCP), ensuring seamless
data migration and transformation.
Designed and developed ETL pipelines for data ingestion, transformation, and loading into Teradata data
warehouses.
Integrated Snowflake with Tableau to enable business users to create real-time dashboards and
visualizations, providing insights into large datasets with minimal latency.
Designed and implemented a comprehensive Role-Based Access Control (RBAC) framework in Snowflake,
ensuring granular access management across data environments and compliance with organizational
security policies.
Built real-time data pipelines in Matillion, enabling near-instant data ingestion and transformation for
business analytics and reporting.
Implemented data quality checks and monitoring frameworks within Matillion, ensuring accurate and
consistent data transformations for downstream analytics.
Developed automated testing scripts and checks to identify and resolve any data discrepancies or quality
issues after migration to BigQuery or other GCP services.
Designed and developed interactive data portals for business users to access, visualize, and analyze large
datasets from multiple sources in a self-service environment.
Worked on Teradata utilities like TPT (Teradata Parallel Transporter) and BTEQ to extract, load, and
transform large datasets efficiently.
Architected and implemented Snowflake’s data platform, leveraging its unique multi-cluster architecture
for seamless separation of storage and compute, ensuring optimal scalability and performance across large
datasets.
Designed and optimized Snowflake's cloud-native architecture, using separate virtual warehouses to
isolate workloads and improve query performance.
Designed and implemented seamless data integration pipelines between Lawson ERP and Google Cloud
Platform (GCP), ensuring accurate and timely data flow.
Built scalable batch and streaming data pipelines using Google Cloud Dataflow and Apache Beam,
processing large datasets efficiently and delivering both real-time and batch insights.
Optimized cloud storage costs and query performance by designing efficient BigQuery schemas and
partitioning strategies.
Managed and deployed code changes using Git repositories (GitHub, GitLab, Bitbucket) on Unix-based
systems.
Implemented data pipelines to move data between BigQuery, Cloud SQL, and Cloud Datastore using
Dataflow for automated ETL processes.
Automated workflow orchestration using Cloud Composer, minimizing manual intervention and enhancing
reliability.
Created REST APIs for secure data exchange with external systems and partners.
Collaborated with data scientists to implement machine learning workflows in BigQuery ML for predictive
analytics.
Collaborated with data analysts to create complex SQL queries in Snowflake, enabling real-time
data analysis and visualization for business stakeholders.
Implemented Snowflake's security features, including role-based access control and data encryption,
ensuring compliance with data privacy regulations.
Conducted performance tuning and capacity planning for Snowflake databases, resulting in improved
scalability and cost-efficiency for the organization.
Technology/Tools: Hadoop, YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop,
AWS, Elasticsearch, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.
Technologies: GCP, Google DataProc Cluster, Google Storage, Compute Engine, BigQuery, Hive, Spark, Airflow,
Google Data Tables, Python, Shell, Jenkins, PySpark, Agile, Google Cloud Logging, Pub/Sub, GKE, Cloud Deploy.
Client: Khusaki Technologies Pvt Ltd, India Nov 2018 – Sep 2020
Location: Hyderabad
Role: Data Engineer
· Worked on AWS Glue ETL to ingest data from multiple data sources and perform data validations.
· Developed AWS Glue jobs and monitored their logs through CloudWatch.
· Created AWS Lambda functions that trigger the Glue jobs to perform business transformations and logic upon
the arrival of files in the S3 bucket.
· Proficient in designing and developing interactive dashboards and reports using Amazon QuickSight to
visualize complex datasets.
· Demonstrated ability to optimize SQL queries and dashboards for enhanced performance and reporting
efficiency.
· Developed and deployed data processing pipelines using Google Cloud Dataproc and Python, enabling
distributed data processing for large-scale data analytics and ETL workflows.
· Optimized SQL queries and dashboard performance, reducing query execution time and enhancing reporting
capabilities.
· Experienced in integrating Amazon QuickSight with various data sources such as Amazon Redshift, Amazon
RDS, and Amazon S3 to create comprehensive analytics solutions.
· Used AWS CloudFormation templates to provision services and deploy them to multiple environments
through Bamboo CI/CD.
· Optimized Python-based data transformations on Dataproc using PySpark to scale data processing tasks and
handle large volumes of structured and unstructured data.
· Skilled in managing data ingestion pipelines, including data extraction, transformation, and loading (ETL)
processes, to populate Amazon Redshift with accurate and timely data from various sources.
· Experience in SQL tuning, indexing, distribution, and partitioning for large-scale transformations.
· Proficient in Apache Spark with Scala for large-scale data processing and transformation.
· Worked with Apache Flink to handle out-of-order events and late log data from distributed systems.
· Implemented data ingestion processes to collect streaming data from sources such as Apache Kafka.
· Involved in setting up Flink connectors, configuring data ingestion parameters, and ensuring reliable and efficient
data transfer into the Flink application.
· Utilized Docker and Kubernetes to containerize and orchestrate distributed applications, enhancing scalability
and resource management.
· Familiarity with Power BI for data visualization and reporting.
Technology/Tools: Hadoop, Data Lake, AWS, AWS EMR, Spark, Hive, S3, Athena, Sqoop, Kafka, HBase, PySpark,
Redshift, Step Functions, Python, Cassandra, Informatica (ETL), Cloudera, Oracle 10g, Microsoft SQL Server,
Control-M, Linux
Client: Brain vision Solutions Pvt Ltd, India Jun 2016 - Oct 2018
Location: Hyderabad
Role: Big Data/Hadoop Developer
· Used Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
· Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
· Worked on migrating RDBMS data into data lake applications.
· Implemented cloud integrations to GCP and Azure with bi-directional flows for data migrations.
· Built data pipelines in Airflow on GCP for ETL/ELT jobs using different Airflow operators.
· Built optimized Hive and Spark jobs for data cleansing and transformations.
· Integrated Spark Scala with various data sources, such as Apache Hive, HDFS, and external databases,
ensuring seamless data ingestion and extraction processes.
· Collaborated with data source teams to establish efficient and reliable data pipelines.
· Implemented scalable data transformation solutions using Spark Scala, ensuring the system's ability to
handle growing data volumes.
· Conducted regular code reviews to ensure adherence to best practices and coding standards in data
transformation processes.
· Incorporated fault-tolerant techniques to enhance the resilience of data processing pipelines.
· Developed and tuned PySpark applications to complete within required processing windows.
· Worked on Informatica PowerCenter to design transformations in intermediary layers.
· Worked on integrating Informatica Cloud with AWS to move data across platforms and to pull in data from
source systems using Informatica MDM.
· Worked on various optimization techniques in Hive for data transformations and loading.
· Built an API on top of HBase data, exposing it to external teams for quick lookups.
· Experience in GCP Dataproc, GCS, Cloud Functions, and BigQuery.
· Experience with scripting languages (Bash, Perl, Python).
· Leveraged Spark's in-memory computing capabilities to perform procedures such as text analysis and
processing using PySpark.
· Primarily responsible for designing, implementing, testing, and maintaining database solutions on AWS.
· Worked with Spark Streaming, dividing data into different branches for batch processing through the
Spark engine.
· Developed NiFi pipelines for extracting data from external sources.
· Developed Jenkins pipelines for data pipeline deployments.
· Wrote Hive SQL scripts to create complex tables with performance optimizations such as partitioning,
clustering, and skew handling.
· Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud
Dataflow with Python.
· Worked on a POC to set up Talend environments and custom libraries for different pipelines.
· Developed various Python and shell scripts for operational tasks.
· Worked in an Agile environment with multiple teams and projects in fast-paced settings.
Technologies: GCP, Google DataProc Cluster, Hive, Talend, Spark, Airflow, Google Data Tables, Python, Shell,
Jenkins, PySpark, Agile, Google Logs Explorer