Data Resume Snowflake

Karthik N is a seasoned Data Engineer with approximately 8 years of experience in IT, specializing in Big Data technologies such as Hadoop, Spark, and cloud platforms like AWS and GCP. He has a proven track record in designing and implementing scalable ETL pipelines, data integration, and cloud-native applications, with expertise in tools like Matillion, Snowflake, and Apache Airflow. His certifications include being a Google Certified Professional Data Engineer, and he has successfully led projects involving data migration, real-time data processing, and machine learning workflows.


Name: Karthik N

Email: [email protected]
Mobile: 8135563078
Certifications: Google Certified Professional Data Engineer
https://www.credly.com/badges/e9f99bfc-b03d-463a-902a-94a00b7033d8/public_url

PROFESSIONAL SUMMARY:

· About 8 years of IT experience in a variety of industries, including hands-on
experience in Big Data (Hadoop), Python, and Java development.
· Expertise with tools in the Hadoop ecosystem, including Spark, Hive, HDFS, MapReduce, Sqoop, Kafka, YARN,
Oozie, and HBase.
· Excellent knowledge of distributed components such as HDFS, JobTracker, TaskTracker, NameNode,
DataNode, and the MapReduce programming paradigm.
· Experience developing cloud-native applications on platforms such as Cloud Foundry, Kubernetes, AWS, GCP,
and Azure.
· Experience in designing and developing production-ready data processing applications in Spark using
Scala/Python.
· Proficient in designing and optimizing Matillion data pipelines to ensure efficient data processing,
improving data flow and performance across various business applications.
· Expertise in data integration, data warehousing, and cloud data architecture, leveraging Matillion
alongside other cloud tools to streamline data engineering tasks.
· Designed and implemented end-to-end data integration pipelines to migrate Lawson ERP data into Google
Cloud Platform (GCP) using best practices for scalability and performance.
· Strong experience creating efficient Spark applications for data transformations such as data cleansing,
de-normalization, joins, and aggregation.
· Good experience fine-tuning Spark applications using techniques such as broadcasting, increasing shuffle
parallelism, caching/persisting DataFrames, and sizing executors appropriately to use the available cluster
resources effectively (an illustrative PySpark sketch follows this list).
· Strong experience automating data engineering pipelines using proper standards and best practices.
· Led the migration of on-premises data warehousing solutions to the cloud.
· Implemented and utilized GCP Cost Management Tools such as Budgets and Alerts and Billing Reports to
monitor and optimize cloud resource spending post-migration.
· Designed, developed, and deployed scalable data pipelines using Google Cloud Dataflow for real-time and
batch data processing, optimizing workflows for large-scale data transformations.
· Strong experience in writing Python applications using libraries such as Pandas, NumPy, SciPy, and
Matplotlib.
· Good Knowledge in productionizing Machine Learning pipelines (Featurization, Learning, Scoring,
Evaluation) primarily using Spark ML libraries.
· Good exposure with Agile software development process.
· Strong experience with Hadoop distributions such as Cloudera, Hortonworks, and AWS EMR.
· Good understanding of NoSQL databases and hands-on work experience in writing applications on NoSQL
databases like HBase, Cassandra and MongoDB.
· Experienced in writing complex MapReduce programs that work with different file formats such as Text,
SequenceFile, XML, Parquet, and Avro.
· Strong problem-solving and cross-team communication skills, including communication with
clients.
· Extensive Experience on importing and exporting data using stream processing
platforms like Flume and Kafka.
· Very good experience in complete project life cycle (design, development, testing and
implementation) of Client Server and Web applications.
· Design, build, and maintain efficient and reliable Data Lake solutions.
· Design and implement high profile data ingestion pipelines from various sources using Spark and Hadoop
technologies.
· Extensive knowledge in Spark, Hive, Sqoop and HBase.
· Experience in Cloudera, CDP Hadoop distribution.
· Experience in shell scripting, Scala, and Python.
· Proficiency in writing complex SQL statements using Hive, SnowSQL and RDBMS standards.
· Demonstrable Experience in designing and implementing modern data warehouse/data lake solution with
understanding of best practices.
· Experience in troubleshooting and fixing real-time production jobs in Spark and Hadoop.
· Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong
experience in writing complex queries for Oracle.
· Experienced in working with Amazon Web Services (AWS), using S3, EMR, Redshift, Athena, Glue Metastore,
etc.
· Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
· Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and
Agile Scrum.
· Experience using build tools such as SBT and Maven.
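
Illustrative sketch of the Spark tuning techniques referenced above (broadcast joins, shuffle parallelism, caching, executor sizing). This is a minimal PySpark example only; the table names, S3 paths, and configuration values are hypothetical placeholders that would be tuned per cluster and workload.

# Minimal PySpark tuning sketch: broadcast join, shuffle parallelism, caching.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Executor sizing is cluster-specific; these values are placeholders.
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    # Raise shuffle parallelism for wide transformations on large inputs.
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

orders = spark.read.parquet("s3://example-bucket/orders/")        # large fact table
countries = spark.read.parquet("s3://example-bucket/countries/")  # small dimension

# Broadcast the small dimension so the join avoids a full shuffle of the large table.
enriched = orders.join(F.broadcast(countries), on="country_code", how="left")

# Cache a frame that is reused by several downstream aggregations.
enriched.cache()

daily_totals = (
    enriched.groupBy("country_name", "order_date")
    .agg(F.sum("amount").alias("total_amount"))
)
daily_totals.write.mode("overwrite").parquet("s3://example-bucket/daily_totals/")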

TECHNICAL SKILLS:

Bigdata Ecosystem : Spark, Map Reduce, Hive, HDFS, Yarn, Impala, HBase,
Sqoop, Oozie, Kafka
Hadoop Distribution : Hortonworks, Cloudera, AWS EMR.
NO SQL Databases : HBase, Cassandra, MongoDB
Cloud Services : AWS S3, EMR, Redshift, Athena, Glue Metastore
Programming Languages : Java, Scala, Python, and R.
Databases : Oracle, MySQL, PostgreSQL, Teradata
Build Tools : DevOps, Jenkins, Maven, ANT
Development methodologies : Agile/Scrum
Visualization and analytics tool : Tableau, Qlik View, Qlik Sense, Power BI, Amazon Quicksight, Keras

PROFESSIONAL EXPERIENCE:

Client: JPMC Aug 2024 – Present


Role: Sr Data Engineer

 Designed and implemented scalable ETL pipelines using Apache Airflow to process and transform
terabytes of data daily (an illustrative DAG sketch follows this section).
 Developed custom ETL workflows to extract data from Lawson ERP, transform it into analytics-ready
formats, and load it into BigQuery for advanced business intelligence.
 Successfully integrated Lawson ERP data systems with Google Cloud Platform (GCP), ensuring seamless
data migration and transformation.
 Designed and developed ETL pipelines for data ingestion, transformation, and loading into Teradata data
warehouses.
 Integrated Snowflake with Tableau to enable business users to create real-time dashboards and
visualizations, providing insights into large datasets with minimal latency.
 Designed and implemented a comprehensive Role-Based Access Control (RBAC) framework in Snowflake,
ensuring granular access management across data environments and compliance with organizational
security policies.
 Built real-time data pipelines in Matillion, enabling near-instant data ingestion and transformation for
business analytics and reporting.
 Implemented data quality checks and monitoring frameworks within Matillion, ensuring accurate and
consistent data transformations for downstream analytics.
 Developed automated testing scripts and checks to identify and resolve any data discrepancies or quality
issues after migration to BigQuery or other GCP services.
 Designed and developed interactive data portals for business users to access, visualize, and analyze large
datasets from multiple sources in a self-service environment.
 Worked on Teradata utilities like TPT (Teradata Parallel Transporter) and BTEQ to extract, load, and
transform large datasets efficiently.
 Architected and implemented Snowflake’s data platform, leveraging its unique multi-cluster architecture
for seamless separation of storage and compute, ensuring optimal scalability and performance across large
datasets.
 Designed and optimized Snowflake's cloud-native architecture, utilizing separate virtual warehouses for
data processing and storage to improve query performance and isolate workloads efficiently.
 Designed and implemented seamless data integration pipelines between Lawson ERP and Google Cloud
Platform (GCP), ensuring accurate and timely data flow.
 Built highly efficient batch and streaming data pipelines using Google Dataflow and Apache Beam,
achieving real-time data insights.
 Developed and deployed batch and streaming pipelines using Google Cloud Dataflow, leveraging Apache
Beam to process large datasets efficiently.
 Developed scalable data pipelines using Apache Beam and GCP Dataflow, processing both streaming and
batch datasets with high efficiency.
 Developed scalable ETL pipelines using Google Dataflow and Apache Beam, enabling real-time and batch
data processing.
 Optimized cloud storage costs and query performance by designing efficient BigQuery schemas and
partitioning strategies.
 Managed and deployed code changes using Git repositories (GitHub, GitLab, Bitbucket) on Unix-based
systems.
 Implemented data pipelines to move data between BigQuery, Cloud SQL, and Cloud Datastore using
Dataflow for automated ETL processes.
 Automated workflow orchestration using Cloud Composer, minimizing manual intervention and enhancing
reliability.
 Created REST APIs for secure data exchange with external systems and partners.
 Collaborated with data scientists to implement machine learning workflows in BigQuery ML for predictive
analytics.
 Collaborated with data analysts to create complex SQL queries in Snowflake, enabling real-time
data analysis and visualization for business stakeholders.
 Implemented Snowflake's security features, including role-based access control and data encryption,
ensuring compliance with data privacy regulations.
 Conducted performance tuning and capacity planning for Snowflake databases, resulting in improved
scalability and cost-efficiency for the organization.
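
Illustrative sketch of a daily Airflow ETL pipeline like the ones referenced above. This is a minimal DAG skeleton only; the task callables are placeholders, and the DAG id, schedule, and retry settings are hypothetical.

# Minimal Airflow DAG sketch for a daily extract-transform-load run.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull the previous day's records from the source system.
    pass


def transform(**context):
    # Placeholder: reshape the extracted records into an analytics-ready layout.
    pass


def load(**context):
    # Placeholder: load the transformed records into the warehouse (e.g. BigQuery).
    pass


with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the three stages strictly in sequence.
    extract_task >> transform_task >> load_task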

Client: FHL BANK April 2022 – Jul 2024


Location: Atlanta, GA
Role: Data Engineer
· Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
· Implemented fault-tolerant data processing pipelines using PySpark's resilient distributed datasets (RDDs)
to handle data processing failures gracefully.
· Designed and implemented data models in PySpark to support analytical queries and reporting
requirements.
· Integrated Snowflake with Tableau to enable business users to create real-time dashboards and
visualizations, providing insights into large datasets with minimal latency.
· Built automated data pipelines in Python and Cloud Composer to extract, transform, and load data into
BigQuery, enabling efficient data processing and reducing manual intervention.
· Ingested data from payment logs and dynamically mapped it to the target schema, handling schema
changes automatically.
· Designed data models and schemas within Matillion for integrating structured and semi-structured data,
enabling seamless reporting and data analysis.
· Implemented robust data validation and error-handling mechanisms in Matillion for ensuring high-quality
data across multiple pipelines, improving data reliability.
· Designed and implemented a comprehensive Role-Based Access Control (RBAC) framework in Snowflake,
ensuring granular access management across data environments and compliance with organizational
security policies.
· Automated data workflows and scheduled batch jobs using Teradata’s Scheduler or external tools like
Apache Airflow and cron jobs.
· Designed and implemented complex ETL/ELT processes using Spark Scala to transform raw data into
meaningful insights.
· Ingested data from Kafka topics and after transformations loaded the data into Elasticsearch.
· Executed data transformation tasks, such as filtering, aggregation, and enrichment, to prepare datasets for
downstream analytics.
· Designed and implemented auto-scaling configurations for Snowflake virtual warehouses, optimizing
resource allocation during varying workloads and keeping costs low during off-peak times.
· Automated data refreshes, reports, and dashboards in data portals using tools like Power BI, Tableau, or
Alteryx, ensuring up-to-date information for decision-making.
· Scheduled and monitored data workflows using cron jobs and other Unix-based scheduling tools.
· Utilized Git in a Unix environment for version control, branching, and collaboration on data processing
scripts and projects.
· Automated the deployment and management of Dataflow pipelines using Terraform and Cloud Deployment
Manager, reducing manual overhead in pipeline setup and scaling.
· Worked on real-time data ingestion using Kafka, with Cassandra as the data warehouse.
· Ingested streaming data into Delta Lake with Databricks.
· Utilized BigQuery's Data Transfer Service to automate data ingestion from external sources, such as Google
Analytics and Cloud storage, into BigQuery for further analysis.
· Fine-tuned Spark jobs and monitored them for failures or performance impact.
· Utilized Python libraries such as Pandas, NumPy, and Dask in conjunction with Dataproc to perform data
wrangling and aggregation on large datasets, ensuring efficient and accurate data pipelines.
· Ensured seamless integration between Lawson ERP systems and GCP by leveraging Cloud storage,
Pub/Sub, and Cloud Dataflow for efficient data ingestion and transformation.
· Responsible for building scalable distributed data solutions using Hadoop.
· Migrated Pig scripts and MapReduce jobs to the Spark DataFrames API and Spark SQL to improve performance.
· Used Spark Streaming APIs to perform transformations and actions on the fly for building the common
learner data model, which receives data from Kafka in near real time and persists it into Cassandra (an
illustrative Structured Streaming sketch follows this section).
· Developed a POC using Scala, deployed it on the YARN cluster, and compared the performance of Spark with
Hive and SQL.
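
Illustrative sketch of a near-real-time Kafka-to-Cassandra pipeline like the one referenced above, written with Spark Structured Streaming. It assumes the spark-sql-kafka and spark-cassandra-connector packages are available on the cluster; the broker address, topic, schema, keyspace, and table names are hypothetical placeholders.

# Minimal Structured Streaming sketch: Kafka source, JSON parsing, Cassandra sink.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (
    SparkSession.builder
    .appName("kafka-to-cassandra-sketch")
    # Placeholder Cassandra host; supplied by the cluster configuration in practice.
    .config("spark.cassandra.connection.host", "cassandra-host")
    .getOrCreate()
)

event_schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("score", DoubleType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "learner-events")
    .load()
    # Kafka delivers bytes; parse the JSON value into typed columns.
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", "analytics")
    .option("table", "learner_events")
    .option("checkpointLocation", "/tmp/checkpoints/learner_events")
    .outputMode("append")
    .start()
)
query.awaitTermination()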

Technology/Tools: Hadoop, YARN, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop,
Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.

Client: Babson Capital Management LLC Oct 2020 - Dec 2021


Location: Hyderabad
Role: Spark Developer
· Experience designing solutions with GCP tools such as BigQuery, Cloud Storage, GKE, Cloud Deploy, Dataproc,
Pub/Sub, Dataform, and Cloud SQL.
· Leveraged Airflow for scheduling and orchestration of daily workflows.
· Performed end-to-end data ingestion into one or more Google Cloud Platform (GCP) services and processed
the data in Dataproc clusters.
· Designed and developed interactive data portals for business users to access, visualize, and analyze large
datasets from multiple sources in a self-service environment.
· Designed and implemented automated workflows for data collection, transformation, and delivery using data
portal tools such as Alteryx, Talend, or proprietary solutions.
· Ingested incremental data from source systems using nightly batch jobs and staged it in Google Cloud
Storage buckets.
· Developed Spark applications using Scala and Python to implement data cleansing, validation, and
processing of large-scale datasets ingested from traditional data warehouse systems (an illustrative
cleansing sketch follows this section).
· Implemented robust data validation and reconciliation processes to ensure that ERP data migrated to GCP was
accurate, complete, and consistent.
· Read data from the data lake, joined it with existing data in BigQuery tables, and analyzed the
purchase patterns.
· Developed and maintained PySpark applications for processing large-scale data sets, ensuring efficient data
pipelines and workflows.
· Implemented complex data transformations and aggregations using PySpark RDDs, DataFrames, and Spark
SQL.
· Collaborated with cross-functional teams to understand data requirements and optimize data transformation
workflows for efficiency and scalability.
· Created test data sets and test cases for the data pipelines and jobs.
· Created GCP Dataproc Clusters and managed the cluster configuration and performance.
· Fine-tuned applications for performance and monitored the jobs for failures.
· Created and scheduled Airflow jobs for multiple workflows, managing interdependencies between the jobs.
· Created data pipelines to ingest customer payment data from source systems.
· Interacted with clients on requirements and queries, created user stories, and took part in sprint planning.
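
Illustrative sketch of the kind of PySpark cleansing and validation application referenced above. The column names, validation rules, and Cloud Storage paths are hypothetical placeholders.

# Minimal PySpark cleansing/validation sketch with a quarantine path for bad rows.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleansing-sketch").getOrCreate()

raw = spark.read.parquet("gs://example-bucket/raw/payments/")

cleansed = (
    raw.dropDuplicates(["payment_id"])                              # remove duplicate records
    .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))    # basic validation rule
    .withColumn("payment_date", F.to_date("payment_ts"))            # normalize timestamp to date
    .withColumn("currency", F.upper(F.trim(F.col("currency"))))     # standardize currency codes
)

# Quarantine rows that fail validation so they can be reviewed rather than dropped silently.
rejected = raw.filter(F.col("amount").isNull() | (F.col("amount") <= 0))

cleansed.write.mode("overwrite").parquet("gs://example-bucket/curated/payments/")
rejected.write.mode("overwrite").parquet("gs://example-bucket/rejected/payments/")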

Technologies: GCP, Google DataProc Cluster, Google Storage, Compute Engine, BigQuery, Hive, Spark, Airflow,
Google Data Tables, Python, Shell, Jenkins, PySpark, Agile, Google Cloud Logging, Pub/Sub, GKE, Cloud Deploy.

Client: Khusaki Technologies Pvt Ltd, India Nov 2018 – Sep 2020
Location: Hyderabad
Role: Data Engineer

· Worked on AWS Glue ETL to ingest data from multiple data sources and perform data validations.
· Developed AWS Glue jobs and monitored their logs through CloudWatch.
· Created AWS Lambda functions that trigger the Glue jobs to perform business transformations and logic upon
the arrival of files in the S3 bucket (an illustrative Lambda handler sketch follows this section).
· Proficient in designing and developing interactive dashboards and reports using Amazon QuickSight to
visualize complex datasets.
· Demonstrated ability to optimize SQL queries and dashboards for enhanced performance and reporting
efficiency.
· Developed and deployed data processing pipelines using Google Cloud Dataproc and Python, enabling
distributed data processing for large-scale data analytics and ETL workflows.
· Optimized SQL queries and dashboard performance, reducing query execution time and enhancing reporting
capabilities.
· Experienced in integrating Amazon QuickSight with various data sources such as Amazon Redshift, Amazon
RDS, and Amazon S3 to create comprehensive analytics solutions.
· Used AWS CloudFormation templates for provisioning services and deploying them to multiple
environments through Bamboo CI/CD.
· Optimized Python-based data transformations on Dataproc using PySpark to scale data processing tasks and
handle large volumes of structured and unstructured data.
· Skilled in managing data ingestion pipelines, including data extraction, transformation, and loading (ETL)
processes, to populate Amazon Redshift with accurate and timely data from various sources.
· Experience in SQL tuning, indexing, distribution, and partitioning for large-scale transformations.
· Proficient in Apache Spark with Scala for large-scale data processing and transformation.
· Worked with Apache Flink to handle out-of-order events and late-arriving log data from distributed systems.
· Implemented data ingestion processes to collect streaming data from various sources such as Apache Kafka.
· Involved in setting up Flink connectors, configuring data ingestion parameters, and ensuring reliable and
efficient data transfer into the Flink application.
· Utilized Docker and Kubernetes to containerize and orchestrate distributed applications, enhancing scalability
and resource management.
· Familiarity with Power BI for data visualization and reporting.
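
Illustrative sketch of a Lambda handler that starts a Glue job when a file lands in S3, as referenced above. The Glue job name and argument keys are hypothetical, and the function assumes an S3 event trigger plus an IAM role that permits glue:StartJobRun.

# Minimal AWS Lambda handler sketch: start a Glue job for each arriving S3 object.
import boto3

glue = boto3.client("glue")


def lambda_handler(event, context):
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Pass the arriving object to the Glue job as job arguments.
        response = glue.start_job_run(
            JobName="example-transform-job",
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        runs.append(response["JobRunId"])
    return {"started_job_runs": runs}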

Technology/Tools: Hadoop, Data Lake, AWS, AWS EMR, Spark, Hive, S3, Athena, Sqoop, Kafka, HBase, PySpark,
Redshift, Step Functions, Python, Cassandra, ETL Informatica, Cloudera, Oracle 10g, Microsoft SQL
Server, Control-M, Linux

Client: Brain vision Solutions Pvt Ltd, India Jun 2016 - Oct 2018
Location: Hyderabad
Role: Bigdata/Hadoop Developer
· Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
· Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
· Worked on migrating RDBMS data into data lake applications.
· Implemented cloud integrations to GCP and Azure for bi-directional flow setups for data migrations.
· Built data pipelines in Airflow on GCP for ETL/ELT jobs using different Airflow operators.
· Built optimized Hive and Spark jobs for data cleansing and transformations.
· Integrated Spark Scala with various data sources, such as Apache Hive, HDFS, and external databases,
ensuring seamless data ingestion and extraction processes.
· Collaborated with data source teams to establish efficient and reliable data pipelines.
· Implemented scalable data transformation solutions using Spark Scala, ensuring the system's ability to
handle growing data volumes.
· Conducted regular code reviews to ensure adherence to best practices and coding standards in data
transformation processes.
· Incorporated fault-tolerant techniques to enhance the resilience of data processing pipelines.
· Developed PySpark applications optimized to complete within the required processing time.
· Worked on Informatica PowerCenter to design transformations in intermediary layers.
· Worked in Integrating Informatica Cloud with AWS to move data across platforms and to pull in data from
source systems using Informatica MDM.
· Worked on various optimization techniques in Hive for data transformations and loading.
· Built API on top of HBase data to expose for external teams for quick lookups.
· Experience in GCP Dataproc, GCS, Cloud Functions, and BigQuery.
· Experience with scripting languages (Bash, Perl, Python).
· Leveraged Spark's in-memory computing capabilities to perform procedures such as text analysis and
processing using PySpark.
· Primarily responsible for designing, implementing, testing, and maintaining database solutions on AWS.
· Worked with Spark Streaming, dividing data into different branches for batch processing through the
Spark engine.
· Developed NiFi pipelines for extracting data from external sources.
· Developed Jenkins pipelines for data pipeline deployments.
· Wrote HiveQL scripts to create complex tables with performance features such as partitioning,
clustering, and skew handling.
· Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud
Dataflow with Python (an illustrative Beam pipeline sketch follows this section).
· Worked on a POC to set up Talend environments and custom libraries for different pipelines.
· Developed various Python and shell scripts for operational tasks.
· Worked in a fast-paced Agile environment across various teams and projects.
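
Illustrative sketch of a streaming Dataflow pipeline that loads Pub/Sub messages into BigQuery, as referenced above. The project, subscription, table, and bucket names are hypothetical, and the target BigQuery table is assumed to already exist.

# Minimal Apache Beam sketch: Pub/Sub -> parse JSON -> BigQuery, run on Dataflow.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_message(message: bytes) -> dict:
    # Messages are assumed to be JSON-encoded payment events.
    return json.loads(message.decode("utf-8"))


def run():
    options = PipelineOptions(
        streaming=True,
        project="example-project",
        region="us-central1",
        runner="DataflowRunner",
        temp_location="gs://example-bucket/tmp",
    )
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/example-project/subscriptions/payments-sub")
            | "ParseJson" >> beam.Map(parse_message)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.payments",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()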

Technologies: GCP, Google DataProc Cluster, Hive, Talend, Spark, Airflow, Google Data Tables, Python, Shell,
Jenkins, PySpark, Agile, Google Logs Explorer
