GCP Sample
11+ years of Cloud Certified Professional IT experience in building, deploying, and scaling Data
Management, Hybrid, Multi-Cloud, Data Migration, and Cloud Architecture development in IT Data
Analytics projects across domains such as Banking, Insurance, Telecom, and E-Commerce.
Expertise in Google Compute Engine, GCP VMs, Cloud Load Balancing, Cloud Storage, databases (Cloud
SQL, Bigtable, Cloud Datastore), Stackdriver monitoring, and Cloud Deployment Manager.
Built ETL/ELT data pipelines, BI models, and dashboards on a public cloud platform, migrating ETLs to Google
Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud
Storage, and Composer.
Hands-on experience with BigQuery, GCS buckets, G-Cloud Functions, Cloud Dataflow, Data Fusion, Pub/Sub,
Cloud Shell, G Suite, bq command-line utilities, Dataproc, and the Operations Suite (Stackdriver).
Experience in managing GCP resources in the Cloud, Infra Automation, and maintaining Continuous
Integration and Continuous Deployment (CI/CD) pipelines for a fast-paced robust application development
environment.
Participated in deep architectural discussions to build confidence and ensure customer success when
building new solutions and migrating existing data applications to GCP. Mentored and trained other
engineers within the organization on modern Big Data and cloud-native data technologies.
Experience in building and architecting multiple Data pipelines, end-to-end ETL, and ELT processes for Data
ingestion and transformation in GCP and coordinating tasks among the team.
Highly experienced in developing Data Marts and data warehousing designs using distributed data processing technologies.
Experience with building data pipelines in Python/PySpark/Spark-Scala/HiveQL/Presto/BigQuery and
building Python DAGs in Apache Airflow.
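A minimal sketch of the kind of Airflow DAG described above, assuming a hypothetical daily ETL job; the DAG ID, task names, and callables are illustrative only:

    # Minimal Airflow DAG sketch (hypothetical task names): orchestrates a daily
    # extract -> transform sequence using PythonOperator.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract(**context):
        # Placeholder: pull raw records from a source system.
        print("extracting raw data")


    def transform(**context):
        # Placeholder: clean and reshape the extracted records.
        print("transforming data")


    with DAG(
        dag_id="daily_etl_example",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task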
Experience in using different Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig,
Sqoop, Hive, Impala, HBase, Kafka, and Crontab tools.
Expertise in Creating, Debugging, Scheduling, and Monitoring jobs using Composer Airflow.
Strong SQL development skills including writing Stored Procedures, Triggers, Views, and User Defined
functions.
Expert in developing SSIS Packages to extract, transform, and load (ETL) data into a Data Warehouse/Data
Mart from heterogeneous sources.
Experience in building efficient pipelines for moving data between GCP and Azure using Azure Data Factory.
Experience in building Power BI reports on Azure Analysis Services for better performance compared with
direct queries against GCP BigQuery.
Experience in data preprocessing (Data Cleaning, Data Integration, Data Reduction, and Data Transformation)
using Python libraries including NumPy, SciPy, and Pandas for data analysis and numerical computations.
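An illustrative preprocessing sketch with Pandas and NumPy; the input file and column names are hypothetical:

    # Illustrative data-cleaning sketch (hypothetical file and column names).
    import numpy as np
    import pandas as pd

    df = pd.read_csv("transactions.csv")            # hypothetical input file
    df = df.drop_duplicates()                       # data reduction: drop exact duplicates
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df["amount"] = df["amount"].fillna(df["amount"].median())   # impute missing values
    df["log_amount"] = np.log1p(df["amount"])       # simple numerical transformation
    df.to_parquet("transactions_clean.parquet")     # hand off for downstream analysis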
Experience working on various file formats including delimited text files, clickstream log files, Apache log files,
Parquet files, Avro files, JSON files, XML files, and others.
Hands-on Bash scripting experience and building data pipelines on Unix/Linux systems. Experience with
scripting languages like PowerShell, Perl, Shell, etc.
Skilled at identifying the right Cloud-Native technology for developing and maintaining big data flows in
organizations.
Experience in designing Terraform configurations and Cloud Deployment Manager templates to spin up resources
such as cloud virtual networks and Compute Engine instances in public and private subnets, along with autoscalers, in Google
Cloud Platform.
Hands-on experience with different programming languages such as Python, R, and SAS, and a good
understanding of NoSQL databases like HBase, and Cassandra.
Experience in developing ETL applications on large volumes of data using different tools: Hive, Spark-Scala,
PySpark, Spark-SQL, and Pig.
Experience in using Sqoop for importing and exporting data from RDBMS to HDFS and Hive.
Expertise in designing and deployment of Hadoop clusters and different Big Data analytic tools including Pig,
Hive, Sqoop, and Apache Spark with Cloudera Distribution.
Certifications:
Education:
Bachelor of Engineering in Computer Science Engineering from Jawaharlal Nehru Technological University
Hyderabad, IN (2012)
Technical Skills:
Google Cloud Platform: GCP Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub
Reporting & Distribution: Cloudera, Hortonworks, Apache Hadoop, MapR, HDFS, MapReduce, Spark, YARN, Power BI, Data Studio, Tableau
Databases: HBase, Spark-Redis, Cassandra, Oracle, MySQL, PostgreSQL, Teradata
Data Services: Hive, Pig, Impala, Sqoop, Flume, Kafka
Scheduling & Monitoring Tools: Zookeeper, Oozie, Cloudera Manager, Autosys
Ecosystems: Apache Spark, Flume
Cloud Computing Tools: AWS, GCP
Programming Languages: C, Java, Scala, Python, R, SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, JSP
Build Tools: Jenkins, Maven, Gradle
Development Tools: Eclipse, IntelliJ, PyCharm, Microsoft SQL Studio, Toad
Professional Summary:
Designed, Configured, Maintained, and Managed Google Cloud Virtual Datacenter Cloud computing
platform which hosted Enterprise applications. Used Compute, Storage, Networking, Stackdriver, and other
Google Cloud tools for various operations.
Tasked with the coordination of CI/CD and DevOps teams in automating the deployment of infrastructure inside
Google Cloud.
Creating fully automated build, test, and deployment processes by leveraging Google Cloud Build as an
automated solution for deploying new versions of containerized web applications and maintaining a
repository of Docker images in Google Container Registry.
Experience in working with product teams to create various store-level metrics and supporting data pipelines
written in GCP's big data stack.
Deep understanding of moving data into GCP using the Sqoop process, using custom hooks for MySQL, and
using Cloud Data Fusion to move data from Teradata to GCS.
Used Cloud Dataflow with the Python SDK to deploy streaming jobs in GCP as well as batch jobs for custom
cleaning of text and JSON files, writing the results to BigQuery.
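A minimal batch-mode sketch of a Dataflow job written with the Apache Beam Python SDK; the bucket, table name, and cleaning rule are hypothetical:

    # Minimal Apache Beam batch sketch (hypothetical bucket/table names): read JSON
    # lines from GCS, do a trivial cleanup, and write rows to BigQuery.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def clean(line):
        record = json.loads(line)
        record["name"] = record.get("name", "").strip().lower()  # example cleaning rule
        return record


    options = PipelineOptions()  # pass --runner=DataflowRunner, --project, etc. at launch
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://example-bucket/raw/*.json")
            | "Clean" >> beam.Map(clean)
            | "Write" >> beam.io.WriteToBigQuery(
                "example_project:example_dataset.cleaned_events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )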
Experience in using various operators in Composer/Airflow and have used the Google Cloud client libraries in
Python for Big Query & storage hooks.
Served as an integrator between Data Architects, Data Scientists, and other Data Consumers.
Ability to do proof of concepts for managers in big data/GCP technologies and work closely with the solution
architect to achieve both Short/Long term goals.
Built data pipelines in Airflow/Composer for orchestrating ETL-related jobs using different Airflow
operators.
Performed installation and configuration of secrets management tool (Hashicorp Vault) inside Google
Cloud Platform for managing the user and system credentials.
Creating Google Cloud Storage (GCS) buckets, maintaining and utilizing policy management for these
buckets, and using the GCS Coldline storage class for archival storage and backup on Google Cloud.
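A short sketch of creating a backup bucket with the Coldline storage class via the google-cloud-storage client; the project and bucket names are placeholders:

    # Sketch: create a GCS bucket with the Coldline storage class for backups
    # (project/bucket names are placeholders).
    from google.cloud import storage

    client = storage.Client(project="example-project")
    bucket = storage.Bucket(client, name="example-backup-bucket")
    bucket.storage_class = "COLDLINE"          # archival/backup tier
    client.create_bucket(bucket, location="US")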
Leveraged Cloud SDK and Google Cloud CLI to package and deploy containerized web applications onto the
Google Cloud Platform and used Google Kubernetes Engine (GKE) to manage clusters and implemented
autoscaling conditions to accommodate traffic variance.
Automated the IAM secrets policy management for Hashicorp Vault by integrating it with Jenkins,
deployment of PostgreSQL databases, and load balancers for AquaSec container security tool inside GCP
using Cloud SDK and Python.
Designed and developed the REST-based Microservices using Spring Boot.
Used Sqoop import/export to ingest raw data into Cloud Storage by spinning up the Cloud Dataproc cluster.
Experience in GCP Dataproc, GCS, Cloud functions, Cloud SQL & Big Query.
Used the Cloud SDK in GCP Cloud Shell to configure services such as Dataproc, Cloud Storage, and BigQuery.
Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation
between raw source files and BigQuery tables.
Built custom code in Python for tagging tables and columns using a Cloud Data Catalog and built an
application for user provisioning.
Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow
with Python. Performed streaming data analysis using Dataflow templates by leveraging the Cloud Pub/Sub service.
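For the streaming path, a minimal sketch of a Beam pipeline reading from a Pub/Sub topic and writing to BigQuery; the topic and table names are hypothetical:

    # Streaming sketch (hypothetical topic/table): Pub/Sub messages -> BigQuery rows.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/example-topic")
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteBQ" >> beam.io.WriteToBigQuery(
                "example_project:example_dataset.streaming_events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )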
Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all the different
environments.
Involved in architecting cutting-edge MLOps systems in enterprise environments using GCP Vertex AI.
Creating alerting policies for Cloud Composer and Cloud Data Fusion to notify on any job failure.
Creating a Data Studio report to review billing and usage of services to optimize the queries and contribute
to cost-saving measures.
Used Kubernetes to orchestrate the deployment, scaling, and management of Docker Containers.
Designed and Co-ordinated with the Data Science team in implementing Advanced Analytical Models in
Hadoop Cluster over large Datasets.
Expertise in designing and deployment of Hadoop clusters and different Big Data analytic tools including
Pig, Hive, Sqoop, and Apache Spark, with Cloudera Distribution.
Responsible for estimating the cluster size and monitoring and troubleshooting the Spark Databricks cluster.
Work related to downloading Big-Query data into Druid, pandas, or Spark data frames for advanced ETL
capabilities.
Worked with Cloud Data Catalog and other Google Cloud APIs for monitoring, query, and billing-related
analysis for Big-Query usage.
Extensive use of the Cloud SDK in GCP Cloud Shell to configure and deploy services like Cloud Dataproc (managed
Hadoop), Google Cloud Storage, and BigQuery.
Created BigQuery jobs for loading data into BigQuery tables from data files stored in Google Cloud
Storage on a daily basis.
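A sketch of the daily load described above, using the google-cloud-bigquery client; the URI, dataset, and table names are placeholders:

    # Sketch: load newline-delimited JSON files from GCS into a BigQuery table
    # (URI, dataset, and table names are placeholders).
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,
    )
    load_job = client.load_table_from_uri(
        "gs://example-bucket/exports/2024-01-01/*.json",
        "example_project.example_dataset.daily_events",
        job_config=job_config,
    )
    load_job.result()  # block until the load completes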
Used the DataFrame API in Scala to convert distributed collections of data into named
columns and developed predictive analytics using Apache Spark Scala APIs.
Creating an architectural stack plan for data access with the NoSQL database Cassandra and bringing Hadoop and
Cassandra data in from various sources using Kafka.
Exported analyzed data to relational databases using Sqoop for visualization and generated reports for the BI
team using Tableau.
Developed reports using Tableau which keeps track of the dashboards published to Tableau Server, which
helps us find potential future clients in the organization.
Participated in identifying BigQuery usage patterns and tuning BigQuery queries fired from Dataflow jobs,
including advising app teams on how to use BigQuery tables for store-level attributes.
Created BigQuery authorized views for row-level security and for exposing data to other teams.
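A sketch of how such an authorized view can be wired up with the BigQuery Python client: create a filtered view in a share dataset, then grant that view read access on the source dataset; all project, dataset, view, and column names are placeholders:

    # Sketch: create a filtered view and authorize it against the source dataset
    # (project/dataset/view names are placeholders).
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    # 1. Create a view that exposes only one team's rows.
    view = bigquery.Table("example-project.shared_views.store_sales_team_a")
    view.view_query = """
        SELECT store_id, sale_date, amount
        FROM `example-project.warehouse.store_sales`
        WHERE team = 'team_a'
    """
    view = client.create_table(view, exists_ok=True)

    # 2. Authorize the view on the source dataset so readers of the view
    #    do not need direct access to the underlying table.
    source = client.get_dataset("example-project.warehouse")
    entries = list(source.access_entries)
    entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
    source.access_entries = entries
    client.update_dataset(source, ["access_entries"])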
Experience in moving data between GCP and Azure using Azure Data Factory.
Experience in building Power BI reports on Azure Analysis services for better performance.
Environment: GCP, Big Query, GCS Bucket, G-Cloud Function, Apache Beam, Cloud Dataflow, Azure, Cloud
Shell, Gsutil, Spring boot, Bq Command Line Utilities, Dataproc, VM Instances, Cloud SQL, MySQL, Postgres,
SQL Server, Salesforce SOQL, Python, Scala, Spark, Druid, Hive, Sqoop, Spark-SQL.
Developed multi-cloud strategies to make better use of GCP (for its PaaS) and Azure (for its SaaS) offerings.
Involved in loading and transforming large sets of structured, and semi-structured datasets and analyzed
them by running Hive queries.
Developed custom Python programs, including CI/CD rules, for Google Cloud Data Catalog metadata
management.
Designed and developed Spark jobs in Scala to implement an end-to-end data pipeline for batch processing
and fact/dimension modeling, and proposed a solution to load it.
Processed data with Scala, Spark, and Spark SQL and loaded it into Hive partitioned tables in Parquet file format.
Developed Spark jobs with partitioned RDDs (hash, range, custom) for faster processing.
Developed and deployed the outcome using Spark and Scala code on the Hadoop cluster running on GCP.
Developed a near real-time data pipeline using Flume, Kafka, and Spark Streaming to ingest client data from their
weblog servers and apply transformations.
Performed data analysis and design; created and maintained large, complex logical and physical data
models and metadata repositories using ERWIN and MB MDR; wrote shell scripts to trigger DataStage jobs.
Assisted service developers in finding relevant content in the existing reference models.
Worked with sources like Access, Excel, CSV, Oracle, and flat files using connectors, tasks, and transformations
provided by AWS Data Pipeline.
Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
Very good understanding of microservices architecture, adopting industry best practices for microservices
while breaking existing apps down into microservices.
Wrote Microservices using Spring Boot/Camel and deployed it to the Cloud environment.
Deployed Spring boot applications using Docker and Kubernetes.
Worked on developing PySpark scripts to encrypt raw data by applying hashing algorithms to client-
specified columns.
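A small sketch of column-level hashing in PySpark; the paths and column names are examples, and SHA-256 stands in for whichever hashing algorithm the client specified:

    # Sketch: hash client-specified columns with SHA-256 (paths/columns are examples).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("column-hashing").getOrCreate()

    df = spark.read.parquet("gs://example-bucket/raw/customers/")  # hypothetical path
    sensitive_columns = ["email", "ssn"]                           # client-specified list

    for column in sensitive_columns:
        df = df.withColumn(column, F.sha2(F.col(column).cast("string"), 256))

    df.write.mode("overwrite").parquet("gs://example-bucket/masked/customers/")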
Responsible for the Design, Development, and testing of the database and Developed Stored Procedures,
Views, and Triggers.
Developed and maintained CI/CD pipelines on GCP using Cloud Build and Cloud Run, enabling seamless
code deployment, and testing in a controlled environment.
Implemented data versioning and lineage tracking using tools such as Data Catalog and Data Studio,
enabling auditability and traceability of healthcare data in GCP. Experienced in writing live real-time
processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
Worked on custom Loaders and Storage Classes in Pig to handle data formats such as JSON, XML, and
CSV, and generated Bags for processing using Pig.
Developed Sqoop and Kafka Jobs to load data from RDBMS, and External Systems into HDFS and HIVE.
Developed Python-based API (RESTful Web Service) to track revenue and perform revenue analysis.
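A minimal sketch of the kind of revenue-tracking REST endpoint described above; Flask, the routes, and the in-memory store are assumptions for illustration, not necessarily what was used:

    # Minimal Flask sketch (framework, routes, and storage are assumptions):
    # record revenue events and return a simple aggregate.
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    revenue_events = []  # in-memory store for illustration only


    @app.route("/revenue", methods=["POST"])
    def add_revenue():
        event = request.get_json()
        revenue_events.append(float(event["amount"]))
        return jsonify({"status": "recorded"}), 201


    @app.route("/revenue/summary", methods=["GET"])
    def revenue_summary():
        total = sum(revenue_events)
        return jsonify({"count": len(revenue_events), "total": total})


    if __name__ == "__main__":
        app.run(port=8080)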
Compiled and validated data from all departments and presented it to the Director of Operations.
Developed Sqoop scripts and Sqoop jobs to ingest data from client-provided databases in batch fashion on an
incremental basis.
Used DistCp to load files from S3 to HDFS, and processed, cleansed, and filtered data using Scala, Spark,
Spark SQL, Druid, Hive, and Impala queries, loading it into Hive tables for data scientists to apply their ML
algorithms and generate recommendations as part of the data lake processing layer.
Built data pipelines in Airflow in GCP for ETL-related jobs using different Airflow operators, both older and
newer operators.
Created Big-Query authorized views for row-level security or exposing the data to other teams.
Good knowledge of using cloud shell for various tasks and deploying services.
Environment: GCP, Spark, Kafka, Snowflake, G-Cloud Function, Apache Beam, Cloud Dataflow, Cloud Shell, MySQL,
YARN, Oozie, Microservices, Scala, BigQuery, Dataproc, RDDs, Terraform, Spring Boot, EMR, AWS S3,
NoSQL, Hive.
Applied Materials, Hyderabad, IN Jan 2017 – May 2018
Sr GCP Data Engineer
Responsibilities:
Worked with Google Cloud Platform (GCP) services like Compute Engine, Cloud Functions, Cloud DNS, Cloud
Storage, and Cloud Deployment Manager, and with the PaaS and IaaS concepts of cloud computing and their
implementation on GCP.
Used Sqoop to import data into HDFS/Hive from multiple relational databases, performed operations, and
exported the results back.
Involved in migrating an on-prem Hadoop system to GCP (Google Cloud Platform).
Extensively used Spark Streaming to analyze sales data in real time over regular window intervals from
sources such as Kafka.
Performed Spark join optimizations and troubleshooting, monitored jobs, and wrote efficient code using Scala.
Used big data tools Spark (PySpark, SparkSQL) to conduct real-time analysis of the insurance transaction.
Performed Spark transformations and actions on large datasets. Implemented Spark SQL to perform complex
data manipulations, and to work with large amounts of structured and semi-structured data stored in a
cluster using Data Frames/Datasets.
Migrated previously written cron jobs to airflow/composer in GCP and Worked with big data tools: Hadoop,
Spark, Kafka, etc.
Support existing GCP Data Management implementations.
Created GCP Big Query authorized views for row-level security or exposing the data to other teams.
Extensive experience in IT data analytics projects; hands-on experience in migrating on-premises ETLs
to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google
Cloud Storage, and Composer.
Developed Spark code using Scala and Spark-SQL for faster processing of data.
Created Oozie workflows to run multiple Spark jobs.
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop
using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Experience with Terraform scripts that automate step execution in EMR to load data into ScyllaDB.
De-normalized data coming from Netezza as part of the transformation and loaded it into NoSQL
databases and MySQL.
Loaded data into Snowflake tables from the internal stage using SnowSQL.
Used import and export between the Snowflake internal stage and the external stage (AWS S3).
Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and
external tables in Hive to optimize performance.
Collecting and aggregating large amounts of log data using Kafka and staging data in HDFS Data Lake for
further analysis.
Used Hive to analyze data ingested into HBase by using Hive-HBase integration and HBase Filters to
compute various metrics for reporting on the dashboard.
Developed shell scripts in a UNIX environment to automate the data flow from source to different zones in
HDFS.
Created and defined job workflows per their dependencies in Oozie, set up e-mail notification upon job
completion for the teams that requested the data, and monitored jobs using Oozie on Hortonworks.
Experience in designing both time-driven and data-driven automated workflows using Oozie.
Environment: Hadoop (Cloudera), HDFS, GCP, Kafka, Map Reduce, Hive, Scala, Python, Pig, Sqoop,
AWS, Azure, DB2, UNIX Shell Scripting, JDBC.
Migrated an entire Oracle database to BigQuery and used Power BI for reporting. Built data pipelines in
Airflow in GCP for ETL-related jobs using different Airflow operators.
Experienced in GCP Dataproc, GCS, Cloud functions, and BigQuery.
Experienced in Google Cloud components, Google Container Builder, GCP client libraries, and Cloud
SDKs.
Got involved in migrating the on-prem Hadoop system to GCP (Google Cloud Platform).
Worked on analyzing and understanding the data from different domains to integrate into Data Market Place.
Developed PySpark programs, created data frames, and worked on transformations.
Working with the AWS/GCP cloud using GCP Cloud Storage, Dataproc, Dataflow, and BigQuery, as well as EMR, S3,
Glacier, and EC2 with EMR clusters.
Experienced in working with services like Data Lake, Data Lake Analytics, SQL Database, Synapse, Databricks,
Data Factory, Logic Apps, and SQL Data Warehouse, and GCP services like BigQuery, Dataproc,
Pub/Sub, etc.
Worked on analyzing the data using PySpark, and Hive, based on ETL mappings.
Experienced in Implementing Continuous Delivery pipelines with Maven, Ant, Jenkins, and GCP.
Experienced in Hadoop 2.6.4 and Hadoop 3.1.5.
Developed multi-cloud strategies to make better use of GCP (for its PaaS offerings).
Experienced in migrating legacy systems into GCP technologies.
Stored data files in Google Cloud Storage buckets on a daily basis, using Dataproc and BigQuery to develop and
maintain GCP cloud-based solutions.
Developed PySpark script to merge static and dynamic files and cleanse the data.
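A brief sketch of merging a static reference file with daily dynamic files and cleansing the result in PySpark; all paths and column names are placeholders:

    # Sketch: merge static and dynamic files and cleanse (paths/columns are placeholders).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("merge-static-dynamic").getOrCreate()

    static_df = spark.read.option("header", True).csv("gs://example-bucket/static/reference.csv")
    dynamic_df = spark.read.option("header", True).csv("gs://example-bucket/incoming/*.csv")

    merged = dynamic_df.join(static_df, on="record_id", how="left")
    cleansed = (
        merged
        .dropDuplicates(["record_id"])
        .withColumn("status", F.trim(F.lower(F.col("status"))))
        .na.fill({"status": "unknown"})
    )
    cleansed.write.mode("overwrite").parquet("gs://example-bucket/curated/merged/")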
Worked with Different business units to drive the design & development strategy.
Created functional specifications and technical design documentation. Coordinated with different teams such as
cloud security, Identity and Access Management, platforms, and networks to obtain all the necessary
accreditations and complete the intake process.
Leveraged cloud and GPU computing technologies, such as GCP, for automated machine learning and analytics
pipelines. Worked on POCs to evaluate various cloud offerings, including Google Cloud Platform (GCP).
Compared self-hosted Hadoop with GCP's Dataproc and explored Bigtable (managed HBase)
use cases and performance evaluation.
Build data pipelines in airflow in GCP for ETL-related jobs using different airflow operators.
Designed various Jenkins jobs to continuously integrate the processes and execute CI/CD pipelines using
Jenkins.
Was involved in setting up Apache airflow service in GCP.
Used the Cloud SDK in GCP Cloud Shell to configure services such as Dataproc, Cloud Storage, and BigQuery.
Environment: GCP, PySpark, GCP Dataproc, BigQuery, Hadoop, Hive, GCS, Python, Snowflake, DynamoDB,
Oracle Database, Power BI, SDK, Dataflow, Glacier, EC2, EMR Cluster, SQL Database, Synapse, Databricks.
Installed, configured, and maintained Apache Hadoop clusters for application development and major
components of the Hadoop Ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, and Zookeeper.
Implemented a six-node CDH4 Hadoop cluster on CentOS.
Importing and exporting data into HDFS and Hive from different RDBMS using Sqoop.
Monitoring the running Map Reduce programs on the cluster.
Responsible for loading data from UNIX file systems to HDFS.
Used HBase-Hive integration, and wrote multiple Hive UDFs for complex queries.
Created multiple Hive tables, and implemented Partitioning, Dynamic Partitioning, and Buckets in Hive for
efficient data access.
Experienced in writing programs using HBase Client API.
Involved in loading data into HBase using HBase Shell, HBase Client API, Pig, and Sqoop.
Experienced in the design, development, tuning, and maintenance of NoSQL databases.
Wrote MapReduce programs in Python using the Hadoop Streaming API.
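A word-count style sketch of a Hadoop Streaming job in Python; the mapper and reducer read stdin and emit tab-separated key/value pairs, which is how the Streaming API wires them together (file names are illustrative):

    # mapper.py -- emit (word, 1) pairs for Hadoop Streaming.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

    # reducer.py -- sum counts per word; Hadoop Streaming sorts mapper output by key.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")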
Developed unit test cases for Hadoop Map Reduce jobs with MRUnit.
Excellent experience in ETL analysis, designing, developing, testing, and implementing ETL processes
including performance tuning and query optimizing of databases.
Continuously monitored and managed the Hadoop cluster using Cloudera manager and Web UI.
Worked with application teams to install an operating system, Hadoop updates, patches, and version
upgrades as required.
Used Maven as the build tool and SVN for code management.
Implemented testing scripts to support test-driven development and continuous integration.
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Impala, Pig, Java, SQL, Ganglia, Sqoop, Flume, Oozie,
Unix, JavaScript, Maven, Eclipse.