
Bikash Jha
Data Engineer

Leverkusen, Germany 51381
+49 (0) 157 50325815
[email protected]
linkedin.com/in/bikash-jha/

Profile

I am a driven data engineer with 7+ years of experience working with big datasets. I have experience and knowledge across a diverse set of disciplines, technologies, and tools, including data science, cloud, data pipelines, and data architecture.

I am looking for a career opportunity to apply my skills and experience to challenging projects, and I am eager to build robust databases that lay the groundwork for game-changing business insights, making extensive use of current IT techniques to contribute productively to the company's growth while growing professionally.

Skills: Data Structures, Algorithms, Serverless, Data Pipelines
Technologies: Python 2/3, Golang, Spark, Spark Streaming, Elastic, Kubernetes, Docker, Airflow
Cloud: AWS, GCP
Databases: MySQL, MongoDB, DynamoDB, PostgreSQL, HBase
Big Data: Kafka, MapR/Cloudera, Hive, Zookeeper, Oozie, HBase
Monitoring: Grafana, Loki, Prometheus, Stackdriver Logging

Experience
Planet Labs, Berlin, Germany
Senior Data Engineer
OCT 2021 - CURRENT
● Geospatial Infrastructure & Data Platform Management (Planetary Variables):
○ Orchestrated the design and management of a comprehensive infrastructure to ingest and process geospatial data from a
constellation of 250+ satellites, including the integration of machine learning models to detect and analyze changes in forest
cover using GeoDiff.
○ Created specialized data structures to accommodate both vector (e.g., feature collection, multi-polygons, points) and raster
(e.g., satellite imagery, elevation models) geospatial datasets, enhancing processing capabilities for detailed deforestation
analysis and environmental monitoring.
● Real-Time Processing & Advanced Geospatial Operations (Planetary Variables):
○ Leveraged PubSub/PubSub-Lite and Apache Spark for real-time geospatial data streaming and on-the-fly analytics,
incorporating machine learning algorithms to identify and respond to rapid environmental changes indicative of
deforestation.
○ Conducted spatial join operations, proximity analysis, and advanced analytics, contextualizing incoming satellite data with
historical deforestation patterns through GeoDiff analysis, enabling timely detection and response strategies.
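As an illustration of the spatial-join step described above, here is a minimal geopandas sketch; the bucket paths, file names, and columns are hypothetical, and reading gs:// paths assumes gcsfs is installed:

    import geopandas as gpd

    # Hypothetical inputs: incoming observation footprints and historical
    # deforestation polygons, both stored as GeoParquet on GCS.
    tiles = gpd.read_parquet("gs://hypothetical-bucket/incoming/footprints.parquet")
    history = gpd.read_parquet("gs://hypothetical-bucket/reference/deforested.parquet")

    # Tag each incoming footprint with any historical deforestation polygon it
    # intersects; footprints with no match keep NaN in the joined columns.
    joined = gpd.sjoin(tiles, history, how="left", predicate="intersects")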
● Data Storage, Schema Design & Spatial Indexing:
○ Designed schemas in BigQuery and Bigtable, incorporating spatial indices for optimized query performance on petabytes of
geospatial data, including layers specific to deforestation tracking and monitoring.
○ Implemented a Data Lake (GCS) storing GeoParquet-formatted data tailored to machine learning models focused on detecting deforestation activities, ensuring efficient data management and accessibility for deep learning applications (sketched below).
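A minimal sketch of the GeoParquet write path, assuming geopandas with gcsfs installed and a hypothetical bucket:

    import geopandas as gpd
    from shapely.geometry import Point

    # Hypothetical sample: two monitoring plots with WGS84 coordinates.
    gdf = gpd.GeoDataFrame(
        {"plot_id": [1, 2]},
        geometry=[Point(13.40, 52.52), Point(13.52, 52.60)],
        crs="EPSG:4326",
    )
    # geopandas serializes GeoDataFrames to GeoParquet via to_parquet();
    # writing to gs:// requires gcsfs.
    gdf.to_parquet("gs://hypothetical-bucket/forest/plots.parquet")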
● Geospatial Query Development & Spatial Analysis:
○ Formulated complex SQL and spatial SQL queries in BigQuery and Bigtable to extract, transform, and analyze geospatial data, utilizing machine learning outputs to monitor deforestation and land-use changes (see the sketch below).
○ Employed advanced geospatial techniques such as geostatistical analyses, temporal and spatial trend detection, and
business modeling to provide deep insights for Conservation Service Managers (CSMs), enhancing decision-making in
forest conservation efforts and enabling predictive analysis of environmental impact.
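To give a flavor of such queries, here is a sketch using BigQuery's GIS functions through the Python client; the project, dataset, and column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
    SELECT plot_id,
           ST_AREA(geom) / 1e6 AS area_km2
    FROM `hypothetical-project.forest.plots`
    WHERE ST_INTERSECTS(
        geom,
        -- hypothetical area of interest
        ST_GEOGFROMTEXT('POLYGON((13.0 52.3, 13.8 52.3, 13.8 52.7, 13.0 52.7, 13.0 52.3))'))
    """
    for row in client.query(sql).result():
        print(row.plot_id, row.area_km2)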
● Pipeline Orchestration, Backfill & Spatial ETL Operations:
○ Deployed Apache Airflow on a GKE cluster, integrating geospatial ETL DAGs with gcsfuse and PubSub (a minimal DAG sketch follows this section).
○ Managed backfill operations with Airflow, ensuring the spatial integrity and accuracy of geospatial datasets over time.
○ Deployed Spark on GKE to run Spark jobs on Kubernetes pods with autoscaling enabled.
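A minimal Airflow 2 DAG sketch of this orchestration pattern; the DAG id, schedule, and task bodies are placeholders rather than the production pipelines:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest(**context):
        print("placeholder: pull new scene notifications from Pub/Sub")

    def transform(**context):
        print("placeholder: run the spatial ETL over gcsfuse-mounted data")

    with DAG(
        dag_id="geospatial_etl",           # hypothetical name
        start_date=datetime(2022, 1, 1),
        schedule_interval="@hourly",
        catchup=True,                      # lets Airflow manage backfills
    ) as dag:
        ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        ingest_task >> transform_task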
● Reporting & Backend Integration:
○ Golang-Based Reporting System: Implemented a Golang and GORM-based backend for dynamic reporting.
○ Real-Time Transaction Processing and Data Integration: Employed Golang to process and integrate millions of transactional events in real time, ensuring instant calculations and seamless data flow from PubSub into PostgreSQL and Redis.
○ Developed a Golang library for geometry validation, supporting shapes such as multipolygons, features, and feature collections (sketched below).
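The validation library itself is written in Go; as an illustration only, here is a minimal Python/shapely sketch of the same idea:

    from shapely.geometry import shape

    def validate_feature_collection(fc: dict) -> list:
        """Collect error strings for a GeoJSON FeatureCollection dict."""
        errors = []
        for i, feat in enumerate(fc.get("features", [])):
            try:
                geom = shape(feat["geometry"])  # raises on malformed geometry
            except Exception as exc:
                errors.append(f"feature {i}: unparseable geometry ({exc})")
                continue
            if not geom.is_valid:
                errors.append(f"feature {i}: invalid {geom.geom_type}")
        return errors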
● Legacy Support, Enhancements & Monitoring:
○ Maintained and enhanced legacy batch pipelines on Apache Beam.
○ Set up Grafana and built metrics dashboards (LogQL, Stackdriver, Prometheus).
○ Integrated Slack alerting with Grafana and Airflow.
○ Set up GitLab CI/CD and Terraform.

Tech Stack: Deep Learning, GCP, BigQuery, Bigtable, Dask, Spark, PubSub, PubSub-Lite, Kubernetes, Python, Golang, Airflow, QGIS, Mapbox, GitLab CI, Terraform, Postgres, Redis

Homelike Internet GmbH, Cologne, Germany


Senior Data Engineer
OCT 2021 - CURRENT
○ Managed 6 different ETL processes transferring data across MongoDB, BigQuery, Salesforce, and several marketing channels
○ Designed a serverless framework for real-time consumption of user-tracking data using Kinesis
○ Implemented a dead-letter queue for the Kinesis data stream (Kinesis to Lambda to SQS; see the sketch after this list)
○ Conceptualised a new microservices architecture using:
○ Elastic Kubernetes Service (EKS) cluster on AWS
○ ELK stack for monitoring
○ Airflow on Kubernetes with git-sync
○ Kafka on K8s
○ Spark on K8s to run PySpark jobs
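A minimal sketch of the dead-letter-queue pattern referenced above, written as a Python Lambda handler; the environment variable and process() are hypothetical:

    import base64
    import json
    import os
    import boto3

    sqs = boto3.client("sqs")
    DLQ_URL = os.environ["DLQ_URL"]  # hypothetical SQS queue URL

    def process(payload):
        print(payload)  # placeholder for the real record handling

    def handler(event, context):
        for record in event["Records"]:
            raw = record["kinesis"]["data"]  # base64-encoded by Kinesis
            try:
                process(json.loads(base64.b64decode(raw)))
            except Exception:
                # park the poison record instead of blocking the shard
                sqs.send_message(QueueUrl=DLQ_URL, MessageBody=raw)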

Tech Stack: GCP, BigQuery, AWS, Lambda, Kinesis, SQS, CloudWatch, Kubernetes, Python

Aurigo Software Technologies, Bangalore, India


Senior Data Engineer
JAN 2021 - SEP 2021
Company Description: Helps state agencies, cities, counties, water authorities, airports, and facility owners plan, build, and maintain capital assets, infrastructure, and facilities by combining data engineering, democratized data science, and data orchestration (link)
Project: Serverless / Managed AWS services
○ Implemented Amazon Kinesis Firehose to collect real-time streaming data.
○ Designed AWS S3 as the data lake to store all raw data.
○ Set up AWS Athena to analyze data in Amazon S3 using standard SQL (see the sketch after this list).
○ Provisioned Kubernetes on an AWS EKS cluster using CloudFormation templates.
○ Configured the EKS cluster autoscaler and set up Amazon ECS for container orchestration.
○ Developed AWS Lambda functions.
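A sketch of kicking off an Athena query over the S3 data lake with boto3; the database, table, and bucket names are hypothetical:

    import boto3

    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString="SELECT event_type, COUNT(*) AS n "
                    "FROM raw_events GROUP BY event_type",
        QueryExecutionContext={"Database": "datalake"},
        ResultConfiguration={
            "OutputLocation": "s3://hypothetical-bucket/athena-results/"
        },
    )
    # Poll get_query_execution with this id to wait for completion.
    print(resp["QueryExecutionId"])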
Project: Microservices Components on Kubernetes.
○ Wrote connector microservices to fetch data from different sources.
○ PySpark/pandas DataFrame/Boto3 connectors: Twitter, S3 buckets, DynamoDB, filesystem.
○ Built machine learning models as microservices.
○ DAG engine microservice: integration engine for all microservices (ML + connectors).
○ Airflow microservice: programmatically author, schedule, and monitor workflows via the built-in Airflow user interface.
○ Attached EFS volume mounts and AWS RDS instances to Airflow Kubernetes pods.
○ Ran Fluentd microservices and the Elastic stack to fetch and store logs from Airflow.
○ Implemented the spark-on-k8s operator to run Spark jobs on Kubernetes (see the sketch after this list).
○ ML models: BERT, NLP, PyTorch, Hugging Face libraries.
○ Monitored Kubernetes cluster health using the K8s Dashboard, Prometheus, and Grafana.
○ Explored the GCP platform for future POCs and a Kafka microservice.
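A minimal sketch of submitting a SparkApplication custom resource to the spark-on-k8s operator with the official Kubernetes Python client; the image, namespace, and resource sizing are hypothetical:

    from kubernetes import client, config

    config.load_kube_config()
    spark_app = {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": "etl-job", "namespace": "spark"},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": "registry.example.com/spark-py:3.1.1",  # hypothetical image
            "mainApplicationFile": "local:///opt/app/job.py",
            "sparkVersion": "3.1.1",
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},
            "executor": {"cores": 2, "instances": 2, "memory": "4g"},
        },
    }
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="sparkoperator.k8s.io", version="v1beta2",
        namespace="spark", plural="sparkapplications", body=spark_app,
    )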

Tech Stack: Linux, AWS, Kubernetes, Docker, Python, Airflow, DynamoDB

LTI (Larsen & Toubro Infotech), Pune, India


Senior Product Engineer
JAN 2020 - DEC 2020
Company Description: LTI Mosaic Decisions Platform
Responsibilities:
○ Refactored existing code and evolved a new architecture to fit the existing product.
○ Wrote connector code using Spark/PySpark/pandas DataFrames for Azure, S3, NoSQL stores, etc. (see the sketch after this list).
○ Set up Kubernetes on an Amazon EKS cluster and implemented a key vault (HashiCorp).
○ Wrote our own Kubernetes API to submit Spark jobs on K8s, along with KubeSpawner APIs.
○ Enabled horizontal autoscaling of Kubernetes pods and optimized existing Spark jobs.
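A minimal sketch of the connector pattern for pulling an S3 object into a pandas DataFrame; the bucket and key are hypothetical:

    from io import BytesIO

    import boto3
    import pandas as pd

    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="hypothetical-bucket", Key="exports/data.csv")
    df = pd.read_csv(BytesIO(obj["Body"].read()))
    print(df.head())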

Tech Stack: Python, Amazon S3, Azure Blob, Cosmos DB, MongoDB, Kafka, Kubernetes, PySpark

AMDOCS, Pune, India


Software Developer
AUG 2016 - DEC 2019
Company Description: A platform that implements business logic and allows the marketer to inject machine-learning logic, run on a big data system, into any decision that needs to be taken within the experience (email/message text, etc.)

Project: Auto-ML Product Recommendation


○ Set up and wrote code for Kafka streaming consumers/producers and offset management (listener balancer)
○ Cleaned and prepared data for model creation using PySpark
○ Implemented ML models (Random Forest, Linear Regression) based on demographic data and supervised learning to predict potential customers (see the sketch after this list)
○ Analyzed target-customer feedback and provided insights to the marketing team
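A minimal PySpark sketch of the Random Forest training step; the input path, feature columns, and label are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = SparkSession.builder.appName("reco-train").getOrCreate()
    df = spark.read.parquet("s3a://hypothetical-bucket/customers/")

    # Assemble hypothetical demographic columns into one feature vector.
    assembler = VectorAssembler(
        inputCols=["age", "income", "tenure_months"], outputCol="features"
    )
    train = assembler.transform(df)

    rf = RandomForestClassifier(
        labelCol="converted", featuresCol="features", numTrees=100
    )
    model = rf.fit(train)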

Tech Stack: PySpark, Python, MapR, Kafka, Hive, Random Forest


Education
University of Kalyani, West Bengal
Bachelor of Technology in Information Technology, 2016

Awards: Employee of the Month 2017; Employee of the Quarter Q2 '18; on-site opportunities in Mexico (AT&T) and Manila (PLDT).

Other Interesting Facts


Languages: English (Professional), Hindi (Native), Bengali (C1)
Interests: contributing to open-source platforms, volleyball, cricket

References
Available upon request
