
Shourov Nath

Contact: 718-674-4275
Email: [email protected]
Status: Permanent Resident
New York, New York

PROFESSIONAL SUMMARY

• 8+ years of experience working with Big Data technologies on systems comprising massive
amounts of data running in highly distributed Hadoop environments. Hands-on experience
with Hadoop ecosystem components such as Hadoop, Hive, Sqoop, Azure Databricks,
Azure Data Factory (ADF), Azure Synapse, AWS (S3, EMR, DynamoDB, Redshift),
Spark (Python/Scala), Pandas, Spark Streaming, Apache Airflow, Spark SQL,
Oozie, Autosys, Jenkins, ZooKeeper, Kafka, MapReduce, YARN, Docker, Snowflake,
and GitHub.

• Strong knowledge of Spark architecture and components; efficient in working with
Spark Core, Spark SQL, and Spark Streaming using Python, Scala, and Java.

• Hands-on experience with Snowflake utilities, Snowpipe, SnowSQL, and big data
techniques using Python/Spark/Scala. Experience configuring Spark Streaming
to receive real-time data from Apache Kafka and store the streamed data to HDFS
using Python (see the Kafka streaming sketch after this list).
• Composed complex HiveQL queries to extract required data from Hive tables.
• Solid experience with partitioning and bucketing concepts in Hive; designed both
managed and external tables in Hive to optimize performance (see the Hive table
sketch after this list).
• Strong experience migrating other databases to Snowflake and using Snowflake Clone,
Time Travel, and Snowpipe. Involved in converting Hive/SQL queries into Spark
transformations using Spark DataFrames and Python.
• Used the Spark DataFrame API on Azure, AWS, and the Cloudera platform to
perform analytics on data.
• Used Spark DataFrame operations to perform required validations on the data.
• Experience integrating Hive queries into the Spark environment using Spark SQL.
• Good understanding and knowledge of NoSQL databases like MongoDB,
HBase and Cassandra.
• Excellent understanding and knowledge of job workflow scheduling and
locking tools/services like Oozie and Zookeeper.
• Experienced in designing different time driven and data driven automated workflows
using Oozie.
• Knowledge of ETL methods for data extraction, transformation and loading in
corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.
• Experience configuring ZooKeeper to coordinate servers in clusters and
maintain data consistency.
• Experience importing and exporting data with Sqoop between HDFS and relational
database systems.
• Adept at using Microsoft Power BI and Tableau.
• Good Knowledge in UNIX Shell Scripting for automating deployments and other routine
tasks.
• Experience in relational databases like Oracle, MySQL and SQL Server.
• Experienced in using Integrated Development environments like Eclipse, IntelliJ.
• Used project management services such as JIRA for tracking issues and code-related
bugs, GitHub for code reviews, and version control tools such as Git and Bitbucket.
• Experienced in working in SDLC, Agile and Waterfall Methodologies.
• Excellent communication, interpersonal, and problem-solving skills; a team player
with the ability to quickly adapt to new environments and technologies.
• Good understanding of Scrum methodologies, Test Driven Development and
continuous integration.
• Major strengths include familiarity with multiple software systems, the ability to
quickly learn new technologies and adapt to new environments, and being a self-motivated,
focused, and adaptive quick learner with excellent interpersonal, technical, and
communication skills.
• Experience in defining detailed application software test plans, including
organization, participants, schedule, and test and application coverage scope.
• Experience in gathering and defining functional and user interface requirements for
software applications.
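
The Spark Streaming bullet above describes receiving real-time data from Apache Kafka and
landing it in HDFS with Python. The sketch below is a minimal, illustrative PySpark
Structured Streaming version of that pattern; the broker address, topic name, and HDFS
paths are placeholders rather than details from this resume, and the spark-sql-kafka
connector is assumed to be available on the cluster.

# Minimal sketch: consume a Kafka topic and append it to HDFS as Parquet.
# Requires the spark-sql-kafka-0-10 connector; broker, topic, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Kafka delivers key/value as binary, so cast the message value to a string column.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")      # placeholder broker
    .option("subscribe", "events")                          # placeholder topic
    .option("startingOffsets", "latest")
    .load()
    .select(col("value").cast("string").alias("raw_event"))
)

# Continuously append micro-batches to HDFS, with a checkpoint directory for recovery.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/events")                  # placeholder output path
    .option("checkpointLocation", "hdfs:///checkpoints/events")
    .outputMode("append")
    .start()
)
query.awaitTermination()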
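
The Hive bullet above mentions partitioning and the managed-versus-external table
distinction. The following sketch shows one assumed way to express that through PySpark
with Hive support enabled; the table names, columns, and HDFS location are made-up examples.

# Managed vs. external partitioned Hive tables, created through Spark SQL.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning")
    .enableHiveSupport()
    .getOrCreate()
)

# Managed table: Hive owns both metadata and data; DROP TABLE removes the files too.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_managed (
        order_id BIGINT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
""")

# External table: Hive owns only the metadata; the underlying files survive a DROP TABLE.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_external (
        order_id BIGINT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///warehouse/sales_external'
""")

# Filtering on the partition column lets Hive/Spark scan only the matching partitions.
spark.sql("SELECT SUM(amount) FROM sales_managed WHERE order_date = '2020-01-01'").show()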

PROFESSIONAL EXPERIENCE

Oppenheimer
New York, New York
Job Title: Big Data Engineer November 2020 – Present

Responsibilities
• Create and maintain pipelines with Azure Databricks, Azure Data Factory (ADF), and
Delta Live Tables using Databricks notebooks. Create ADF data flows using a CDC
process. Hands-on experience with Snowflake utilities, Snowpipe, SnowSQL, and big
data techniques using Python/Spark/Scala.
• Monitor jobs in Azure Databricks pipelines; when jobs fail, diagnose the cause and
resolve the issues based on an action plan. Develop business logic and transformations
using Spark (Python/Scala/Java) in Databricks notebooks.
• Develop SQL queries for analysts. Maintain orchestration in the ADF and
Databricks pipelines. Strong experience migrating other databases to
Snowflake and using Snowflake Clone, Time Travel, and Snowpipe.
• Involved in converting SQL queries into Spark transformations using Spark DataFrames.
• Configured big data workflows to run on top of Spark using the ADF scheduler.
• Load and transform data into ADLS Gen2 from large sets of structured data in Oracle
and SQL Server using Azure Data Factory (ADF), Azure Synapse, and Databricks
(see the ADLS sketch after this list).
• Worked with PySpark to improve the performance and optimization of existing
applications running on Azure.
• Worked with the Parquet data format to speed up data transformation and optimize
query performance in ADF and Databricks.
• Worked with different file formats such as text, Avro, ORC, and Parquet for
Spark SQL querying and processing.
• Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml
based upon the job requirement.
• Analyzed metadata as per the requirements.
• Loaded data into ADLS Gen2 storage from local systems and the cloud.
• Wrote Python/Spark scripts for processing the data.
• Loaded data from CSV files into Spark, created DataFrames, and queried the data using
Spark SQL.
• Involved in designing Avro schemas for serialization and converting JSON data
to the Parquet file format.
• Used features such as parallelize, partitioning, caching (both in-memory and on-disk
serialization), and Kryo serialization. Implemented Spark using Scala/Python and
Spark SQL for faster testing and processing of data (see the tuning sketch after this list).
• Provide input on the long-term strategic vision and direction for the data delivery
infrastructure, including Microsoft BI stack implementations and Azure advanced
data analytics solutions.
• Evaluate existing data platforms and apply technical expertise to create a data
modernization roadmap and architect solutions to meet business and IT
needs.
• Utilized ADF, Databricks, and Synapse to process Spark jobs and Blob Storage services
to process data.
• Build data fabrics to simplify and integrate data management across cloud and
on-premises environments to accelerate digital transformation.
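
As a rough illustration of the "load and transform data into ADLS Gen2" bullet above, this
PySpark sketch reads a SQL Server table over JDBC, applies a light transformation, and
lands the result in ADLS Gen2 as partitioned Parquet. The JDBC URL, credentials, table and
column names, and abfss:// path are placeholders, and the cluster is assumed to already
have the SQL Server JDBC driver and ADLS Gen2 credentials configured.

# Sketch: SQL Server -> PySpark -> ADLS Gen2 (Parquet), with placeholder connection details.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("sqlserver-to-adls").getOrCreate()

# Read the source table over JDBC.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=sales")  # placeholder
    .option("dbtable", "dbo.orders")                                         # placeholder
    .option("user", "etl_user")                                              # placeholder
    .option("password", "********")
    .load()
)

# Light cleanup and typing before landing the data.
curated = (
    orders
    .withColumn("order_date", to_date(col("order_ts")))   # placeholder column names
    .filter(col("amount") > 0)
)

# Write to ADLS Gen2 as Parquet, partitioned by date for downstream ADF/Synapse/Databricks use.
(
    curated.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("abfss://curated@examplestorage.dfs.core.windows.net/sales/orders")  # placeholder
)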
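
The tuning bullet above lists caching and Kryo serialization alongside JSON-to-Parquet
conversion; the short sketch below shows one assumed way those pieces look together in
PySpark. Paths are placeholders.

# Sketch: enable Kryo serialization, convert JSON to Parquet, and cache a reused DataFrame.
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-tuning-demo")
    # Kryo is faster and more compact than default Java serialization for JVM-side shuffle/cache data.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Convert raw JSON to Parquet for cheaper downstream querying (placeholder paths).
events = spark.read.json("hdfs:///raw/events.json")
events.write.mode("overwrite").parquet("hdfs:///curated/events")

# Cache a frequently reused DataFrame in memory, spilling to disk when it does not fit.
hot = spark.read.parquet("hdfs:///curated/events").persist(StorageLevel.MEMORY_AND_DISK)
print(hot.count())   # the first action materializes the cache; later actions reuse it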
Environment: Azure Data Factory, Azure Synapse, Azure Databricks, Snowflake, ADLS Gen2,
Azure SQL Server, Microsoft Power BI, Delta Live Tables, Kafka, Spark (Python/Scala),
Pandas, Avro, Parquet, Oracle Database, Linux

Health First
New York, New York
Job Title: Hadoop Developer June 2015 to October 2020
Responsibilities
• Implemented and maintained monitoring and alerting of production and
corporate servers/storage using AWS services (EMR, S3, DynamoDB, etc.).
• Developed Python scripts to transform the raw data into intelligent data as specified
by business users.
• Worked in AWS environment for development and deployment of Custom
Hadoop Applications.
• Worked closely with the data modelers to model the new incoming data sets.
• Involved in the end-to-end process of Hadoop jobs that used various technologies such
as Sqoop, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
• Expertise in designing and deploying Hadoop clusters and various big data analytics
tools including Hive, Oozie, Autosys, Sqoop, Spark, and Cloudera.
• Involved in creating tables, loading data, and writing queries.
• Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure
components such as Hive, Spark, and HBase.
• Explored Spark to improve the performance and optimization of existing algorithms
in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
• Developed Spark code using Python/Scala and Spark SQL/Streaming for faster
testing and processing of data. Configured, deployed, and maintained multi-node Dev
and Test clusters.
• Performed transformations, cleaning, and filtering on imported data using Hive and
MapReduce, and loaded the final data into HDFS/S3 buckets.
• Worked on tuning Hive to improve performance and solve performance-related issues,
with a good understanding of joins, grouping and aggregation, and how they translate
into MapReduce jobs.
• Developed Spark code using Scala and Spark-SQL/Streaming for faster testing
and processing of data.
• Imported data from different sources such as HDFS/S3 into Spark DataFrames.
• Developed a data pipeline using CI/CD with Jenkins to store data in HDFS.
Performed real-time analysis on the incoming data.
• Used Spark Streaming to divide streaming data into batches as input to the Spark engine
for batch processing (see the streaming sketch after this list).
• Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
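
The Spark Streaming bullet above describes dividing a stream into batches for the Spark
engine; the sketch below illustrates that idea with the classic DStream API that was
current for this period. The socket source, host, and port are placeholders.

# Sketch: split an incoming text stream into 10-second micro-batches and word-count each batch.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="dstream-batches")
ssc = StreamingContext(sc, batchDuration=10)          # each micro-batch covers 10 seconds

lines = ssc.socketTextStream("localhost", 9999)       # placeholder source
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.pprint()                                       # print the counts for each micro-batch

ssc.start()
ssc.awaitTermination()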

Environment: Apache Hadoop, HDFS, MapReduce, Autosys, Sqoop, Jenkins, Spark, Hive, HBase,
Oozie, Python, Linux, Jupyter Notebook, Tableau

Education: Bachelor's in Statistics, University of Chittagong, 2012
Master's in Statistics, University of Chittagong, 2014
Certification: 1. Databricks Certified Associate Developer for Apache Spark 3.0
- Python (in progress)
2. Python for Financial Analysis - Udemy
3. Python for Data Science - Udemy
