Dice Resume CV SN
Contact: 718-674-4275
Email: [email protected]
Status: Permanent Resident
New York, New York
PROFESSIONAL SUMMARY
Big Data Engineer with experience building and maintaining data pipelines on Azure (Data Factory, Databricks, Synapse, ADLS Gen2, Snowflake) and developing Hadoop/Spark applications on AWS (EMR, S3), using Python, Scala, and SQL.
PROFESSIONAL EXPERIENCE
Oppenheimer
New York, New York
Job Title: Big Data Engineer | November 2020 to Present
Responsibilities
• Create and maintain pipelines in Azure Databricks and Azure Data Factory (ADF), including Delta Live Tables built from Databricks notebooks, and create ADF data flows using a CDC process (see the Delta Live Tables sketch below). Hands-on experience with Snowflake utilities (Snowpipe, SnowSQL) and big data techniques using Python/Spark/Scala.
• Monitor jobs in Azure Databricks pipelines; when a job fails, diagnose the failure and resolve it according to an action plan. Develop business logic and transformations using Spark (Python/Scala/Java) in Databricks notebooks.
• Develop SQL queries for analysts and maintain orchestration of the ADF and Databricks pipelines. Strong experience migrating other databases to Snowflake and using Snowflake Clone, Time Travel, and Snowpipe.
• Involved in converting SQL queries into Spark transformations using the Spark DataFrame API (see the SQL-to-DataFrame sketch below).
• Configured big data workflows to run on top of Spark using the ADF scheduler.
• Load and transform data into ADLS Gen2 from large sets of structured data in Oracle and SQL Server using Azure Data Factory (ADF), Azure Synapse, and Databricks (see the JDBC-to-ADLS sketch below).
• Worked with PySpark to improve the performance and optimization of existing applications running in Azure.
• Worked with the Parquet data format to speed up data transformation and optimize query performance in ADF and Databricks.
• Worked with different file formats such as text, Avro, ORC, and Parquet for Spark SQL querying and processing.
• Configured various property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements.
• Analyzed metadata as per requirements.
• Loaded data into ADLS Gen2 storage from local systems and the cloud.
• Wrote Python/Spark scripts for processing data.
• Loaded data from CSV files into Spark, created DataFrames, and queried the data using Spark SQL.
• Involved in designing Avro schemas for serialization and converting JSON data to the Parquet file format (see the JSON-to-Parquet sketch below).
• Used features such as parallelize, partitioning, caching (both in-memory and on-disk), and Kryo serialization (see the tuning sketch below). Implemented Spark jobs in Scala/Python and Spark SQL for faster testing and processing of data.
• Provide input on the long-term strategic vision and direction for the data delivery infrastructure, including Microsoft BI stack implementations and Azure advanced data analytics solutions.
• Evaluate existing data platforms and apply technical expertise to create a data
modernization roadmap and architect solutions to meet business and IT
needs.
• Utilized ADF, Databricks, and Synapse to run Spark jobs, and Blob Storage services to process data.
• Build data fabrics to simplify and integrate data management across cloud and on-premises environments to accelerate digital transformation.
Environment: Azure Data Factory, Azure Synapse, Azure Databricks, Snowflake, ADLS Gen2, Azure SQL Server, Microsoft Power BI, Delta Live Tables, Kafka, Spark (Python/Scala), Pandas, Avro, Parquet, Oracle Database, Linux
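
Delta Live Tables sketch. A minimal illustration of the CDC pipeline pattern described above; the table names, ADLS path, and key/sequence columns are illustrative assumptions, and the dlt module only runs inside a Databricks Delta Live Tables pipeline.

# Minimal Delta Live Tables CDC sketch (runs only inside a Databricks DLT pipeline).
# Table names, the ADLS path, and key/sequence columns are illustrative assumptions.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw change feed landed in ADLS Gen2")
def customers_raw():
    # Auto Loader ingests new files incrementally from the landing zone
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://landing@examplelake.dfs.core.windows.net/customers/")
    )

# Apply the change-data-capture feed onto a live target table (newer DLT runtimes)
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customers_raw",
    keys=["customer_id"],
    sequence_by=F.col("change_timestamp"),
)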
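
SQL-to-DataFrame sketch. A small PySpark example of rewriting a SQL query as DataFrame transformations, as referenced above; the "orders" dataset, its columns, and the input path are hypothetical.

# Sketch: the same aggregation expressed as Spark SQL and as DataFrame transformations.
# The "orders" dataset, its columns, and the input path are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sql-to-dataframe").getOrCreate()
orders = spark.read.parquet("/data/orders")   # assumed input path
orders.createOrReplaceTempView("orders")

# Original SQL form
sql_result = spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    WHERE status = 'FILLED'
    GROUP BY region
""")

# Equivalent DataFrame transformation
df_result = (
    orders.filter(F.col("status") == "FILLED")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount"))
)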
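
JDBC-to-ADLS sketch. One way the Oracle/SQL Server loads into ADLS Gen2 could look in a Databricks notebook; the JDBC URL, secret scope, table name, and abfss path are assumptions, and dbutils is only available on Databricks.

# Sketch: pull a SQL Server table over JDBC and land it in ADLS Gen2 as Delta.
# JDBC URL, secret scope, table name, and abfss path are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-to-adls").getOrCreate()

source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=sales")
    .option("dbtable", "dbo.transactions")
    .option("user", dbutils.secrets.get("scope", "sql-user"))       # Databricks secrets (assumed)
    .option("password", dbutils.secrets.get("scope", "sql-pass"))
    .load()
)

(
    source_df.write.format("delta")
    .mode("overwrite")
    .save("abfss://curated@examplelake.dfs.core.windows.net/sales/transactions")
)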
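
JSON-to-Parquet sketch. A minimal illustration of the format conversion mentioned above; paths are placeholders, and the Avro write assumes the spark-avro package is available on the cluster.

# Sketch: convert JSON input to Parquet (and optionally Avro) with PySpark.
# Paths are placeholders; the Avro write assumes spark-avro is on the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

events = spark.read.json("abfss://landing@examplelake.dfs.core.windows.net/events/")

# Columnar Parquet output for efficient downstream querying
events.write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/events_parquet/"
)

# Row-oriented Avro output where a serialization schema is needed
events.write.mode("overwrite").format("avro").save(
    "abfss://curated@examplelake.dfs.core.windows.net/events_avro/"
)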
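
Tuning sketch. A short example of the Spark tuning features named above (Kryo serialization, repartitioning, memory-and-disk caching); the configuration values and input path are illustrative.

# Sketch: Kryo serialization, repartitioning, and memory+disk caching.
# Configuration values and the input path are illustrative assumptions.
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("tuning-sketch")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

df = spark.read.parquet("/data/trades").repartition(64, "trade_date")

# Cache across actions, spilling to disk when executor memory is tight
df.persist(StorageLevel.MEMORY_AND_DISK)
row_count = df.count()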
Health First
New York, New York
Job Title: Hadoop Developer | June 2015 to October 2020
Responsibilities
• Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS services (EMR, S3, DynamoDB, etc.); see the monitoring sketch after this list.
• Developed Python scripts to transform raw data into actionable data as specified by business users.
• Worked in an AWS environment to develop and deploy custom Hadoop applications.
• Worked closely with the data modelers to model the new incoming data sets.
• Involved in the end-to-end lifecycle of Hadoop jobs that used technologies such as Sqoop, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
• Expertise in designing and deploying Hadoop clusters and various big data analytics tools, including Hive, Oozie, Autosys, Sqoop, Spark, and Cloudera.
• Involved in creating tables, loading data, and writing queries.
• Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Hive, Spark, and HBase.
• Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
• Developed Spark code using Python/Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test environments.
• Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS/S3 buckets (see the Hive sketch after this list).
• Worked on tuning Hive to improve performance and resolve performance-related issues, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
• Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
• Imported data from different sources such as HDFS/S3 into Spark DataFrames.
• Developed a data pipeline using CI/CD with Jenkins to store data in HDFS, and performed real-time analysis on the incoming data.
• Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the streaming sketch after this list).
• Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
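
Monitoring sketch. A minimal boto3 example in the spirit of the EMR monitoring and alerting described above; the region, SNS topic ARN, and the choice of SNS for alerting are assumptions.

# Sketch: check EMR cluster health with boto3 and alert via SNS on failures.
# The region, SNS topic ARN, and alerting mechanism are illustrative assumptions.
import boto3

emr = boto3.client("emr", region_name="us-east-1")
sns = boto3.client("sns", region_name="us-east-1")

# List clusters that ended in a failed state
failed = emr.list_clusters(ClusterStates=["TERMINATED_WITH_ERRORS"])["Clusters"]

for cluster in failed:
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:data-alerts",  # assumed topic
        Subject=f"EMR cluster {cluster['Name']} failed",
        Message=str(cluster["Status"]["StateChangeReason"]),
    )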
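
Hive sketch. One way the Hive-based cleaning and filtering could be expressed, run through PySpark with Hive support; the database, table, columns, and S3 output path are hypothetical, and the S3 write assumes the s3a connector is configured.

# Sketch: Hive-style cleaning/filtering executed via Spark with Hive support enabled.
# Database, table, column, and output path names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-clean")
    .enableHiveSupport()
    .getOrCreate()
)

cleaned = spark.sql("""
    SELECT member_id, claim_date, claim_amount
    FROM staging.claims_raw
    WHERE claim_amount IS NOT NULL
      AND claim_date >= '2019-01-01'
""")

# Land the cleaned data as Parquet in S3 (an HDFS path would work the same way)
cleaned.write.mode("overwrite").parquet("s3a://example-bucket/curated/claims/")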
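
Streaming sketch. A minimal example of the Spark Streaming micro-batch pattern (legacy DStream API) referenced above; the socket source, host/port, and batch interval are illustrative assumptions.

# Sketch: Spark Streaming splits an input stream into 10-second micro-batches.
# The socket source, host/port, and batch interval are illustrative assumptions.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-sketch")
ssc = StreamingContext(sc, batchDuration=10)   # 10-second batches feed the Spark engine

lines = ssc.socketTextStream("localhost", 9999)
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.pprint()

ssc.start()
ssc.awaitTermination()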