Data Engineer Rithick Bisher

Rithick Bisher is a Data Engineer with over 9 years of experience in Big Data technologies, proficient in tools such as Hadoop, Spark, and various programming languages including Scala, Java, and Python. He has extensive experience in developing and deploying applications in the Hadoop ecosystem, ETL processes, and machine learning algorithms, with a strong background in data architecture and analytics. His recent roles include providing architectural leadership and developing predictive models at Centene Corporation and Chewy, along with optimizing data processing systems at UBS.

RITHICK BISHER

Email: [email protected] PH: 901-492-1051


Data Engineer
PROFESSIONAL SUMMARY:
 9+ years of IT experience across a variety of industries working on Big Data, using technologies such as
Cloudera and Hortonworks distributions. Hadoop working environment includes Hadoop, Spark, MapReduce,
Kafka, Hive, Ambari, Sqoop, HBase, and Impala.
 Fluent programming experience with Scala, Java, Python, SQL, T-SQL, and R.
 Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem
components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL,
Kafka.
 Adept at configuring and installing Hadoop/Spark Ecosystem Components.
 Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and
transforming complex data using in-memory computing capabilities, written in Scala. Worked with Spark to
improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair
RDDs, and Spark on YARN.
 Experience integrating various data sources such as Oracle SE2, SQL Server, flat files, and unstructured files
into a data warehouse.
 Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.
 Experience in Extraction, Transformation and Loading (ETL) of data from various sources into data warehouses, as
well as data processing such as collecting, aggregating, and moving data from various sources using Apache Flume,
Kafka, Power BI, and Microsoft SSIS.
 Hands-on experience with Hadoop architecture and its components such as the Hadoop Distributed File System (HDFS),
JobTracker, TaskTracker, NameNode, DataNode, and Hadoop MapReduce programming.
 Comprehensive experience in developing simple to complex MapReduce and Streaming jobs using Scala and Java
for data cleansing, filtering, and aggregation, along with detailed knowledge of the MapReduce framework.
 Used IDEs like Eclipse, IntelliJ IDEA, PyCharm, Notepad++, and Visual Studio for development.
 Seasoned practice in machine learning algorithms and predictive modeling such as linear regression, logistic
regression, naive Bayes, decision trees, random forests, KNN, neural networks, and K-means clustering.
 Ample knowledge of data architecture including data ingestion pipeline design, Hadoop/Spark architecture, data
modeling, data mining, machine learning and advanced data processing.
 Experience working with NoSQL databases like Cassandra and HBase and developed real-time read/write access
to very large datasets via HBase.
 Developed Spark applications that handle data from various RDBMS (MySQL, Oracle Database) and
streaming sources; a minimal PySpark sketch of this ingestion pattern appears after this list.
 Proficient SQL experience in querying, data extraction/transformations and developing queries for a wide range
of applications.
 Capable of processing large sets (Gigabytes) of structured, semi-structured or unstructured data.
 Experience in analyzing data using HiveQL, Pig, HBase and custom MapReduce programs in Java 8.
 Experience working with GitHub/Git 2.12 source and version control systems.
 Strong in core Java concepts including Object-Oriented Design (OOD) and Java components such as the Collections
Framework, exception handling, and the I/O system.
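
A minimal PySpark sketch of the RDBMS-plus-streaming ingestion pattern referenced above, assuming the spark-sql-kafka connector is available; the host, database, table, topic, and credential values are illustrative placeholders, not details from any project listed below.

# Minimal PySpark sketch: batch read from an RDBMS over JDBC plus a Kafka streaming read.
# All connection details below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-and-stream-ingest").getOrCreate()

# Batch read from MySQL via JDBC (placeholder host, database, table, and credentials)
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", "secret")
          .load())

# Streaming read from Kafka (placeholder broker and topic; needs the spark-sql-kafka package)
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

orders.printSchema()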

TECHNICAL SKILLS:
Languages: Python 3.7.0+, Java 1.8, Scala 2.11.8+, SQL, T-SQL, R 3.5.0+, C++, C, MATLAB.
Cluster Management & Monitoring: Cloudera Manager 6.0.0+, Hortonworks Ambari 2.6.0+, CloudxLab.
Hadoop Ecosystem: Hadoop 2.8.4+, Spark 2.0.0+, MapReduce, HDFS, Kafka 0.11.0.1+, Hive 2.1.0+, HBase 1.4.4+, Sqoop 1.99.7+, Pig 0.17, Flume 1.6.0+, Keras 2.2.4.
Database: MySQL 5.X, SQL Server, Oracle 11g, HBase 1.2.3+, Cassandra 3.11.
Visualization: PowerBI, Oracle BI, Tableau 10.0+.
Virtualization: VMware Workstation, AWS.
Operating Systems: Linux, Windows, Ubuntu.
Markup Languages: HTML5, CSS3, JavaScript.
Other Tools: Jupyter Notebook, KNIME, MS SSMS, Putty, WinSCP, MS Office 365, Sage Math, SEED Ubuntu, TensorFlow, NumPy.
IDE: Eclipse, GitHub, PyCharm, Maven, IntelliJ, RStudio, Visual Studio.

PROFESSIONAL EXPERIENCE:
Centene Corporation St Louis, Missouri February 2024 to Present
Sr. Data Engineer
Responsibilities:
 Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on
application architecture.
 Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of
large, business technology programs.
 Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and
visualization and performed Gap analysis.
 Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
Implemented a Python-based distributed random forest via Python streaming.
 Migrated the application onto the AWS Cloud.
 Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services
within the enterprise data architecture (conceptual data model for defining the major subject areas used,
ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem
canonical model for defining the standard messages and formats to be used in data integration services throughout
the ecosystem).
 Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop machine
learning models, applying algorithms such as linear regression, multivariate regression, naive Bayes,
random forests, K-means, and KNN for data analysis.
 Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build a
solution that optimizes data quality and performance.
 Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise
data models, metadata solutions, and data lifecycle management in both RDBMS and Big Data environments.
 Designed multiple Python packages used within a large ETL process that loads 2TB of data from an
existing Oracle database into a new PostgreSQL cluster.
 Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models,
enhancing them by leveraging best-in-class modeling techniques.
 Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, and normalization (3NF) and
de-normalization of the database.
 Leveraged ETL methods for ETL solutions and data warehouse tools for reporting and analysis.
 Used CSVExcelStorage to parse files with different delimiters in Pig.
 Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of
Spark with Scala.
 Developed multiple MapReduce jobs in Java to clean datasets.
 Developed code to write canonical model JSON records from numerous input sources to Kafka queues; a minimal Python producer sketch appears after this list.
 Worked on customer segmentation using an unsupervised learning technique - clustering.
 Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata
Administrator, BTEQ, and other Teradata utilities.
 Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of
machine learning methods including classification, regression, and dimensionality reduction.
 Developed Linux Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to the Netezza
database.
 Designed and implemented system architecture for an Amazon EC2-based cloud-hosted solution for the client.
 Tested complex ETL mappings and sessions based on business user requirements and business rules to load data
from source flat files and RDBMS tables to Confidential tables.
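
A hedged sketch of the canonical-model JSON publishing step referenced above, written with the kafka-python client; the broker address, topic name, and record fields are illustrative placeholders, not values from the Centene project.

# Publish canonical-model JSON records to a Kafka topic (kafka-python client).
# Broker, topic, and record fields are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker:9092"],
    value_serializer=lambda rec: json.dumps(rec).encode("utf-8"),
)

# Illustrative canonical record; real field names would come from the canonical model.
record = {"member_id": "12345", "source_system": "claims", "event_type": "update"}
producer.send("canonical-members", value=record)
producer.flush()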
Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL,
Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive,
Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML,
Cassandra, MapReduce, AWS.

Chewy Dania Beach, FL April 2021 to January 2024


Sr. Data Engineer
Responsibilities:
 Supported MapReduce Programs running on the cluster.
 Evaluated business requirements and prepared detailed specifications that follow project guidelines required to
develop written programs.
 Configured the Hadoop cluster with NameNode and slave nodes and formatted HDFS.
 Used Oozie workflow engine to run multiple Hive and Pig jobs.
 Executed MapReduce programs running on the cluster.
 Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
 Analyzed the partitioned and bucketed data and computed various metrics for reporting.
 Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
 Worked on loading the data from MySQL to HBase where necessary using Sqoop.
 Developed Hive queries for Analysis across different banners.
 Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to a
database.
 Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched
instances for specific applications.
 Exported the result set from Hive to MySQL using Sqoop after processing the data.
 Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
 Gained hands-on experience working with Sequence files, Avro, and HAR file formats and compression.
 Used Hive to partition and bucket data; a PySpark sketch of the same layout appears after this list.
 Fetched live stream data from DB2 into an HBase table using Spark Streaming and Apache Kafka.
 Implemented Apache Pig scripts to load data from and store data into Hive.
 Experience in writing MapReduce programs with the Java API to cleanse structured and unstructured data.
 Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
 Created HBase tables to store various data formats of data coming from different portfolios.
 Worked on improving performance of existing Pig and Hive Queries.
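
A hedged PySpark sketch of the partitioning and bucketing layout mentioned above, expressed with the DataFrame writer rather than HiveQL; the database, table, and column names are illustrative placeholders.

# Write a Hive table partitioned by date and bucketed on a join key.
# Source/target table names and columns are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-bucket-demo")
         .enableHiveSupport()
         .getOrCreate())

orders = spark.table("staging.orders")  # placeholder source table

(orders.write
 .partitionBy("order_date")             # partition column (illustrative)
 .bucketBy(16, "customer_id")           # 16 buckets on a join key (illustrative)
 .sortBy("customer_id")
 .format("parquet")
 .mode("overwrite")
 .saveAsTable("analytics.orders_bucketed"))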
Environment: SQL Server, Oracle 9i, MS Office, Apache, Teradata, Informatica, ER Studio, XML, Business Objects.

UBS Weehawken, NJ January 2019 to March 2021


Data Engineer
Responsibilities:
 Member of the Business intelligence team, responsible for designing and optimizing systems.
 Optimized the system that processes ~500 GB of logs generated by the Nexmo API platform every day and loads
them into the data warehouse.
 Designed and implemented data loading and aggregation frameworks and jobs able to handle hundreds of GBs of
JSON files using Spark, Airflow, and Snowflake; a minimal Airflow DAG sketch appears after this list.
 Built tools using Tableau to allow internal and external teams to visualize and extract insights from big data
platforms.
 Responsible for expanding and optimizing data and data pipeline architecture, as well as optimizing data flow and
collection for cross functional teams.
 Built best-practice ETLs with Apache Spark to load and transform raw data into easy-to-use dimensional data for
self-service reporting.
 Improved the deployment and testing infrastructure within AWS, using tools like Jenkins, Puppet and Docker.
 Worked closely with the Product, Infrastructure, and Core teams to ensure data needs were considered during
product development and to guide data-related decisions.
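
A hedged sketch of the Spark-to-Snowflake orchestration described above, written as an Airflow 1.10-style DAG; the DAG name, schedule, script paths, and commands are illustrative placeholders rather than the production pipeline.

# Daily DAG: run a Spark aggregation job, then load the results into Snowflake.
# DAG id, schedule, and script paths are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_log_aggregation",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    aggregate_logs = BashOperator(
        task_id="spark_aggregate_logs",
        bash_command="spark-submit /jobs/aggregate_logs.py --date {{ ds }}",
    )

    load_snowflake = BashOperator(
        task_id="load_snowflake",
        bash_command="python /jobs/load_to_snowflake.py --date {{ ds }}",
    )

    aggregate_logs >> load_snowflake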
Environment: Scala 2.13, Spark 2.4, Spark SQL, Kafka 2.3.0, Apache Airflow 1.10.4, Snowflake, AWS (Redshift,
Jenkins, Docker), Tableau 2019.2

Careator Technologies Pvt Ltd Hyderabad, India September 2016 to October 2018
Data Engineer
Responsibilities:
 Extensively involved in installation and configuration of Cloudera Distribution Hadoop platform.
 Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.)
with DataFrames in Spark.
 Utilized Spark SQL to extract and process data by parsing with Datasets or RDDs in HiveContext, applying
transformations and actions (map, flatMap, filter, reduce, reduceByKey).
 Extended the capabilities of DataFrames using user-defined functions in Scala.
 Resolved missing fields in Data Frame rows using filtering and imputation.
 Integrated visualizations into a Spark application using Databricks and popular visualization libraries (ggplot,
Matplotlib).
 Trained analytical models with Spark ML estimators including linear regression, decision trees, logistic
regression, and k-means.
 Performed pre-processing on a dataset prior to training, including standardization and normalization.
 Created processing pipelines that chain transformations, estimators, and evaluation of analytical models.
 Evaluated model accuracy by dividing data into training and test datasets and computing metrics using evaluators.
 Tuned training hyper-parameters by integrating cross-validation into pipelines; a minimal PySpark pipeline and cross-validation sketch appears after this list.
 Used Spark MLlib functionality that was not present in Spark ML by converting DataFrames to RDDs and
applying RDD transformations and actions.
 Troubleshot and tuned machine learning algorithms in Spark.
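
A hedged PySpark sketch of the pipeline and cross-validation pattern referenced above; the feature columns, synthetic data, parameter grid, and metric are illustrative placeholders standing in for the project's actual dataset and models.

# Pipeline (assemble -> scale -> logistic regression) tuned with 3-fold cross-validation.
# The random DataFrame below stands in for real training data.
import random

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("pipeline-cv-demo").getOrCreate()

rows = [(random.random(), random.random(), random.random(), float(random.randint(0, 1)))
        for _ in range(200)]
df = spark.createDataFrame(rows, ["f1", "f2", "f3", "label"])  # synthetic placeholder data

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="raw_features")
scaler = StandardScaler(inputCol="raw_features", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, scaler, lr])

grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.maxIter, [50, 100])
        .build())

cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="label"),
                    numFolds=3)

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = cv.fit(train)
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print("test AUC:", auc)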
Environment: Spark 2.0.0, Spark MLlib, Spark ML, Hive 2.1.0, Sqoop 1.99.7, Flume 1.6.0, HBase 1.2.3, MySQL 5.1.73,
Scala 2.11.8, Shell Scripting, Tableau 10.0, Agile

Brio Technologies Private Limited Hyderabad, India December 2014 to August 2016
Spark Developer
Responsibilities:
 Imported required modules such as Keras and NumPy into the Spark session and created directories for data and
output.
 Read training and test data into the data directory and into Spark variables for easy access, then trained on the data
based on a sample submission.
 Stored all images as NumPy arrays for easier display and data manipulation.
 Created a validation set using Keras2DML to test whether the trained model was working as intended.
 Defined multiple helper functions used while running the neural network in a session, along with placeholders and
the number of neurons in each layer.
 Created the neural network's computational graph after defining weights and biases; a minimal TensorFlow 1.x sketch appears after this list.
 Created a TensorFlow session to run the neural network and validate the model's accuracy on the
validation set.
 After executing the program and achieving an acceptable validation accuracy, created a submission that is
stored in the submission directory.
 Executed multiple Spark SQL queries after forming the database to gather the specific data corresponding to an
image.
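
A hedged TensorFlow 1.x-style sketch of the placeholder, weights-and-biases, and session pattern described above; the layer sizes and synthetic batch are illustrative placeholders standing in for the image arrays and Keras2DML validation set.

# Single-hidden-layer network built on the TF 1.x graph API (placeholders, variables, session).
# Layer sizes and the synthetic batch are placeholders.
import numpy as np
import tensorflow as tf  # written against the 1.x graph API (e.g. TensorFlow 1.9)

n_inputs, n_hidden, n_classes = 784, 128, 10  # illustrative layer sizes

x = tf.placeholder(tf.float32, [None, n_inputs])
y = tf.placeholder(tf.float32, [None, n_classes])

# Weights and biases for one hidden layer and the output layer
w1 = tf.Variable(tf.random_normal([n_inputs, n_hidden], stddev=0.05))
b1 = tf.Variable(tf.zeros([n_hidden]))
w2 = tf.Variable(tf.random_normal([n_hidden, n_classes], stddev=0.05))
b2 = tf.Variable(tf.zeros([n_classes]))

hidden = tf.nn.relu(tf.matmul(x, w1) + b1)
logits = tf.matmul(hidden, w2) + b2

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
accuracy = tf.reduce_mean(
    tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1)), tf.float32))

# Synthetic batch standing in for the image arrays read from the data directory
batch_x = np.random.rand(32, n_inputs).astype(np.float32)
batch_y = np.eye(n_classes)[np.random.randint(0, n_classes, 32)].astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
    print("validation accuracy:", sess.run(accuracy, feed_dict={x: batch_x, y: batch_y}))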
Environment: Scala 2.12.8, Python 3.7.2, PySpark, Spark 2.4, Spark ML Lib, Spark SQL, TensorFlow 1.9, NumPy
1.15.2, Keras 2.2.4, PowerBI
