
Ajay Kadiyala

📍: Bengaluru, 560029, India

📱: +91-9542380365, 📧: [email protected]

LinkedIn: https://www.linkedin.com/in/ajay026/

GitHub: https://github.com/Ajay026

Profile Summary:
• 5+ years of overall IT experience in application development.
• Working experience with the Hadoop ecosystem (Gen-1 and Gen-2) and its various components, such as
HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager (YARN), Application Master and
Node Manager.
• Experience with the Cloudera distribution and its components, including MapReduce, Spark, SQL, Hive,
HBase, Sqoop and PySpark.
• Good skills with the NoSQL database Cassandra.
• Proficient in developing Hive scripts for various business requirements.
• Knowledge of data warehousing concepts, OLTP/OLAP system analysis, and designing database schemas
such as star and snowflake schemas for relational and dimensional modelling.
• Good hands-on experience creating custom UDFs in Hive.
• Loaded and transformed large sets of structured, semi-structured and unstructured data between relational
database systems and HDFS using Sqoop.
• Working knowledge of Hive UDFs and the various Hive join types.
• Good understanding of Spark architecture and components; efficient with Spark Core,
DataFrames, Datasets, the RDD API, Spark SQL and Spark Streaming, and experienced in building PySpark and
Spark-Scala applications for interactive analysis, batch processing and stream processing.
• Hands-on experience with Spark, Scala, Spark SQL and HiveContext for data processing.
• Working knowledge of GCP services such as Cloud Functions, Dataproc and BigQuery.
• Experience with Azure services, i.e. ADF, ADLS, Blob Storage, Databricks and Synapse.
• Extensive working experience with Agile development methodology and working knowledge of Linux.
• Expertise in working with big data distributions like Cloudera and Hortonworks.
• Automated data pipelines using streams and tasks; loaded structured and semi-structured data into Spark
clusters using Spark SQL and the DataFrames Application Programming Interface (API).
• Experience with the Hive data warehouse tool: creating tables, distributing data through static and dynamic
partitioning and bucketing, and applying Hive optimization techniques.
• Experience in tuning and debugging Spark applications and using Spark optimization techniques.
• Knowledge of Spark architecture and components, with demonstrated efficiency in optimizing and
tuning compute and memory for performance and cost.
• Expertise in developing batch data processing applications using Spark, Hive and Sqoop (a brief sketch
follows this summary).
• Experience in working with CSV, JSON, XML, ORC, Avro and Parquet file formats.
• Good experience in creating and designing data ingest pipelines using technologies such as Apache
Kafka.
• Worked with the most popular AWS services, such as S3, EC2, EMR and Athena.
• Good knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL
solutions, and of data warehouse tools for reporting and data analysis.
• Basic experience implementing the Snowflake data warehouse.
• Experience working with version control systems (Git, GitHub) and CI/CD pipelines.
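
A minimal Spark-Scala sketch of the kind of batch processing described in this summary: reading raw CSV from HDFS, transforming it with the DataFrame API, and writing partitioned Parquet. The paths, dataset and column names (`sales`, `amount`, `sale_ts`) are hypothetical placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object BatchIngest {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; on a cluster the master comes from spark-submit.
    val spark = SparkSession.builder()
      .appName("batch-ingest-sketch")
      .master("local[*]")
      .enableHiveSupport() // assumes a Hive metastore is configured
      .getOrCreate()

    // Hypothetical raw CSV landing zone on HDFS.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/sales/*.csv")

    // Basic cleansing and enrichment with the DataFrame API.
    val cleaned = raw
      .filter(col("amount").isNotNull)
      .withColumn("sale_date", to_date(col("sale_ts")))
      .withColumn("year", year(col("sale_date")))

    // Write as Parquet, partitioned by year, for downstream Hive/Spark SQL queries.
    cleaned.write
      .mode("overwrite")
      .partitionBy("year")
      .parquet("hdfs:///curated/sales_parquet")

    spark.stop()
  }
}
```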

TECHNICAL SKILLS:
• Big Data Technologies: Hadoop, Spark, Hive, Sqoop, Kafka, PySpark, HBase, Spark with Scala,
Snowflake (basic)
• Cloud Technologies: Azure (Azure Storage, Azure Synapse, ADF, Azure Databricks), GCP (BigQuery,
Dataproc), AWS (basics)
• Languages: Scala, Python, SQL
• Databases and Tools: Oracle, MySQL, SQL, NoSQL.
• Platforms: Windows, Linux.
• IDEs: Eclipse, Cloudera, Hortonworks.
• Scheduling: Airflow.
• Project Management Tools: Jira, GitHub.

Certifications:
• Completed Microsoft Azure Fundamentals (AZ-900).
• Completed Microsoft Azure Data Fundamentals (DP-900).
• Completed Microsoft Azure Data Scientist Associate (DP-100).
• Completed Microsoft Power BI Data Analyst Associate (PL-300).
• Completed Microsoft Azure Data Engineer Associate (DP-203).
• GCP Associate Cloud Engineer.

PROFESSIONAL EXPERIENCE:

Price Waterhouse Coopers Pvt Ltd: Oct 2021 – Present


Project: EDP Migration
Remote
June 2022 – Present
Role: Big Data Consultant
Responsibilities:
• Responsible for building scalable distributed data solutions using Spark.
• Ingested log files from source servers into HDFS data lakes using Sqoop.
• Developed Sqoop Jobs to ingest customer and product data into HDFS data lakes.
• Developed Spark Streaming applications to ingest transactional data from Kafka topics into
Cassandra tables in near real time (see the sketch after this project's tech stack).
• Developed a Spark application to flatten the incoming transactional data using various
dimensional tables and persist it to Cassandra tables.
• Involved in developing framework for metadata management on HDFS data lakes.
• Worked on various Hive optimizations such as partitioning, bucketing, vectorization and indexing, and
used the right type of Hive join, such as bucket map join and SMB join.
• Worked with various file formats such as CSV, JSON, ORC, Avro and Parquet.
• Developed HQL scripts to create external tables and analyze incoming and intermediate data
for analytics applications in Hive.
• Optimized Spark jobs using techniques such as broadcast joins, executor tuning and persisting/caching.
• Responsible for developing custom UDFs, UDAFs and UDTFs in Hive.
• Analyzed tweet JSON data using the Hive SerDe API to deserialize it and convert it into a readable
format.
• Orchestrated Hadoop and Spark jobs using Oozie workflows to manage job dependencies and
run multiple jobs in sequence for data processing.
• Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

• Tech stack: Scala, Hadoop, Spark, Spark SQL, Spark Streaming, Hive, Cassandra, MySQL,
HDFS, Apache Kafka.
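
As a rough illustration of the Kafka-to-Cassandra streaming flow above, the sketch below uses Spark Structured Streaming together with the Spark Cassandra Connector; the broker, topic, keyspace, table and schema names are hypothetical, and the connector dependency and contact-point settings are assumed to be available on the cluster.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object TxnStreamToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("txn-stream-sketch")
      .config("spark.cassandra.connection.host", "cassandra-host") // hypothetical contact point
      .getOrCreate()

    // Hypothetical JSON schema of the transactional events on the Kafka topic.
    val txnSchema = new StructType()
      .add("txn_id", StringType)
      .add("customer_id", StringType)
      .add("amount", DoubleType)
      .add("event_ts", TimestampType)

    // Read the Kafka topic as a stream; the value arrives as bytes and is parsed as JSON.
    val txns = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical brokers
      .option("subscribe", "transactions")               // hypothetical topic
      .load()
      .select(from_json(col("value").cast("string"), txnSchema).as("t"))
      .select("t.*")

    // Persist each micro-batch to Cassandra via the connector's batch writer.
    val query = txns.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "sales", "table" -> "transactions")) // hypothetical names
          .mode("append")
          .save()
      }
      .option("checkpointLocation", "hdfs:///checkpoints/txn-stream")
      .start()

    query.awaitTermination()
  }
}
```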

Client: Vodafone (VFQ)
Remote
Oct 2021 – June 2022

Project Description:
Provides a 360-degree view of the customer so that a salesperson is aware of all the
facts when talking to the customer, which gives a much better chance of closing the deal.

This involves building a data lake. Hadoop tools are used to transfer data to and from
HDFS, and some of the sources were imported using Sqoop. The raw data is then stored in
Hive tables in ORC format so that data scientists can perform analytics using Hive. New
use cases were developed and loaded into a NoSQL database (HBase) for further analytics.

Responsibilities:
• Developed SQOOP scripts to import the source data from Oracle database into HDFS for
further processing.
• Developed Hive scripts to store the raw data in ORC format (see the sketch after this project's
tech stack).
• Involved in gathering requirements, designing, development and testing.
• Generated reports using Hive for business requirements received on an ad-hoc basis.

• Environment: Cloudera CDH 5.4.4

• Tech Stack: Hadoop, HDFS, Hive, Sqoop, HBase.
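
A small Scala/Spark SQL sketch of the raw-to-ORC pattern described above: an external table over the Sqoop-landed files, rewritten into an ORC-backed Hive table for analysts. The database, table, column and path names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object RawToOrc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("raw-to-orc-sketch")
      .enableHiveSupport() // assumes access to the Hive metastore
      .getOrCreate()

    // External table over the raw delimited files landed by Sqoop (hypothetical path/columns).
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS raw_db.customers_raw (
        customer_id STRING,
        name        STRING,
        city        STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 'hdfs:///landing/customers'
    """)

    // ORC-backed table that data scientists query from Hive.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS curated_db.customers_orc (
        customer_id STRING,
        name        STRING,
        city        STRING
      )
      STORED AS ORC
    """)

    // Copy the raw rows into the ORC table.
    spark.sql("INSERT OVERWRITE TABLE curated_db.customers_orc SELECT * FROM raw_db.customers_raw")

    spark.stop()
  }
}
```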

Accenture Solutions Pvt Ltd: Feb 2019 – Oct 2021


Client: Bank of America

Project Description:
The project is for the Risk Management team, where the bank wanted to store, process and
manage the huge amount of data collected from various sources in day-to-day operations.
The system mainly checks the credibility of the customer and looks for credit risks.

Responsibilities:
• Ingested data from multiple sources such as MySQL.
• Created and worked on Sqoop jobs with incremental load.
• Designed both managed and external tables in Hive.
• Developed Spark code in Scala using Spark SQL and DataFrames for optimization (see the sketch
after this project's tech stack).
• Created an HBase layer for faster reporting.

• Tech Stack: HDFS · Apache Sqoop · HBase · Apache Spark · Hive · Hadoop
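
A hedged sketch of the Spark SQL/DataFrame optimization work mentioned above, using an explicit broadcast join of a small dimension table against a large fact table; the database, table and column names are placeholders, not the client's actual schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object CreditRiskJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("credit-risk-join-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive tables: a large transactions table and a small customer dimension.
    val txns      = spark.table("risk_db.transactions")
    val customers = spark.table("risk_db.customers")

    // Broadcasting the small dimension avoids shuffling the large fact table.
    val enriched = txns.join(broadcast(customers), Seq("customer_id"), "left")

    // Aggregate exposure per customer with Spark SQL.
    enriched.createOrReplaceTempView("enriched_txns")
    val exposure = spark.sql("""
      SELECT customer_id, SUM(amount) AS total_exposure
      FROM enriched_txns
      GROUP BY customer_id
    """)

    exposure.write.mode("overwrite").saveAsTable("risk_db.customer_exposure")
    spark.stop()
  }
}
```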

D-VOIS Communications Pvt Ltd: Nov 2017 – Feb 2019


Responsibilities:
• Analyzed data using Hadoop components: Hive queries, Pig queries and HBase queries.
• Loaded and transformed large sets of structured, semi-structured and unstructured data using
Hadoop/Big Data concepts.
• Involved in loading data from the UNIX file system to HDFS.
• Responsible for creating Hive tables, loading data, and writing hive queries.
• Handled importing data from various data sources, performed transformations using Hive,
Map Reduce/Apache Spark, and loaded data into HDFS.
• Extracted data from the Oracle database into HDFS using Sqoop.
• Loaded data from web servers and Teradata using Sqoop and the Spark Streaming API.
• Utilized Spark Streaming API to stream data from various sources. Optimized existing Scala
code and improved the cluster performance.
• Tuned Spark applications (batch interval, level of parallelism, memory) to improve processing
time and efficiency (see the configuration sketch after this project's tech stack).
• Tech Stack: Hadoop 2.x · HDFS · Spark SQL · Eclipse · Apache Kafka · Apache Sqoop · Apache
Spark · Hive · Linux
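
An illustrative Scala sketch of the kind of Spark Streaming tuning referred to above: the batch interval is set on the StreamingContext, while parallelism and memory knobs go through SparkConf. All values and the socket source are examples only, not production recommendations.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TunedStreamingApp {
  def main(args: Array[String]): Unit = {
    // Example tuning knobs: parallelism and executor memory (values are illustrative only).
    val conf = new SparkConf()
      .setAppName("tuned-streaming-sketch")
      .setIfMissing("spark.master", "local[2]")             // local fallback for the sketch
      .set("spark.default.parallelism", "64")               // level of parallelism for shuffles
      .set("spark.executor.memory", "4g")                   // executor memory sizing
      .set("spark.streaming.backpressure.enabled", "true")  // let Spark adapt the ingestion rate

    // Batch interval: how often a micro-batch is formed (here every 10 seconds).
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/tuned-streaming")

    // Hypothetical socket source, just to make the sketch runnable end to end.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```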

Freelancing

Project 1: Migration
Remote
April 2022 – June 2022
Role: Data Engineer

Responsibilities:
• Worked with structured data being ingested into Azure File storage.
• Created an ETL pipeline in the SnapLogic tool to bring the data into the Azure Databricks workspace.
• Applied transformation logic to the data using Spark SQL and PySpark operations.
• Applied optimization logic such as partitioning and broadcast joins (see the sketch after this
project's tech stack).
• Created an ETL pipeline on Databricks to load the transformed data into the Snowflake target.
• Analyzed the resulting data with Databricks.

• Tech Stack: ETL, Databricks, Azure, SnapLogic, Salesforce.
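
A rough Scala sketch (runnable as a Databricks job) of the transform-and-load step above: a broadcast join and a partitioned transform, followed by a write to Snowflake through the Spark-Snowflake connector. The mount paths, table names and connection options are hypothetical, and real credentials would come from a secret scope rather than literals.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FinanceToSnowflake {
  def main(args: Array[String]): Unit = {
    // On Databricks a session already exists; getOrCreate simply picks it up.
    val spark = SparkSession.builder().appName("finance-to-snowflake-sketch").getOrCreate()

    // Hypothetical curated inputs written earlier in the pipeline (e.g. landed via SnapLogic).
    val invoices = spark.read.parquet("dbfs:/mnt/raw/invoices")
    val regions  = spark.read.parquet("dbfs:/mnt/ref/regions")

    // Broadcast the small reference table and repartition by the write key.
    val enriched = invoices
      .join(broadcast(regions), Seq("region_id"), "left")
      .withColumn("invoice_month", date_format(col("invoice_date"), "yyyy-MM"))
      .repartition(col("invoice_month"))

    // Snowflake connector options (illustrative values only).
    val sfOptions = Map(
      "sfUrl"       -> "account.snowflakecomputing.com",
      "sfUser"      -> "etl_user",
      "sfPassword"  -> "***",
      "sfDatabase"  -> "FINANCE",
      "sfSchema"    -> "PUBLIC",
      "sfWarehouse" -> "ETL_WH"
    )

    enriched.write
      .format("snowflake") // Spark-Snowflake connector, assumed installed on the cluster
      .options(sfOptions)
      .option("dbtable", "INVOICE_ENRICHED")
      .mode("append")
      .save()
  }
}
```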

Project 2: Finance Migration


Remote
Aug 2022 – Dec 2022
Role: Data Engineer

Responsibilities:
• Loaded and transformed large sets of structured and semi-structured data.
• Imported data using Sqoop into Hive and HBase from an existing SQL Server.
• Supported code/design analysis, strategy development and project planning.
• Created reports for the BI team, using Sqoop to export data into HDFS and Hive.
• Involved in requirement analysis, design and development.
• Exported and imported data into HDFS and Hive using Sqoop.
• Stored the data in HBase tables according to business requirements.
• Created Hive tables using advanced Hive concepts such as bucketing, partitioning and UDFs (see
the sketch after this project's tech stack).

• Tech Stack: Hadoop Framework, HDFS, Spark, Spark SQL, Hive, Sqoop.
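
A small sketch of the partitioned and bucketed Hive table creation mentioned above, issued here through Spark SQL with Hive support; the database, table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveAdvancedTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-advanced-tables-sketch")
      .enableHiveSupport() // assumes a Hive metastore is available
      .getOrCreate()

    // ORC table partitioned by month and bucketed by customer for faster joins and sampling.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS finance_db.payments (
        payment_id  STRING,
        customer_id STRING,
        amount      DOUBLE
      )
      PARTITIONED BY (pay_month STRING)
      CLUSTERED BY (customer_id) INTO 16 BUCKETS
      STORED AS ORC
    """)

    // Dynamic-partition insert from a hypothetical staging table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE finance_db.payments PARTITION (pay_month)
      SELECT payment_id, customer_id, amount, date_format(pay_date, 'yyyy-MM') AS pay_month
      FROM staging_db.payments_raw
    """)

    spark.stop()
  }
}
```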
Side Projects:

• Log Analytics Project with Spark Streaming and Kafka.


• Retail Analytics Project Example using Sqoop, HDFS, and Hive.
• Data Processing and Transformation in Hive using Azure VM.
• Learn Data Processing with Spark SQL using Scala on AWS.
• Streaming Data Pipeline using Spark, HBase and Phoenix.
• Hive Mini Project to Build a Data Warehouse for e-Commerce.
• Analyze Yelp Dataset with Spark & Parquet Format on Azure Databricks.
• Build an Azure Recommendation Engine on the MovieLens Dataset.
• Hadoop Project-Analysis of Yelp Dataset using Hadoop Hive.
• Snowflake Azure Project to build real-time Twitter feed dashboard.
• SQL Project for Data Analysis using Oracle Database (Parts 1-7).
• PySpark Project to Learn Advanced Data Frame Concepts.
• PySpark Project for Beginners to Learn Data Frame Operations.
• Hands-On Real Time PySpark Project for Beginners.

• Link to Repository: https://github.com/Ajay026

Education:
• B.Tech (Electronics & Communication), Siddhartha Institute of Engineering and Technology,
Puttur (A.P.), India, 2017, with First Division marks (60%).
• Diploma in Electronics and Communication Engineering, Govt. Polytechnic College,
Chandragiri (A.P.), India, 2014, with 70%.
