Jayasree Yedlapally: Data Architecture Engineering - Senior
Summary
Extensively experienced, goal-oriented consultant with 12 years in the IT industry, spearheading
Big Data Architecture, Analytics, Data Science, Machine Learning and Cloud Services. Extensive
development experience using Java and J2EE technologies.
Expertise in architecting big data solutions covering data ingestion, data preparation and data
storage.
Designed and developed real-time stream processing applications using Spark, Kafka and Python,
and applied machine learning.
Worked with Spark to improve the performance and optimization of existing Hadoop algorithms
using Spark Context, Spark SQL, DataFrames and Spark on YARN.
Wrote Scala and Python code to run Spark jobs on DataStax and Cloudera clusters.
Developed software routines in Python, Spark and SQL to automate calculation and aggregation
over large datasets.
Expert in Amazon EMR, Spark, S3, Boto3, Athena and AWS Wrangler.
Architected, solutioned and modeled DI (Data Integrity) platforms using Sqoop, Kafka, Spark,
Spark Structured Streaming and Cassandra.
Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and
analysis of data.
Understanding of predictive modeling, NLP and text analysis, and machine learning.
Experience in AWS, implementing solutions using services such as EC2 and S3.
Developed a Spark Streaming application to read data from Kafka topics and save it into
Cassandra.
Implemented continuous integration and deployment (CI/CD) through Bamboo for Spark jobs and
deployed applications to Pivotal Cloud Foundry.
Wrote HTTP streams and Cassandra streams using custom receivers.
Experience writing JUnit test cases for Spark jobs using the Mockito API.
Used the DSE Spark metrics (user metric system) to measure the behavior of methods in
Spark jobs in the production environment.
TECHNICAL SKILLS:
Hadoop, HDFS, Hive, Sqoop, Spark, J2EE, Kafka, Cloudera, DataStax, NLP and Data science.
Languages: Python, Scala, Java.
Tools: Pivotal Cloud Foundry, EC2 GPU machines, AWS, Athena, Kafka, Bitbucket, Oozie, Git,
AWS Wrangler, Splunk, Bamboo (CI/CD), Mockito.
RDBMS / NoSQL: Cassandra, Oracle 11g, MySQL, PostgreSQL.
Functional Expertise:
Strategy, Planning, Scheduling, Development, Execution, Delivery, Engineering, Data Modelling,
Customer Engagement, Hadoop Platforms, Analytics.
Industry Expertise: Executed projects across domains such as Automotive, Healthcare and
Retail.
Education Qualification:
M.Sc. in Computer Science from Osmania University & B.Sc. in Computers from Osmania University.
Work History:
Hitachi Vantara
Proactive Maintenance Project: Client - Penske, United States May 2020 – Present
Predictive Maintenance is a project to predict vehicle failure problems using machine-learning
models. Real-time vehicle fault code, event and odometer data is read from Kafka topics, processed
in Spark jobs, and the output is saved into PostgreSQL and an AWS S3 bucket.
Responsible for normalizing, analyzing and processing raw vehicle data read from the Kafka
consumer as batch data into Spark in Avro format.
Processed the event and odometer data after reading it from the Kafka topic, read the metadata
from Athena and applied the filter conditions.
Ran the Spark jobs on Amazon EMR and saved the partitioned Parquet data to an S3 location and
PostgreSQL database tables.
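A minimal PySpark sketch of this write path, assuming illustrative bucket paths, column names and
connection details (all hypothetical, not the project's actual values):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("vehicle-data-writer").getOrCreate()

    # Read the batch of vehicle events (Avro source requires the spark-avro package)
    df = spark.read.format("avro").load("s3://example-bucket/raw/vehicle_events/")

    # Save partitioned Parquet output to S3
    (df.write
       .mode("append")
       .partitionBy("event_date")          # hypothetical partition column
       .parquet("s3://example-bucket/curated/vehicle_events/"))

    # Save the same output to a PostgreSQL table over JDBC
    (df.write
       .mode("append")
       .jdbc(url="jdbc:postgresql://db-host:5432/penske",   # hypothetical host/database
             table="vehicle_events",
             properties={"user": "etl_user", "password": "***",
                         "driver": "org.postgresql.Driver"}))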
Implemented multi-threading using ThreadPoolExecutor to call the REST API
endpoints.
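A minimal sketch of that thread-pool pattern, assuming a hypothetical list of endpoint URLs and the
requests library:

    import concurrent.futures
    import requests

    # Hypothetical endpoints; the real URLs are project-specific
    urls = ["https://api.example.com/v1/vehicles/1",
            "https://api.example.com/v1/vehicles/2"]

    def call_endpoint(url):
        # One REST call per worker thread; the timeout keeps a slow endpoint from blocking the pool
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.json()

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(call_endpoint, urls))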
Implemented the database connection-pool mechanism using SQLAlchemy. Configured S3
and Athena connectivity using the boto3 library, including the Athena workgroup and S3 output
location path.
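A sketch of that connection-pool and Athena configuration, with hypothetical host names, database,
workgroup and output path:

    import boto3
    from sqlalchemy import create_engine

    # SQLAlchemy engine with an explicit connection pool (hypothetical PostgreSQL DSN)
    engine = create_engine(
        "postgresql+psycopg2://etl_user:***@db-host:5432/penske",
        pool_size=5,
        max_overflow=10,
        pool_pre_ping=True,
    )

    # boto3 clients for S3 and Athena
    s3 = boto3.client("s3")
    athena = boto3.client("athena", region_name="us-east-1")

    # Start an Athena query with an explicit workgroup and S3 output location
    response = athena.start_query_execution(
        QueryString="SELECT * FROM vehicle_metadata LIMIT 10",   # hypothetical table
        QueryExecutionContext={"Database": "telematics"},        # hypothetical database
        WorkGroup="primary",
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    )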
Worked on the Kafka consumer configuration and read messages from a set of partitions of a
topic in Avro and JSON formats.
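A sketch of that consumer setup using the kafka-python client (the project may use a different
client library); brokers, topic and partitions are illustrative:

    import json
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(
        bootstrap_servers=["broker1:9092", "broker2:9092"],   # hypothetical brokers
        group_id="vehicle-batch-reader",
        enable_auto_commit=False,
        # JSON messages; Avro would need a schema-aware deserializer instead
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    # Read from an explicit set of partitions of one topic
    partitions = [TopicPartition("vehicle.events", p) for p in (0, 1, 2)]
    consumer.assign(partitions)

    batch = consumer.poll(timeout_ms=5000, max_records=500)
    for tp, messages in batch.items():
        for message in messages:
            print(tp.partition, message.offset, message.value)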
Read and wrote Parquet data to the S3 bucket using AWS Wrangler.
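A minimal awswrangler sketch with an illustrative bucket path and sample data:

    import awswrangler as wr
    import pandas as pd

    df = pd.DataFrame({"vehicle_id": [1, 2], "odometer": [120340, 98012]})  # sample data

    # Write partitioned Parquet to S3
    wr.s3.to_parquet(
        df=df,
        path="s3://example-bucket/curated/odometer/",   # hypothetical location
        dataset=True,
        partition_cols=["vehicle_id"],
    )

    # Read it back
    df_back = wr.s3.read_parquet("s3://example-bucket/curated/odometer/", dataset=True)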
Deployed data-science model-build code to an AWS EC2 GPU machine and saved the model
binaries to an S3 bucket.
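A sketch of saving the trained model binaries to S3 with boto3 (bucket, key and local path are
illustrative):

    import boto3

    s3 = boto3.client("s3")

    # Upload the serialized model produced on the EC2 GPU machine
    s3.upload_file(
        Filename="/opt/models/failure_model.h5",          # hypothetical local path
        Bucket="example-model-bucket",
        Key="models/failure_model/v1/failure_model.h5",
    )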
Deployed various Python applications to the Pivotal Cloud Foundry environment using the Bamboo
CI/CD process.
Monitored application health and log files in the Pivotal Cloud environment.
Guided Repair Support Project: Client - Penske, United States April 2019 – April 2020
The Penske Guided Repair Support project builds model binaries using a linear deep-learning model
(TensorFlow) and loads those model binaries in a serving app (REST API).
• Consumed data from Kafka into Spark using structured streams in JSON format and performed
page-hit, performance and error-rate aggregations using a 5 sec window and a 10 sec sliding
window (see the sketch after this list).
• Wrote custom Cassandra sinks using a Cassandra sink provider.
• Wrote UDFs.
• Designed schema structures for both raw and target tables.
• Implemented the Cassandra streams and HTTP streams (experimental).
• Used Spark SQL to perform the aggregations for errors, page hits and page-load timings.
• Used triggers to schedule the Spark jobs.
• Wrote custom receivers to stream data from external sources.
• Performed performance optimization to increase the speed of the jobs.
• Did R&D on consuming data from Pie Kafka and setting the consumer parameters.
• After deploying the Spark job on the cluster, monitored the logs in Splunk and made fixes.
• Wrote JUnit test cases using Mockito for the jobs.
• Implemented the user metrics system to monitor the metrics of the Spark job.
• Implemented the Avro file format using the schema store.
• Technologies: DataStax 6.0, Spark 2.2.0, Pie Kafka, Splunk, Java, Scala, Cassandra, IntelliJ, Gradle
and Git.
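The project code was written in Scala; below is a roughly equivalent PySpark sketch of the windowed
streaming aggregation. Broker, topic, schema, and the window/slide durations are illustrative
assumptions, not the project's actual values:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("page-metrics").getOrCreate()

    schema = StructType([                      # hypothetical event schema
        StructField("page", StringType()),
        StructField("status", StringType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
           .option("subscribe", "page.events")                  # hypothetical topic
           .load())

    events = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    # Sliding-window page-hit counts; the same pattern covers error-rate and load-time aggregations
    hits = (events
            .withWatermark("event_time", "1 minute")
            .groupBy(F.window("event_time", "10 seconds", "5 seconds"), "page")
            .count())

    query = hits.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()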
Crane Global Solutions Ltd
Al Ain Medical Centre, Dubai Mar 2015 - May 2017
• Worked as a Big Data engineer; involved in data migration using Sqoop to load data from Oracle to
HDFS.
• Used Oozie to schedule the workflows for Spark jobs.
• Wrote Spark SQL to perform the aggregations.
• Created a data model for structuring and storing the data efficiently; implemented partitioning and
bucketing in Hive (see the sketch at the end of this section).
• Involved in creating Hive tables, loading data and writing Hive queries.
• Worked with Parquet file formats.
• Implemented bucketing, partitioning and other query performance tuning techniques.
• Designed and documented standard operating procedures using Confluence.
• Technologies: Cloudera, Spark, Scala, Hive, Sqoop, HDFS, Java, Oracle 11g/10g, Oozie, Eclipse,
Maven.
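A sketch of a partitioned, bucketed Hive table layout driven through Spark SQL; the database, table
and column names are illustrative assumptions:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-modeling")
             .enableHiveSupport()
             .getOrCreate())

    # Partition by visit date, bucket by patient id (hypothetical columns)
    spark.sql("""
        CREATE TABLE IF NOT EXISTS clinic.visits (
            patient_id BIGINT,
            doctor_id  BIGINT,
            amount     DECIMAL(10,2)
        )
        PARTITIONED BY (visit_date STRING)
        CLUSTERED BY (patient_id) INTO 16 BUCKETS
        STORED AS PARQUET
    """)

    # Load from a staging table with dynamic partitioning
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO TABLE clinic.visits PARTITION (visit_date)
        SELECT patient_id, doctor_id, amount, visit_date FROM clinic.visits_staging
    """)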