Sreeja Big Data Resume
Hadoop Developer
Phone: (512) 982-1932  Email: [email protected]
Professional Summary:
More than 8 years of overall IT experience, including 4 years of comprehensive experience as an Apache Hadoop
Developer. Expertise in writing Hadoop jobs for analyzing data using Hive, Pig and Oozie.
Experience in building and maintaining multiple Hadoop clusters (production, development, etc.) of different sizes
and configurations, and in setting up rack topology for large clusters.
Strong experience using Hadoop Ecosystem tools including Pig, Hive, HBase, Sqoop, Flume, Kafka,
Oozie, Zookeeper, Spark, Scala and Storm.
Hands-on experience with workflow scheduling and coordination tools such as Oozie and ZooKeeper, and with Kafka for messaging.
Good experience with core and advanced Java concepts.
Extensive experience with big data ETL and query tools such as Pig Latin and HiveQL.
Hands-on experience with data ingestion tools like Flume and Sqoop.
Involved in writing shell scripts and Ant scripts on Unix for application deployments to the production region.
Strong debugging and problem-solving skills with an excellent understanding of system development
methodologies, techniques and tools.
Implemented Proofs of Concept on Hadoop stack and different big data analytic tools.
Experience in migrating data from different databases (e.g., VSAM, DB2, PL/SQL and MySQL) to Hadoop.
Experience in NoSQL databases like HBase, Cassandra, Redis and MongoDB, and in data modeling
techniques.
Experience in data management and implementation of Big Data applications using HADOOP frameworks.
Good knowledge of Spark and Spark SQL.
Extensive database experience using SQL Server, Stored Procedures, Cursors, Constraints and Triggers.
Experience in designing, sizing and configuring Hadoop environments.
Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
Worked with application teams to install operating system, patches and version upgrades as required.
Proficiency with application servers such as WebSphere, WebLogic, JBoss and Tomcat.
Extensive experience with SQL, PL/SQL and database concepts.
Expertise in debugging, optimizing and performance tuning Oracle, with strong knowledge of Oracle 11g and
SQL.
Good experience working with distributions such as MapR, Hortonworks and Cloudera.
Experience in all stages of the SDLC (Agile, Waterfall): writing technical design documents, development,
testing and implementation of enterprise-level data marts and data warehouses.
Extensive experience in documenting requirements, functional specifications, technical specifications.
Highly motivated, adaptive and quick learner.
Defined and developed ETL processes to automate data conversions, catalog uploading, error handling
and auditing using Talend.
Experience in Amazon AWS cloud services (EC2, EBS, S3, SQS).
Strong ability to handle multiple priorities and workloads, and to understand and adapt to new technologies
and environments quickly.
Experience in working with HP Profile & Project Management.
Technical Skills:
Professional Experience:
Key Achievements:
Worked on analyzing the Hadoop stack and different big data analytic tools, including Kafka, Storm, Hive, Pig,
HBase and Sqoop.
Involved in loading and transforming large sets of structured, semi-structured and unstructured data from
relational databases into HDFS using Sqoop imports.
Developed Sqoop scripts to import and export data from relational sources, and handled incremental loading of
customer and transaction data by date.
Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
Developed Spark scripts using the Scala shell as per requirements.
Developed and implemented core API services using Scala and Spark.
Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into
serialized byte sequences.
Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
Administered, installed, upgraded and managed distributions of Hadoop, Hive and HBase.
Involved in performance troubleshooting and tuning of Hadoop clusters.
Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
Implemented business logic by writing Hive UDFs in Java (see the illustrative UDF sketch after this list).
Developed shell scripts and some Perl scripts based on user requirements.
Wrote XML workflow definitions to build Oozie functionality.
Used Oozie operational services for batch processing and scheduling workflows dynamically (see the client sketch after this list).
Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
Evaluated the suitability of Hadoop and its ecosystem for the above project, implementing and validating various
Proof of Concept (POC) applications in order to adopt them and benefit from the Big Data Hadoop initiative.
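Illustrative only: the Hive UDF work described above could look like the minimal sketch below. The class name, column semantics and normalization rule are hypothetical, not taken from the project; it uses the legacy org.apache.hadoop.hive.ql.exec.UDF base class.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: trims and upper-cases a string column.
public final class NormalizeCode extends UDF {
    // Hive resolves this evaluate() method by reflection at query time.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}

Such a UDF is typically packaged as a JAR and registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.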
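For the Oozie scheduling bullets, a minimal client-side submission sketch is shown below, assuming a placeholder Oozie server URL, HDFS application path and cluster endpoints (none of these details come from the resume).

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

public class SubmitWorkflow {
    public static void main(String[] args) throws OozieClientException {
        // Placeholder Oozie server URL.
        OozieClient client = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = client.createConfiguration();
        // Placeholder HDFS application path and cluster endpoints.
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/apps/pipeline");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        // run() submits and starts the workflow, returning its job id.
        String jobId = client.run(conf);
        System.out.println("Submitted Oozie workflow: " + jobId);
    }
}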
Environment: MapReduce, HDFS, Hive, Pig, SQL, Sqoop, Storm, Oozie, shell scripting, cron jobs, Perl
scripting, Apache Kafka, J2EE.
Key Achievements:
Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
Involved in loading and transforming large sets of structured, semi-structured and unstructured data from
relational databases into HDFS using Sqoop imports.
Developed Sqoop scripts to import and export data from relational sources, and handled incremental loading of
customer and transaction data by date.
Defined and developed ETL processes to extract, transform and load various source-format files into standard
templates and databases using Talend.
Processed flat files to load data from source systems into targets using direct and indirect methods.
Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
Developed Spark scripts using the Scala shell as per requirements.
Imported real-time data into Hadoop using Kafka and implemented Oozie jobs (see the producer sketch after this list).
Developed and implemented core API services using Scala and Spark.
Performed complex data transformations in Spark using Scala.
Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into
serialized byte sequences.
Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
Installed, upgraded and managed Hadoop clusters.
Administered, installed, upgraded and managed distributions of Hadoop, Hive and HBase.
Advanced knowledge of performance troubleshooting and tuning of Hadoop clusters.
Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other
sources.
Used Oozie operational services for batch processing and scheduling workflows dynamically.
Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
Processed source data into structured data and stored it in the NoSQL database Cassandra.
Created ALTER, INSERT and DELETE queries involving lists, sets and maps in DataStax Cassandra.
Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra
through Java services (see the driver sketch after this list).
Responsible for continuous monitoring and managing Elastic MapReduce cluster through AWS console.
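As a hedged illustration of the Kafka ingestion mentioned above, the sketch below publishes JSON records to a topic with the standard Java producer API; the broker addresses, topic name and record contents are placeholders, not project details.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by customer id keeps related events in the same partition.
            producer.send(new ProducerRecord<>("transaction-events", "customer-42",
                    "{\"amount\": 10.5, \"type\": \"debit\"}"));
            producer.flush();
        }
    }
}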
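The Commerce API bullet above can be illustrated with a small read path using the DataStax Java driver (3.x series); the contact point, keyspace, table and column names are assumptions for illustration only.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CommerceDao {
    public static void main(String[] args) {
        // Placeholder contact point and keyspace.
        try (Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build();
             Session session = cluster.connect("commerce")) {
            ResultSet rs = session.execute(
                    "SELECT customer_id, total FROM orders WHERE customer_id = ?", "customer-42");
            for (Row row : rs) {
                System.out.println(row.getString("customer_id") + " -> " + row.getDecimal("total"));
            }
        }
    }
}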
Environment: MapReduce, HDFS, Spark, Hive, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka,
Zookeeper, J2EE, Eclipse, Cassandra.
Key Achievements:
Responsible for loading customer data and event logs into HBase using the Java API (see the sketch after this list).
Created HBase tables to store variable data formats of input data coming from different portfolios.
Involved in adding huge volumes of data as rows and columns in HBase.
Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
Involved in initiating and successfully completing a Proof of Concept on Flume for pre-processing, with increased
reliability and ease of scalability over traditional MSMQ.
Used Flume to collect log data from different sources and transfer the data to Hive tables using different SerDes,
storing it in JSON, XML and SequenceFile formats.
Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build
a risk profile for such sites.
End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data
sets.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive
queries and Pig scripts.
Created user accounts and granted users access to the Hadoop cluster.
Implemented the secure authentication for the Hadoop Cluster using Kerberos Authentication protocol.
Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Developed Pig UDFs to pre-process the data for analysis.
Experience in working with various kinds of data sources such as MongoDB, Solr and Oracle.
Successfully loaded files into Hive and HDFS from MongoDB and Solr.
Experience in managing development time, bug tracking, project releases, development speed, release forecasting
and scheduling. Used a custom framework of Node and MongoDB to handle back-end calls with very low latency.
Monitored Hadoop cluster job performance and performed capacity planning and managed nodes on
Hadoop cluster.
Responsible for using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
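A minimal sketch of loading an event-log row into HBase through the Java client API is shown below; it uses the current org.apache.hadoop.hbase.client classes (the CDH4-era HTable API is equivalent in spirit), and the table name, column family and row-key layout are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EventLogWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("event_logs"))) {
            // Row key combines customer id and timestamp so per-customer scans stay contiguous.
            Put put = new Put(Bytes.toBytes("customer-42#20240101T120000"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_type"), Bytes.toBytes("login"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("web"));
            table.put(put);
        }
    }
}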
Environment: Hadoop, Big Data, HDFS, Pig, Hive, MapReduce, Spark, Sqoop, Cloudera Manager, Linux,
CDH4, Flume, HBase.
Key Achievements:
Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data
cleaning and preprocessing (see the mapper sketch after this list).
Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase and
Sqoop.
Responsible for building scalable distributed data solutions using Hadoop.
Created HBase Tables to store variable data formats of PII data coming from different portfolios.
Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
Implemented best income logic using Pig scripts and UDFs.
Implemented test scripts to support test driven development and continuous integration.
Worked on tuning the performance of Pig queries.
Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as
required.
Responsible for managing data coming from different sources.
Involved in loading data from UNIX file system to HDFS.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Experienced in managing and reviewing Hadoop log files.
Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
Tested MapReduce programs using MRUnit.
Loaded and transformed large sets of structured, semi-structured and unstructured data, including Avro,
SequenceFiles and XML files.
Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce
jobs.
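The data-cleaning MapReduce work above could follow the map-only pattern sketched below; the pipe delimiter, expected field count and input/output paths are assumptions for illustration, not details from the project.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanRecordsJob {

    public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);
            // Keep only records with the expected number of fields and a non-empty id.
            if (fields.length == 8 && !fields[0].trim().isEmpty()) {
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only cleaning job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}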
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Linux, Java, Eclipse,
Cassandra.
Key Achievements:
Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC)
architecture. UML diagrams such as use cases, class diagrams, interaction diagrams (sequence and collaboration)
and activity diagrams were used.
Gathered business requirements and wrote functional specifications and detailed design documents.
Extensively used Core Java, Servlets, JSP and XML.
Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i
database.
Implemented Enterprise Logging service using JMS and Apache CXF.
Developed unit test cases and used JUnit for unit testing of the application.
Implemented Framework Component to consume the ELS service.
Implemented a JMS producer and consumer using Mule ESB (see the sketch after this list).
Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
Designed Low Level design documents for ELS Service.
Involved in server-side coding using the Spring MVC framework.
Applied design patterns including MVC Pattern, DAO Pattern and Singleton.
Involved in writing the Spring configuration XML file that contains bean declarations and other dependent object
declarations.
Development of UI models using HTML5, JSP, JSF, JavaScript, JSON and CSS.
Worked closely with QA, business and architecture teams to resolve defects quickly and meet deadlines.
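The JMS producer bullet can be illustrated with a plain javax.jms sketch; it is not tied to Mule ESB, and the ActiveMQ connection factory, broker URL and queue name are assumptions for illustration only.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ElsLogProducer {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL and queue name.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://broker-host:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("els.log.queue");
            MessageProducer producer = session.createProducer(queue);

            TextMessage message = session.createTextMessage("{\"level\":\"INFO\",\"msg\":\"order placed\"}");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}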
Environment: Java, Spring Core, Web Services, JMS, JDK, SVN, Maven, Mule ESB, WAS7, Ajax, SAX.
Key Achievements:
Environment: Java, Servlets, JSP, Hibernate, JUnit Testing, Oracle DB, SQL, Jasper Reports, iReport,
Maven, Jenkins.