Bigdata Hadoop: Fundamentals Hive

This document outlines a Big Data training course offered by TechGeest Solutions. The course covers fundamental Big Data concepts like Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Sqoop and HBase. It also covers more advanced topics like Spark, Scala, Spark SQL, Data Frames, Spark Streaming and Kafka. The course includes lectures, hands-on labs and real-world projects. Students will learn to install Hadoop clusters, run MapReduce jobs, load and query data using Hive and Spark SQL, build real-time streaming applications using Spark Streaming and Kafka, and more.

Uploaded by Quantico Smith
TechGeest Solutions
www.techgeest.com
+91-9620828049, 8095799993
(By Real Time Expert)
Opp Manyatha Tech Park, Gate No:1 (IBM), 2nd Floor,
Siddhartha Learning Academy, Above Kuttunad Restaurant

• BigData Hadoop
• Fundamentals
  o Data Storage and Analysis
  o Comparison with RDBMS
• Hadoop – A Brief History
• HDFS
  o Blocks
  o NN & DN
  o HDFS Federation & High Availability
• HDFS Clients
  o HDFS Command Line
  o HDFS CLI – File System Operations Lab
  o HDFS Web UI
  o HDFS Java Client
  o HDFS Java Client – File System Operations Lab
  o CRUD Operations using Java Client
• YARN – Cluster Management (Hadoop 2.x)
  o How YARN Applications Run
  o YARN vs Map Reduce
  o YARN Scheduling
    ▪ Capacity, Fair Scheduler, FIFO
• Map Reduce
  o MR Programming Model
  o Input Formats
  o Output Formats
  o Compression
  o Serialization & Data Types
  o File Based Data Structures
  o Sequence File, Map File, ORC, Parquet
  o Tuning Map Reduce Jobs
  o Advanced Map Reduce
    ▪ Joins: Map-side, Reduce-side
    ▪ Distributed Cache
• Hive
  o Comparison with RDBMS
  o HQL
  o Data Types
  o Importing and Exporting
  o Partitioning and Bucketing – Advanced
  o Joins and Join Optimization
  o Functions – Built-in & User-defined
  o Advanced Optimization of HQL
  o Storage File Formats – Advanced
  o Loading and Storing Data
  o SerDes – Advanced
• Sqoop
  o Introduction
  o Import – Deep dive
  o Export – Deep dive
  o Sqoop Optimization – Incremental Load
  o Real time scenarios
• Flume
  o Configure Flume and Import Data
  o Architecture and Lab
• Oozie
  o Different Workflow Jobs
  o Oozie Scheduler Lab
• HBase
  o NoSQL Databases Introduction
  o CAP Theorem
  o HBase Architecture
  o HBase Clients – Java Client
  o Loading Data
  o Hive – HBase Integration
• Monitoring the Cluster
  o Hortonworks Ambari
  o Cloudera Manager
  o MapR MCS
  o HUE, RM UI
• Real Time Project


  o Architecture
  o Terminology used
  o Production implementation
• Cont.…
• SPARK & SCALA
• Scala Basics
  o Lecture, Functional language
  o Scala vs Java
  o Hands-On
  o Strings, Numbers
  o List, Array, Map, Set
  o Control Statements, collections
  o Functions, methods
  o Pattern matching
• Spark Overview
  o Lecture
  o The power of Spark?
  o Spark Ecosystem
  o Spark Components vs Hadoop
  o Hands-On
  o Installation & Eclipse configuration
  o Programs in Command line Interface & Eclipse
  o Process Local, HDFS files
• RDD Fundamentals
  o Lecture, Purpose and Structure of RDDs
  o Transformations, Actions, and DAG
  o Key-Value Pair RDDs
  o Hands-On
  o Creating RDDs from Data Files
  o Reshaping Data to Add Structure
  o Interactive Queries Using RDDs
• SparkSQL and Data Frames
  o Lecture
  o Spark SQL and Data Frame Uses
  o Data Frame / SQL APIs
  o Catalyst Query Optimization
  o Hands-on
  o Creating (CSV, JSON) Data Frames
  o Querying with Data Frame API and SQL
  o Caching and Re-using Data Frames
  o Process Hive data in Spark
• Spark Streaming
  o Lecture, Streaming Sources
  o DStream APIs and Stateful Streams
  o Hands-On
  o Creating DStreams from Sources
  o Operating on DStream Data
  o Structured Streaming
• Kafka
  o Kafka introduction
  o Installation
  o Kafka integration with Spark
  o Integration with Flume
• Labs:
  o Covers All Certification Syllabus
  o Real Time use cases and Data sets covered
  o Word count, Sensors (Weather Sensors) Dataset, Social Media data sets like YouTube, Twitter data analysis
  o Unix Basics Lab
  o SparkSQL, Hadoop, Hive, Sqoop, Oozie, HBase, Flume Installations – Pseudo Mode
• Master Projects:
  o Real-time BigData EDW
  o Real-time Streaming Application
  o Real-time concepts covered are
    ▪ Spark SQL, SCALA
    ▪ Hive - Advanced topics
    ▪ Sqoop import/export
    ▪ Oozie Scheduling
    ▪ How Hadoop MR used in DW


    ▪ RDBMS concepts, ETL tool concepts, Integration with Reporting tools
