Big Data Notes
Assignment - 01
● Access a Hadoop cluster with YARN installed, get familiar
with the Hadoop ecosystem components, and gain basic knowledge of
cluster resource management.
SUB POINTS
• Introduction.
• What is Hadoop?
• Hadoop Ecosystem Components.
• What is YARN?
• Components of YARN.
• How YARN Works.
• Benefits of YARN.
Features of Hive
• Hive is scalable and fast for batch analysis of large datasets.
• It provides SQL-like queries (HiveQL, also called HQL) that are implicitly transformed into
MapReduce or Spark jobs.
• It is capable of analyzing large datasets stored in HDFS.
• It supports different storage types such as plain text, RCFile, and HBase. Hive versions
before 3.0 can also use indexing to accelerate queries.
• It can operate on compressed data stored in the Hadoop ecosystem.
• It supports user-defined functions (UDFs), through which users can supply their own functionality.
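The point about HQL queries being turned into MapReduce jobs can be illustrated with a small sketch. This is not Hive's actual query planner, only a toy model in plain Python of how a query such as SELECT dept, SUM(amount) FROM sales GROUP BY dept might be split into map, shuffle, and reduce steps. The table name, the column names, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table stored in HDFS.
ROWS = [
    {"dept": "sales", "amount": 100},
    {"dept": "sales", "amount": 50},
    {"dept": "hr", "amount": 70},
]


def map_phase(rows):
    """Emit (key, value) pairs, one per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["dept"], row["amount"]


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group; here the aggregate is SUM(amount)."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    """Chain the three phases to answer the GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))  # {'sales': 150, 'hr': 70}
```

In a real cluster the map and reduce steps run in parallel on many nodes and the shuffle moves data across the network, which is one reason even simple Hive queries take noticeable time to start and finish.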
Limitations of Hive
• Hive is not capable of handling real-time data.
• It is not designed for online transaction processing.
• Hive queries have high latency.