Hive Basics MCA

Apache Hive is a data warehouse infrastructure built on Hadoop that allows for querying large datasets using HiveQL, a SQL-like language. It consists of components such as Metastore and Execution Engine, and supports both Managed and External Tables. Hive is optimized for batch processing and analytical queries, and allows for user-defined functions to enhance its capabilities.

Uploaded by

Jega

Basic Concepts of Hive

For MCA Students


Presented by [Your Name]
Introduction to Hive
• Apache Hive is a data warehouse infrastructure built on top of Hadoop.
• It allows querying and managing large datasets using HiveQL, a SQL-like language.
• Hive is best suited for batch processing and analytical queries.
Architecture of Hive
• Hive consists of components such as the Metastore, Driver, Compiler, and Execution Engine, with HDFS as the underlying storage layer.
• Queries in Hive are compiled into MapReduce, Tez, or Spark jobs for execution.
• The Metastore stores schema and metadata for tables.
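The path a query takes through these components can be observed from a Hive session. The following is a minimal sketch, assuming a table named students already exists (the name is borrowed from the example query in this deck) and that Tez is installed on the cluster:

```sql
-- Choose the engine that runs compiled queries (mr, tez, or spark,
-- depending on what the cluster has installed).
SET hive.execution.engine=tez;

-- Print the plan Hive compiles for a query instead of running it.
EXPLAIN
SELECT age, COUNT(*) AS num_students
FROM students
GROUP BY age;

-- Show the schema and storage details the Metastore holds for the table.
DESCRIBE FORMATTED students;
```

The EXPLAIN output lists the stages that the execution engine would run, which is a convenient way to see how a single HiveQL statement maps to cluster jobs.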
HiveQL and its Features
• HiveQL is a SQL-like language used to interact with Hive.
• It supports SELECT, INSERT, GROUP BY, and JOIN operations; UPDATE and DELETE are available only on transactional (ACID) tables.
• It simplifies data querying for users familiar with SQL.
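UPDATE and DELETE work only on tables created as transactional. The sketch below assumes Hive 3 or later (older releases also require the table to be bucketed); the table name students_acid and the sample rows are purely illustrative:

```sql
-- Session settings that ACID operations rely on (often set cluster-wide).
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical table; full ACID tables must be stored as ORC.
CREATE TABLE students_acid (
  id   INT,
  name STRING,
  age  INT
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Illustrative rows only.
INSERT INTO students_acid VALUES (1, 'Asha', 21), (2, 'Ravi', 19);

-- Row-level changes are accepted because the table is transactional.
UPDATE students_acid SET age = 20 WHERE id = 2;
DELETE FROM students_acid WHERE id = 1;
```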
Tables in Hive
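• Hive supports two types of tables: Managed Tables and External Tables.
• Managed tables keep their data in Hive's warehouse directory (typically on HDFS); dropping the table deletes both the metadata and the data.
• External tables reference data at a location outside the warehouse directory; dropping the table removes only the metadata and leaves the data in place.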
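The difference between the two table types shows up in the CREATE and DROP statements. A minimal sketch, in which the table names, columns, and the /data/students path are placeholders chosen for illustration:

```sql
-- Managed table: Hive owns the data files.
CREATE TABLE students_managed (
  name STRING,
  age  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table: Hive only records where the files live.
CREATE EXTERNAL TABLE students_external (
  name STRING,
  age  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/students';

DROP TABLE students_managed;   -- removes the metadata and the data files
DROP TABLE students_external;  -- removes the metadata only; files remain
```

External tables are the safer choice when other tools also read the same files, since dropping the table in Hive cannot delete them.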
Querying Data with Hive
• HiveQL allows data retrieval using SELECT statements.
• Queries can include filtering, sorting, aggregation, and joins.
• Example: SELECT * FROM students WHERE age > 20;
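A single query can combine all four operations. The sketch below extends the students example; the id column and the enrollments(student_id, course) table are assumptions made for illustration:

```sql
-- Per course: number of students older than 20 and their average age,
-- with the largest courses listed first.
SELECT e.course,
       COUNT(*)   AS num_students,
       AVG(s.age) AS avg_age
FROM students s
JOIN enrollments e
  ON s.id = e.student_id      -- join
WHERE s.age > 20              -- filtering
GROUP BY e.course             -- aggregation
ORDER BY num_students DESC;   -- sorting
```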
User-Defined Functions (UDFs) in Hive
• Hive allows users to create custom UDFs for additional functionality.
• UDFs can be written in Java and registered in Hive.
• Example: A UDF to convert temperature from Celsius to Fahrenheit.
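Once the Java class is compiled and packaged, registering and calling it takes a few HiveQL statements. This is a sketch only: the JAR path, the class name com.example.hive.CelsiusToFahrenheit, the function name c_to_f, and the readings(city, temp_c) table are all hypothetical.

```sql
-- Make the compiled UDF available to the session (placeholder path).
ADD JAR /path/to/temperature-udf.jar;

-- Bind a function name to the Java class (hypothetical class name).
CREATE TEMPORARY FUNCTION c_to_f AS 'com.example.hive.CelsiusToFahrenheit';

-- Call the UDF like any built-in function.
SELECT city, temp_c, c_to_f(temp_c) AS temp_f
FROM readings;
```

The Java class behind such a function would apply the usual conversion, F = C × 9/5 + 32, to each input value. A temporary function lasts only for the current session.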
Comparison with Traditional Databases
• Hive follows a schema-on-read approach, unlike traditional databases, which use schema-on-write.
• It is optimized for read-heavy analytical queries rather than transactional processing.
• Hive scales horizontally by distributing computations across Hadoop clusters.
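Schema-on-read can be seen by loading a file that does not fully match the table definition. A minimal sketch, assuming a comma-separated text file at a placeholder path and a hypothetical table named raw_students:

```sql
CREATE TABLE raw_students (
  name STRING,
  age  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Loading moves the file into place; rows are not checked against the schema.
LOAD DATA INPATH '/data/incoming/students.csv' INTO TABLE raw_students;

-- The schema is applied when the data is read. With the default text format,
-- a field that cannot be parsed as INT comes back as NULL.
SELECT name FROM raw_students WHERE age IS NULL;
```

A schema-on-write database would reject the malformed rows at load time; here they surface only when queried.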
