Apache Hive is a data warehouse infrastructure built on Hadoop that allows for querying large datasets using HiveQL, a SQL-like language. It consists of components such as Metastore and Execution Engine, and supports both Managed and External Tables. Hive is optimized for batch processing and analytical queries, and allows for user-defined functions to enhance its capabilities.
Hive Basics MCA
Basic Concepts of Hive
For MCA Students
Presented by [Your Name]

Introduction to Hive
• Apache Hive is a data warehouse infrastructure built on top of Hadoop.
• It allows querying and managing large datasets using HiveQL, a SQL-like language.
• Hive is best suited for batch processing and analytical queries.

Architecture of Hive
• Hive consists of components such as the Metastore, Driver, Compiler, and Execution Engine; table data is typically stored in HDFS.
• Queries in Hive are converted into MapReduce, Tez, or Spark jobs for execution.
• The Metastore stores schema and metadata for tables.

HiveQL and its Features
• HiveQL is a SQL-like language used to interact with Hive.
• It supports SELECT, INSERT, GROUP BY, and JOIN operations; UPDATE and DELETE are available only on transactional (ACID) tables.
• It simplifies data querying for users familiar with SQL.

Tables in Hive
• Hive supports two types of tables: Managed Tables and External Tables.
• Managed tables keep their data in Hive's warehouse directory (typically on HDFS); dropping the table deletes both the metadata and the data.
• External tables reference data at a location outside Hive's control; dropping the table removes only the metadata and leaves the data in place.

Querying Data with Hive
• HiveQL allows data retrieval using SELECT statements.
• Queries can include filtering, sorting, aggregation, and joins.
• Example: SELECT * FROM students WHERE age > 20;

User-Defined Functions (UDFs) in Hive
• Hive allows users to create custom UDFs for additional functionality.
• UDFs can be written in Java and registered in Hive.
• Example: A UDF to convert temperature from Celsius to Fahrenheit.

Comparison with Traditional Databases
• Hive follows a schema-on-read approach: the table schema is applied when data is queried, whereas traditional databases validate data against the schema when it is written (schema-on-write).
• It is optimized for read-heavy analytical queries rather than transactional processing.
• Hive scales horizontally by distributing computations across Hadoop clusters.
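
Worked example: Celsius-to-Fahrenheit UDF
The UDF slide mentions a temperature-conversion function without showing one. The following is a minimal sketch of what its logic might look like. The class name CelsiusToFahrenheit is hypothetical. To keep the example compilable without the Hive libraries, the class does not extend Hive's UDF base class (org.apache.hadoop.hive.ql.exec.UDF in the classic API), which a deployable version would need to do.

```java
// Sketch of the conversion logic for a Hive UDF (class name is hypothetical).
// Assumption: in a real Hive deployment this class would also extend
// org.apache.hadoop.hive.ql.exec.UDF from the hive-exec library; that
// dependency is left out here so the example compiles on its own.
public class CelsiusToFahrenheit {

    // Hive's classic UDF API locates a method named "evaluate" by reflection.
    // Returning null for null input mirrors how SQL functions treat NULL.
    public Double evaluate(Double celsius) {
        if (celsius == null) {
            return null;
        }
        // Standard formula: F = C * 9/5 + 32
        return celsius * 9.0 / 5.0 + 32.0;
    }

    // Small local check of the logic, independent of Hive.
    public static void main(String[] args) {
        CelsiusToFahrenheit udf = new CelsiusToFahrenheit();
        System.out.println(udf.evaluate(100.0)); // boiling point of water: 212.0
        System.out.println(udf.evaluate(0.0));   // freezing point of water: 32.0
    }
}
```

Once packaged into a JAR with the Hive base class added, such a function would typically be made available with ADD JAR followed by CREATE TEMPORARY FUNCTION, and then called from a SELECT like any built-in function. The JAR path and the function name chosen at registration are deployment-specific.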