The document discusses Hive data modeling concepts including primitive and complex data types, tables, partitioning, bucketing, external tables, SerDe, and HiveQL commands like CREATE, SELECT, INSERT, and ALTER. Hive allows modeling structured data and querying it using a SQL-like language.
HIVE Data Types
• Primitive Data Types:
  – Hive supports a range of primitive data types similar to those in most programming languages. These include INT, BIGINT, FLOAT, DOUBLE, BOOLEAN, STRING, and TIMESTAMP, among others.
• Complex Data Types:
  – Hive also supports complex data types that are useful for representing structured data. Examples include:
    • Arrays: Ordered collections of elements of the same type.
    • Maps: Key-value pairs. Keys must be of a primitive type; values can be of any data type.
    • Structs: Similar to structs in programming languages, representing a collection of named fields.

Data Models with HIVE
• Table:
  – The basic building block in Hive is the table. Tables in Hive define the structure of the data and how it is stored. They are similar to tables in relational databases and can be partitioned for better performance.
• Partitioning:
  – Hive allows you to partition data in a table based on one or more columns. Each distinct combination of partition values is stored in its own directory. This is particularly useful for large datasets: when a query filters on a partition column, Hive scans only the matching partitions instead of the whole table.
• Bucketing:
  – Bucketing is another technique in Hive for organizing data. It divides the data into a fixed number of buckets based on a hash function applied to one or more columns. Bucketing can improve query performance by reducing the number of files that need to be read.
• External Tables:
  – Hive supports external tables, where the data is stored outside of the Hive warehouse directory. This is useful when you want to manage data that is generated or updated by processes outside of Hive. Dropping an external table removes only its metadata; the underlying files are left in place.
• SerDe (Serializer/Deserializer):
  – Hive uses a SerDe to read rows from, and write rows to, the files behind a table. The SerDe defines how data is serialized and deserialized, which allows Hive to work with various data formats like JSON, XML, Avro, etc.
• Data Modeling with HiveQL:
  – Hive uses HiveQL, a SQL-like language, for querying data. Through HiveQL, users can define and manipulate the data model, including creating tables, altering their structures, and performing various transformations.
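
The modeling concepts above can be sketched in HiveQL as follows. This is a minimal illustration, not a recommended schema: the table names, column names, delimiters, bucket count, and HDFS path are all hypothetical. The external table assumes the OpenCSVSerde class that ships with Hive; that SerDe reads every column as STRING.

```sql
-- Complex types: one hypothetical table using ARRAY, MAP, and STRUCT.
CREATE TABLE IF NOT EXISTS employees (
  id      INT,
  name    STRING,
  skills  ARRAY<STRING>,
  phones  MAP<STRING, STRING>,   -- map keys must be a primitive type
  address STRUCT<street:STRING, city:STRING, zip:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Array elements by index, map values by key, struct fields by name.
SELECT name, skills[0], phones['home'], address.city
FROM employees;

-- Partitioning and bucketing: one directory per view_date value,
-- rows within each partition hashed on user_id into 32 buckets.
CREATE TABLE IF NOT EXISTS page_views (
  user_id   BIGINT,
  url       STRING,
  view_time TIMESTAMP
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Filtering on the partition column lets Hive skip all other partitions.
SELECT COUNT(*)
FROM page_views
WHERE view_date = '2024-01-15';

-- External table with an explicit SerDe: Hive records only the metadata,
-- and the files under LOCATION (a hypothetical path) stay where they are.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
  event_id STRING,
  payload  STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",")
STORED AS TEXTFILE
LOCATION '/data/raw/events';
```

Note that the partition column (view_date) is declared only in PARTITIONED BY, not in the main column list, while the bucketing column (user_id) must be one of the regular columns.
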
Fundamental Commands of HQL
Hive Query Language (HiveQL) is similar to SQL, and it allows users to interact with Hive to query and manage data. Here are some commonly used HiveQL commands along with their syntax:
• Create Database:
  – Create a new database in Hive.
      CREATE DATABASE [IF NOT EXISTS] database_name
        [COMMENT 'database_comment']
        [LOCATION 'hdfs_path']
        [WITH DBPROPERTIES ('key1'='value1', 'key2'='value2')];
• Use Database:
  – Set the default database for the session.
      USE database_name;
• Create Table:
  – Create a new table in Hive.
      CREATE TABLE [IF NOT EXISTS] table_name (
        column1 data_type,
        column2 data_type,
        ...
      )
        [COMMENT 'table_comment']
        [PARTITIONED BY (partition_column data_type, ...)]
        [CLUSTERED BY (bucketed_column1, bucketed_column2, ...) INTO num_buckets BUCKETS]
        [ROW FORMAT row_format]
        [STORED AS file_format]
        [LOCATION 'hdfs_path']
        [TBLPROPERTIES ('key1'='value1', 'key2'='value2')];
• Load Data:
  – Load data files into a Hive table. With LOCAL, the path refers to the local file system; without it, the path refers to HDFS.
      LOAD DATA [LOCAL] INPATH 'file_path' [OVERWRITE] INTO TABLE table_name
        [PARTITION (partition_column=value, ...)];
• Select Data:
  – Retrieve data from one or more tables.
      SELECT [ALL | DISTINCT] column1, column2, ...
      FROM table_name
        [WHERE condition]
        [GROUP BY column1, column2, ...]
        [HAVING condition]
        [ORDER BY column1 [ASC | DESC], column2 [ASC | DESC], ...]
        [LIMIT n];
• Insert Data:
  – Insert the result of a query into a Hive table. OVERWRITE replaces the existing data in the table or partition; INTO appends to it.
      INSERT OVERWRITE TABLE target_table [PARTITION (partition_column=value, ...)] select_statement;
      INSERT INTO TABLE target_table [PARTITION (partition_column=value, ...)] select_statement;
• Alter Table:
  – Modify the structure of an existing table, for example by adding columns.
      ALTER TABLE table_name ADD COLUMNS (new_column data_type, ...);
• Describe Table:
  – View the details of a table's structure.
      DESCRIBE [EXTENDED | FORMATTED] table_name;
• Drop Database:
  – Remove a database. By default (RESTRICT) the command fails if the database still contains tables; CASCADE drops those tables as well.
      DROP DATABASE [IF EXISTS] database_name [RESTRICT | CASCADE];
• Drop Table:
  – Remove a table from the database.
      DROP TABLE [IF EXISTS] table_name;
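
As a worked illustration, the commands above might be combined into a single session like the sketch below. Everything specific in it is an assumption made for the example: the database and table names, the columns, the local file /tmp/orders.csv, and the date value.

```sql
-- Create and select a hypothetical database.
CREATE DATABASE IF NOT EXISTS sales_db COMMENT 'Demo database';
USE sales_db;

-- Staging table matching a comma-separated input file.
CREATE TABLE IF NOT EXISTS orders_staging (
  order_id   BIGINT,
  customer   STRING,
  amount     DOUBLE,
  order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy a local file (hypothetical path) into the staging table.
LOAD DATA LOCAL INPATH '/tmp/orders.csv' OVERWRITE INTO TABLE orders_staging;

-- Partitioned target table stored as ORC.
CREATE TABLE IF NOT EXISTS orders (
  order_id BIGINT,
  customer STRING,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS ORC;

-- Fill one partition from the staging data.
INSERT OVERWRITE TABLE orders PARTITION (order_date = '2024-01-15')
SELECT order_id, customer, amount
FROM orders_staging
WHERE order_date = '2024-01-15';

-- Top customers for that day.
SELECT customer, SUM(amount) AS total
FROM orders
WHERE order_date = '2024-01-15'
GROUP BY customer
HAVING SUM(amount) > 100
ORDER BY total DESC
LIMIT 10;

-- Evolve the schema, inspect it, then clean up.
ALTER TABLE orders ADD COLUMNS (currency STRING COMMENT 'ISO currency code');
DESCRIBE FORMATTED orders;

DROP TABLE IF EXISTS orders_staging;
DROP DATABASE IF EXISTS sales_db CASCADE;
```

Loading into a plain-text staging table and then using INSERT ... SELECT is a common way to populate an ORC table, because LOAD DATA only moves files into place and does not convert their format.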