Hive Query



You:

Write Hive query: a) Create external employee table with schema emp_id, empname, age, gender, salary, department, country. Skip first header.
b) Load data into employee table using file employee_data.csv. c) Create partitioned table named emp_partition, partitioned by country. d) Set
Hive dynamic partition property and enable it. e) Insert data from employee table into emp_partition table. Separate code for each question.

------------------

ChatGPT:

Certainly! Below are the separate Hive queries for each of the specified tasks:

a) Create external employee table with schema emp_id, empname, age, gender, salary, department, country. Skip first header.

sql
CREATE EXTERNAL TABLE IF NOT EXISTS employee (
  emp_id INT,
  empname STRING,
  age INT,
  gender STRING,
  salary DOUBLE,
  department STRING,
  country STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/external_table_location'
TBLPROPERTIES ("skip.header.line.count"="1");

Make sure to replace /path/to/external_table_location with the actual HDFS (or other Hive-supported filesystem) path where the external table's data should live.
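To confirm that the header-skip setting was registered, a quick metadata check with a standard Hive command can be run:

sql
-- Inspect table metadata; the "Table Parameters" section should list
-- skip.header.line.count = 1.
DESCRIBE FORMATTED employee;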

b) Load data into the employee table using the file employee_data.csv.

sql
LOAD DATA LOCAL INPATH '/path/to/employee_data.csv' INTO TABLE employee;

Replace /path/to/employee_data.csv with the actual path to your CSV file.
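If employee_data.csv already sits in HDFS rather than on the local client machine, drop the LOCAL keyword. A sketch with an illustrative HDFS staging path (note that a non-LOCAL load moves the file into the table's storage location instead of copying it):

sql
-- Non-LOCAL load: Hive moves the HDFS file into the table's location.
LOAD DATA INPATH '/user/hive/staging/employee_data.csv' INTO TABLE employee;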

c) Create a partitioned table named emp_partition, partition by country.

sql
CREATE TABLE IF NOT EXISTS emp_partition (
  emp_id INT,
  empname STRING,
  age INT,
  gender STRING,
  salary DOUBLE,
  department STRING
)
PARTITIONED BY (country STRING);

Note that country appears only in the PARTITIONED BY clause: a Hive partition column is a virtual column and must not be repeated in the main column list.

d) Set Hive dynamic partition property and enable it.

sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

Nonstrict mode lets every partition key be resolved dynamically from the data; the default strict mode requires at least one static partition key per insert.
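For contrast, a minimal sketch of the static-partition insert that strict mode allows (the country value 'India' is just an example). The partition column is pinned in the PARTITION clause and therefore omitted from the SELECT list:

sql
-- Static partition: country is fixed in the PARTITION clause, not selected.
INSERT OVERWRITE TABLE emp_partition PARTITION (country = 'India')
SELECT emp_id, empname, age, gender, salary, department
FROM employee
WHERE country = 'India';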

e) Insert data from the employee table into the emp_partition table.

sql
INSERT OVERWRITE TABLE emp_partition
PARTITION (country)
SELECT emp_id, empname, age, gender, salary, department, country
FROM employee;
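Once the insert finishes, the partitions Hive created (one per distinct country value) can be listed with:

sql
SHOW PARTITIONS emp_partition;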

These queries assume that you have a CSV file named employee_data.csv with the specified columns and data. Adjust the paths and column
names as needed for your specific use case.


------------------

You:

Write Hive query: a) Create external employee table with schema emp_id, empname, age, gender, salary, department, country. Skip first header.
b) Load data into employee table using file employee_data.csv. c) Create bucketed table named emp_bucket, bucketed by gender. d) Set Hive
bucketing property and enable it. e) Insert data from employee table into emp_bucket table.

------------------

ChatGPT:

Certainly! Below are the separate Hive queries for each of the specified tasks:

a) Create an external employee table with the schema emp_id, empname, age, gender, salary, department, country. Skip the first header.

sql
CREATE EXTERNAL TABLE IF NOT EXISTS employee (
  emp_id INT,
  empname STRING,
  age INT,
  gender STRING,
  salary DOUBLE,
  department STRING,
  country STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/external_table_location'
TBLPROPERTIES ("skip.header.line.count"="1");


Make sure to replace /path/to/external_table_location with the actual HDFS (or other Hive-supported filesystem) path where the external table's data should live.

b) Load data into the employee table using the file employee_data.csv.

sql
LOAD DATA LOCAL INPATH '/path/to/employee_data.csv' INTO TABLE employee;

Replace /path/to/employee_data.csv with the actual path to your CSV file.

c) Create a bucketed table named emp_bucket, bucketed by gender.

sql
CREATE TABLE IF NOT EXISTS emp_bucket (
  emp_id INT,
  empname STRING,
  age INT,
  gender STRING,
  salary DOUBLE,
  department STRING,
  country STRING
)
CLUSTERED BY (gender) INTO <number_of_buckets> BUCKETS;

Replace <number_of_buckets> with the desired number of buckets, chosen based on data distribution and performance requirements; with only a handful of distinct gender values, a small count such as 2 is typical. Unlike a partition column, the bucketing column must appear in the table's column list, so gender is declared in the schema above and then referenced in CLUSTERED BY.

d) Set Hive bucketing property and enable it.

sql
SET hive.enforce.bucketing=true;

This property applies to Hive 1.x; on Hive 2.0 and later, bucketing is always enforced and the property was removed.

e) Insert data from the employee table into the emp_bucket table.

sql
INSERT OVERWRITE TABLE emp_bucket
SELECT emp_id, empname, age, gender, salary, department, country
FROM employee;
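To spot-check the bucketing, a single bucket can be sampled. This sketch assumes the table was created with 2 buckets; adjust the OUT OF value to match your bucket count:

sql
-- Reads only bucket 1 of 2; rows are assigned by hash(gender) mod 2.
SELECT * FROM emp_bucket TABLESAMPLE(BUCKET 1 OUT OF 2 ON gender);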

These queries assume that you have a CSV file named employee_data.csv with the specified columns and data. Adjust the paths and column
names as needed for your specific use case. Additionally, make sure to choose an appropriate number of buckets based on your data distribution.

------------------

