Hive Function Cheat Sheet


Hive Functions Cheat-sheet, by Qubole

How to create and use Hive Functions, Listing of Built-In Functions that are supported in Hive

Hive Function Meta commands

SHOW FUNCTIONS: lists Hive functions and operators
DESCRIBE FUNCTION [function name]: displays a short description of the function
DESCRIBE FUNCTION EXTENDED [function name]: displays the extended description of the function

Types of Hive Functions

UDF: a function that takes one or more columns from a row as arguments and returns a single value or object. Eg: concat(col1, col2)
UDAF: aggregates column values in multiple rows and returns a single value. Eg: sum(c1)
UDTF: takes zero or more inputs and produces multiple columns or rows of output. Eg: explode()
Macros: a function that uses other Hive functions.

Mathematical Functions

Return Type Name (Signature)                                    Description
BIGINT      round(double a)                                     Returns the rounded BIGINT value of the double
DOUBLE      round(double a, int d)                              Returns the double rounded to d decimal places
BIGINT      floor(double a)                                     Returns the maximum BIGINT value that is equal to or less than the double
BIGINT      ceil(double a), ceiling(double a)                   Returns the minimum BIGINT value that is equal to or greater than the double
double      rand(), rand(int seed)                              Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifying the seed makes the generated random number sequence deterministic.
double      exp(double a)                                       Returns e^a, where e is the base of the natural logarithm
double      ln(double a)                                        Returns the natural logarithm of the argument
double      log10(double a)                                     Returns the base-10 logarithm of the argument
double      log2(double a)                                      Returns the base-2 logarithm of the argument
double      log(double base, double a)                          Returns the base "base" logarithm of the argument
double      pow(double a, double p), power(double a, double p)  Returns a^p
double      sqrt(double a)                                      Returns the square root of a
string      bin(BIGINT a)                                       Returns the number in binary format
string      hex(BIGINT a), hex(string a)                        If the argument is an int, hex returns the number as a string in hex format. Otherwise, if the argument is a string, it converts each character into its hex representation and returns the resulting string.
string      unhex(string a)                                     Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts it to the character represented by the number.
string      conv(BIGINT num, int from_base, int to_base),       Converts a number from a given base to another
            conv(STRING num, int from_base, int to_base)
double      abs(double a)                                       Returns the absolute value
int/double  pmod(int a, int b), pmod(double a, double b)        Returns the positive value of a mod b
double      sin(double a)                                       Returns the sine of a (a is in radians)
double      asin(double a)                                      Returns the arc sine of a if -1<=a<=1 or null otherwise
double      cos(double a)                                       Returns the cosine of a (a is in radians)
double      acos(double a)                                      Returns the arc cosine of a if -1<=a<=1 or null otherwise
double      tan(double a)                                       Returns the tangent of a (a is in radians)
double      atan(double a)                                      Returns the arctangent of a
double      degrees(double a)                                   Converts value of a from radians to degrees
double      radians(double a)                                   Converts value of a from degrees to radians
int/double  positive(int a), positive(double a)                 Returns a
int/double  negative(int a), negative(double a)                 Returns -a
float       sign(double a)                                      Returns the sign of a as '1.0' or '-1.0'
double      e()                                                 Returns the value of e
double      pi()                                                Returns the value of pi

How To Develop UDFs

package org.apache.hadoop.hive.contrib.udf.example;

import java.util.Date;
import java.text.SimpleDateFormat;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

// The description text is what DESCRIBE FUNCTION [EXTENDED] displays.
@Description(name = "YourUDFName",
    value = "_FUNC_(InputDataType) - using the input datatype X argument, " +
            "returns YYY.",
    extended = "Example:\n"
             + " > SELECT _FUNC_(InputDataType) FROM tablename;")
public class YourUDFName extends UDF {
    ..
    public YourUDFName( InputDataType InputValue ){
        ..;
    }

    // Called once per row with the column value(s) passed to the function.
    public String evaluate( InputDataType InputValue ){
        ..;
    }
}
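
The skeleton above leaves the body of evaluate() elided. As a minimal sketch of what such a body might look like, the following plain-Java class formats a Unix epoch value as a timestamp string, using the java.util.Date and java.text.SimpleDateFormat imports the skeleton already lists. The class name FormatEpochSketch, the Long argument, and the choice of UTC are assumptions made for illustration. The class deliberately does not extend UDF, so it compiles without the Hive jars on the classpath.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical example class. In Hive it would be declared as
// "public class FormatEpochSketch extends UDF", as in the skeleton.
class FormatEpochSketch {

    // evaluate() receives one row's value; a null input yields a null output.
    public String evaluate(Long epochSeconds) {
        if (epochSeconds == null) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so one is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Assumption: format in UTC so the result does not depend on the host time zone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        // java.util.Date expects milliseconds since the epoch.
        return fmt.format(new Date(epochSeconds * 1000L));
    }

    public static void main(String[] args) {
        FormatEpochSketch udf = new FormatEpochSketch();
        System.out.println(udf.evaluate(0L));     // 1970-01-01 00:00:00
        System.out.println(udf.evaluate(86400L)); // 1970-01-02 00:00:00
    }
}
```

To run this inside Hive, the class would extend org.apache.hadoop.hive.ql.exec.UDF as in the skeleton; the evaluate method itself would not need to change.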

How To Develop UDFs, GenericUDFs, UDAFs, and UDTFs

public class YourUDFName extends UDF {..}
public class YourGenericUDFName extends GenericUDF {..}
public class YourGenericUDAFName extends AbstractGenericUDAFResolver {..}
public class YourGenericUDTFName extends GenericUDTF {..}

How To Deploy / Drop UDFs

At the start of each session:
ADD JAR /full_path_to_jar/YourUDFName.jar;
CREATE TEMPORARY FUNCTION YourUDFName AS 'org.apache.hadoop.hive.contrib.udf.example.YourUDFName';

At the end of each session:
DROP TEMPORARY FUNCTION IF EXISTS YourUDFName;

Date Functions

Return Type Name (Signature)                                    Description
string      from_unixtime(bigint unixtime[, string format])     Converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone, in the format "1970-01-01 00:00:00"
bigint      unix_timestamp()                                    Gets the current timestamp using the default time zone.
bigint      unix_timestamp(string date)                         Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp; returns 0 on failure: unix_timestamp('2009-03-20 11:30:01') = 1237573801
bigint      unix_timestamp(string date, string pattern)         Converts a time string with the given pattern to a Unix timestamp; returns 0 on failure: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400
string      to_date(string timestamp)                           Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"
int         year(string date)                                   Returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970
int         month(string date)                                  Returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11
int         day(string date), dayofmonth(date)                  Returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1
int         hour(string date)                                   Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12
int         minute(string date)                                 Returns the minute of the timestamp
int         second(string date)                                 Returns the second of the timestamp
int         weekofyear(string date)                             Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44
int         datediff(string enddate, string startdate)          Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2
string      date_add(string startdate, int days)                Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'
string      date_sub(string startdate, int days)                Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'
timestamp   from_utc_timestamp(timestamp, string timezone)      Assumes the given timestamp is UTC and converts it to the given time zone (as of Hive 0.8.0)
timestamp   to_utc_timestamp(timestamp, string timezone)        Assumes the given timestamp is in the given time zone and converts it to UTC (as of Hive 0.8.0)
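
The datediff, date_add, and date_sub examples listed above can be cross-checked outside Hive. The sketch below reproduces them with java.time. The class and method names are hypothetical, it handles only plain yyyy-MM-dd strings, and it is meant as an illustration of the arithmetic, not as a description of how Hive implements these functions.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class mirroring three Hive date functions.
class DateFunctionSketch {

    // Like datediff(enddate, startdate): note that the end date comes first.
    static long datediff(String endDate, String startDate) {
        return ChronoUnit.DAYS.between(LocalDate.parse(startDate), LocalDate.parse(endDate));
    }

    // Like date_add(startdate, days): returns the shifted date as yyyy-MM-dd.
    static String dateAdd(String startDate, int days) {
        return LocalDate.parse(startDate).plusDays(days).toString();
    }

    // Like date_sub(startdate, days): returns the shifted date as yyyy-MM-dd.
    static String dateSub(String startDate, int days) {
        return LocalDate.parse(startDate).minusDays(days).toString();
    }

    public static void main(String[] args) {
        System.out.println(datediff("2009-03-01", "2009-02-27")); // 2
        System.out.println(dateAdd("2008-12-31", 1));             // 2009-01-01
        System.out.println(dateSub("2008-12-31", 1));             // 2008-12-30
    }
}
```

The argument order of datediff is the detail most worth noticing: swapping the two dates flips the sign of the result.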

www.qubole.com QUESTIONS? CALL US 855-HADOOP-HELP
