This document provides a cheat sheet on Hive functions including built-in functions and how to create user-defined functions (UDFs). It lists common function categories like mathematical, string, and conversion functions and provides the function signature and description. It also describes how to develop custom UDFs in Java and annotate them to register the function in Hive.
How to create and use Hive Functions, Listing of Built-In Functions that are supported in Hive
Hive Function Meta commands

SHOW FUNCTIONS: lists Hive functions and operators
DESCRIBE FUNCTION [function name]: displays a short description of the function
DESCRIBE FUNCTION EXTENDED [function name]: displays the extended description of the function

Types of Hive Functions

UDF: a function that takes one or more columns from a row as arguments and returns a single value or object. E.g.: concat(col1, col2)
UDAF: aggregates column values in multiple rows and returns a single value. E.g.: sum(c1)
UDTF: takes zero or more inputs and produces multiple columns or rows of output. E.g.: explode()
Macros: a function that uses other Hive functions.

How To Develop UDFs

    package org.apache.hadoop.hive.contrib.udf.example;

    import java.util.Date;
    import java.text.SimpleDateFormat;
    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;

    @Description(name = "YourUDFName",
        value = "_FUNC_(InputDataType) - using the input datatype X argument, "
            + "returns YYY.",
        extended = "Example:\n"
            + " > SELECT _FUNC_(InputDataType) FROM tablename;")
    public class YourUDFName extends UDF {
        ..
        public YourUDFName( InputDataType InputValue ){
            ..;
        }

        public String evaluate( InputDataType InputValue ){
            ..;
        }
    }

Mathematical Functions

| Return Type | Name (Signature) | Description |
|---|---|---|
| BIGINT | round(double a) | Returns the rounded BIGINT value of the double |
| DOUBLE | round(double a, int d) | Returns the double rounded to d decimal places |
| BIGINT | floor(double a) | Returns the maximum BIGINT value that is equal to or less than the double |
| BIGINT | ceil(double a), ceiling(double a) | Returns the minimum BIGINT value that is equal to or greater than the double |
| double | rand(), rand(int seed) | Returns a random number (that changes from row to row) distributed uniformly from 0 to 1. Specifying the seed makes the generated random number sequence deterministic. |
| double | exp(double a) | Returns e^a, where e is the base of the natural logarithm |
| double | ln(double a) | Returns the natural logarithm of the argument |
| double | log10(double a) | Returns the base-10 logarithm of the argument |
| double | log2(double a) | Returns the base-2 logarithm of the argument |
| double | log(double base, double a) | Returns the base-"base" logarithm of the argument |
| double | pow(double a, double p), power(double a, double p) | Returns a^p |
| double | sqrt(double a) | Returns the square root of a |
| string | bin(BIGINT a) | Returns the number in binary format |
| string | hex(BIGINT a), hex(string a) | If the argument is an int, hex returns the number as a string in hex format. Otherwise, if the argument is a string, it converts each character into its hex representation and returns the resulting string. |
| string | unhex(string a) | Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts it to the character represented by the number. |
| string | conv(BIGINT num, int from_base, int to_base), conv(STRING num, int from_base, int to_base) | Converts a number from a given base to another |
| double | abs(double a) | Returns the absolute value |
| int, double | pmod(int a, int b), pmod(double a, double b) | Returns the positive value of a mod b |
| double | sin(double a) | Returns the sine of a (a is in radians) |
| double | asin(double a) | Returns the arc sine of a if -1<=a<=1 or null otherwise |
| double | cos(double a) | Returns the cosine of a (a is in radians) |
| double | acos(double a) | Returns the arc cosine of a if -1<=a<=1 or null otherwise |
| double | tan(double a) | Returns the tangent of a (a is in radians) |
| double | atan(double a) | Returns the arctangent of a |
| double | degrees(double a) | Converts the value of a from radians to degrees |
| double | radians(double a) | Converts the value of a from degrees to radians |
| int, double | positive(int a), positive(double a) | Returns a |
| int, double | negative(int a), negative(double a) | Returns -a |
| float | sign(double a) | Returns the sign of a as '1.0' or '-1.0' |
| double | e() | Returns the value of e |
| double | pi() | Returns the value of pi |
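The descriptions of pmod, conv, bin, hex and unhex are terse, so a plain-Java approximation may help show what they compute. This is only a sketch under stated assumptions: the class name MathFunctionSketch and its method names are hypothetical, the code does not use or reproduce Hive's own implementation, and edge cases (nulls, negative inputs to conv, overflow) are not handled the way Hive may handle them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: plain-Java approximations of a few Hive math functions.
class MathFunctionSketch {

    // pmod: fold the remainder into the range [0, b) for a positive b.
    static int pmod(int a, int b) {
        return ((a % b) + b) % b;
    }

    // conv: re-express a non-negative number given in fromBase in toBase.
    static String conv(String num, int fromBase, int toBase) {
        return Long.toString(Long.parseLong(num, fromBase), toBase).toUpperCase();
    }

    // bin: binary representation of the number.
    static String bin(long a) {
        return Long.toBinaryString(a);
    }

    // hex for a number: hexadecimal representation of the number.
    static String hex(long a) {
        return Long.toHexString(a).toUpperCase();
    }

    // hex for a string: two hex digits per byte (UTF-8 is an assumption here).
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // unhex: read each pair of hex digits as one byte and decode the bytes.
    static String unhex(String h) {
        byte[] out = new byte[h.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(h.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(pmod(-7, 3));          // 2, whereas -7 % 3 is -1 in Java
        System.out.println(conv("255", 10, 16));  // FF
        System.out.println(bin(5));               // 101
        System.out.println(hex(255));             // FF
        System.out.println(hex("abc"));           // 616263
        System.out.println(unhex("616263"));      // abc
    }
}
```

The pmod case is the one most likely to surprise: a plain remainder keeps the sign of the dividend, while pmod returns a non-negative result for a positive divisor, which is what bucketing by a possibly negative key usually needs.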
How To Develop UDFs, GenericUDFs, UDAFs, and UDTFs

    public class YourUDFName extends UDF {..}
    public class YourGenericUDFName extends GenericUDF {..}
    public class YourGenericUDAFName extends AbstractGenericUDAFResolver {..}
    public class YourGenericUDTFName extends GenericUDTF {..}

How To Deploy / Drop UDFs

At the start of each session:

    ADD JAR /full_path_to_jar/YourUDFName.jar;
    CREATE TEMPORARY FUNCTION YourUDFName AS 'org.apache.hadoop.hive.contrib.udf.example.YourUDFName';

At the end of each session:

    DROP TEMPORARY FUNCTION IF EXISTS YourUDFName;

Date Functions

| Return Type | Name (Signature) | Description |
|---|---|---|
| string | from_unixtime(bigint unixtime[, string format]) | Converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone, in the format "1970-01-01 00:00:00" |
| bigint | unix_timestamp() | Gets the current timestamp using the default time zone |
| bigint | unix_timestamp(string date) | Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp; returns 0 on failure: unix_timestamp('2009-03-20 11:30:01') = 1237573801 |
| bigint | unix_timestamp(string date, string pattern) | Converts a time string with the given pattern to a Unix timestamp; returns 0 on failure: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400 |
| string | to_date(string timestamp) | Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01" |
| int | year(string date) | Returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970 |
| int | month(string date) | Returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11 |
| int | day(string date), dayofmonth(date) | Returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1 |
| int | hour(string date) | Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12 |
| int | minute(string date) | Returns the minute of the timestamp |
| int | second(string date) | Returns the second of the timestamp |
| int | weekofyear(string date) | Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44 |
| int | datediff(string enddate, string startdate) | Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2 |
| string | date_add(string startdate, int days) | Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01' |
| string | date_sub(string startdate, int days) | Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30' |
| timestamp | from_utc_timestamp(timestamp, string timezone) | Assumes the given timestamp is UTC and converts it to the given time zone (as of Hive 0.8.0) |
| timestamp | to_utc_timestamp(timestamp, string timezone) | Assumes the given timestamp is in the given time zone and converts it to UTC (as of Hive 0.8.0) |
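The sample values quoted for the date functions can be reproduced with java.time, which also makes the time zone dependence of unix_timestamp and from_unixtime visible. This is a rough sketch, not Hive's implementation: the class DateFunctionSketch and its method names are hypothetical, the zone is passed explicitly instead of being read from the session, and invalid input throws an exception here rather than returning 0. The sample value 1237573801 works out to a UTC-7 offset for that date, so America/Los_Angeles is used below as an assumption about the environment that produced it.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.time.temporal.IsoFields;

// Hypothetical helper class: java.time approximations of a few Hive date functions.
class DateFunctionSketch {

    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // from_unixtime: epoch seconds rendered as a timestamp string in the given zone.
    static String fromUnixtime(long seconds, ZoneId zone) {
        return TS.format(Instant.ofEpochSecond(seconds).atZone(zone));
    }

    // unix_timestamp(string): timestamp string interpreted in the given zone.
    static long unixTimestamp(String timestamp, ZoneId zone) {
        return LocalDateTime.parse(timestamp, TS).atZone(zone).toEpochSecond();
    }

    // datediff: whole days from startDate to endDate (yyyy-MM-dd strings).
    static long dateDiff(String endDate, String startDate) {
        return ChronoUnit.DAYS.between(LocalDate.parse(startDate), LocalDate.parse(endDate));
    }

    // date_add and date_sub: shift a yyyy-MM-dd date by a number of days.
    static String dateAdd(String startDate, int days) {
        return LocalDate.parse(startDate).plusDays(days).toString();
    }

    static String dateSub(String startDate, int days) {
        return LocalDate.parse(startDate).minusDays(days).toString();
    }

    // weekofyear: ISO week number of a yyyy-MM-dd date.
    static int weekOfYear(String date) {
        return LocalDate.parse(date).get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles"); // assumed zone, see lead-in
        System.out.println(fromUnixtime(0L, ZoneId.of("UTC")));        // 1970-01-01 00:00:00
        System.out.println(unixTimestamp("2009-03-20 11:30:01", la));  // 1237573801
        System.out.println(dateDiff("2009-03-01", "2009-02-27"));      // 2
        System.out.println(dateAdd("2008-12-31", 1));                  // 2009-01-01
        System.out.println(dateSub("2008-12-31", 1));                  // 2008-12-30
        System.out.println(weekOfYear("1970-11-01"));                  // 44
    }
}
```

Running the same unixTimestamp call with a different zone gives a different number, which is why results of unix_timestamp on strings should not be compared across clusters configured with different default time zones.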