MaxComputeSQL Only Modules
6. MaxCompute SQL
6.1. Overview
6.1.1. Scenarios
This topic describes the scenarios of MaxCompute SQL.
MaxCompute SQL offline computing is applicable to scenarios where large volumes of data (terabytes) need to be processed but real-time requirements are not high. In such scenarios, it takes a relatively long time to prepare and submit each job. MaxCompute SQL is therefore not well-suited for businesses that need to process thousands of transactions per second. MaxCompute SQL online computing provides near real-time (NRT) processing capabilities.
MaxCompute SQL uses a syntax similar to that of SQL and can be considered a subset of standard SQL. However, MaxCompute SQL is not equivalent to a database. It lacks common database characteristics, such as transactions, primary key constraints, and indexes. The maximum length of an SQL statement currently supported by MaxCompute is 2 MB.
For a complete list of reserved words, see Reserved words.
It is easy to underestimate the number of partitions generated when multi-level partitions are used. If a large number of partitions would be generated, evaluate the original data first to determine whether the number of partitions is excessive.
You can create up to six levels of partitions. For some MaxCompute commands, the syntax differs between partitioned and non-partitioned tables. For more information, see DDL statements and DML statements.
For more information about the table creation statement, see Create a table.
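To gauge how quickly multi-level partitions multiply, the following Python sketch computes the worst-case partition count as the product of the distinct values at each level. It then compares that count with the six-level limit and with the default limit of 60,000 partitions mentioned in this document. This is an illustration only, not part of MaxCompute. The function names and the sample workload are hypothetical.

```python
from math import prod

MAX_PARTITION_LEVELS = 6         # maximum partition levels stated in this document
DEFAULT_MAX_PARTITIONS = 60_000  # default per-table partition limit stated in this document


def estimate_partitions(distinct_values_per_level):
    """Worst-case partition count: every combination of level values occurs."""
    if len(distinct_values_per_level) > MAX_PARTITION_LEVELS:
        raise ValueError("at most six partition levels are allowed")
    return prod(distinct_values_per_level)


def exceeds_limit(distinct_values_per_level, limit=DEFAULT_MAX_PARTITIONS):
    """True if the worst-case count is above the partition limit."""
    return estimate_partitions(distinct_values_per_level) > limit


# Hypothetical scheme: one year of days x 24 hours x 10 regions.
print(estimate_partitions([365, 24, 10]))  # 87600
print(exceeds_limit([365, 24, 10]))        # True
```

Real data can produce fewer partitions than this product, because not every combination of values has to occur.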
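Explicit type conversion (rows are source types; columns are target types):
From \ To   BIGINT  DOUBLE  STRING  DATETIME  BOOLEAN  DECIMAL
BIGINT      –       Y       Y       N         N        Y
DOUBLE      Y       –       Y       N         N        Y
STRING      Y       Y       –       Y         N        Y
DATETIME    N       N       Y       –         N        N
BOOLEAN     N       N       N       N         –        N
DECIMAL     Y       Y       Y       N         N        –
Y indicates that the type can be converted. N indicates that the type cannot be converted.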
Note
When values of the double type are converted to bigint, the fractional part is truncated. For example, cast(1.6 as bigint) = 1.
When a string that meets the double type requirements is converted to bigint, the string is first converted to double and then to bigint. Hence, the fractional part is truncated. For example, cast("1.6" as bigint) = 1.
When a string that meets the bigint type requirements is converted to double, one decimal place is retained. For example, cast("1" as double) = 1.0.
To convert a constant string to the decimal type, enclose the constant within a pair of quotation marks. If the value is not enclosed in quotation marks, it is treated as a double value. For example, cast("1.234567890123456789" as decimal).
Unsupported explicit type conversions cause an exception.
If a conversion fails during execution, the system returns an error and exits.
Datetime conversion uses the default format yyyy-mm-dd hh:mi:ss. For more information, see Convert data between string and datetime types.
Some types cannot be explicitly converted but can be converted by using built-in SQL functions. For example, the to_char function converts boolean values to the string type. For more information, see TO_CHAR. The to_date function converts string values to the datetime type. For more information, see TO_DATE.
For more information about CAST, see CAST.
When decimal values are out of the value range, the operation that casts a string to decimal may return an error, such as most significant bit overflow or least significant bit overflow truncation.
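The explicit conversion table and the notes above can be modeled in a few lines of Python. This is a sketch for reasoning about the rules, not MaxCompute code. can_cast, cast_to_bigint, and cast_string_to_double are hypothetical helpers. The handling of negative values follows Python, and the document does not describe that case.

```python
# Explicit CAST matrix from the table above.
# Each row lists target types in the order of TYPES; "-" marks the same type.
TYPES = ["BIGINT", "DOUBLE", "STRING", "DATETIME", "BOOLEAN", "DECIMAL"]
CAST_TABLE = {
    "BIGINT":   "- Y Y N N Y",
    "DOUBLE":   "Y - Y N N Y",
    "STRING":   "Y Y - Y N Y",
    "DATETIME": "N N Y - N N",
    "BOOLEAN":  "N N N N - N",
    "DECIMAL":  "Y Y Y N N -",
}


def can_cast(src, dst):
    """True if the table allows an explicit cast from src to dst."""
    if src == dst:
        return True
    return CAST_TABLE[src].split()[TYPES.index(dst)] == "Y"


def cast_to_bigint(value):
    """Models cast(x as bigint) for doubles and numeric strings."""
    if isinstance(value, str):
        value = float(value)  # a numeric string is converted to double first
    return int(value)         # Python int() drops the fractional part (truncates toward zero)


def cast_string_to_double(value):
    """Models cast("1" as double) = 1.0."""
    return float(value)


print(cast_to_bigint(1.6), cast_to_bigint("1.6"), cast_string_to_double("1"))  # 1 1 1.0
print(can_cast("BOOLEAN", "STRING"))  # False
```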
Implicit type conversion (rows are source types; columns are target types). The table is shown in two parts.
Part 1:
From \ To   BOOLEAN  TINYINT  SMALLINT  INT  BIGINT  FLOAT
BOOLEAN     T        F        F         F    F       F
TINYINT     F        T        T         T    T       T
SMALLINT    F        F        T         T    T       T
INT         F        F        F         T    T       T
BIGINT      F        F        F         F    T       T
FLOAT       F        F        F         F    F       T
DOUBLE      F        F        F         F    F       F
DECIMAL     F        F        F         F    F       F
STRING      F        F        F         F    F       F
VARCHAR     F        F        F         F    F       F
TIMESTAMP   F        F        F         F    F       F
BINARY      F        F        F         F    F       F
Part 2:
From \ To   DOUBLE  DECIMAL  STRING  VARCHAR  TIMESTAMP  BINARY
BOOLEAN     F       F        F       F        F          F
TINYINT     T       T        T       T        F          F
SMALLINT    T       T        T       T        F          F
INT         T       T        T       T        F          F
BIGINT      T       T        T       T        F          F
FLOAT       T       T        T       T        F          F
DOUBLE      T       T        T       T        F          F
DECIMAL     F       T        T       T        F          F
STRING      T       T        T       T        F          F
VARCHAR     T       T        T       T        F          F
TIMESTAMP   F       F        T       T        T          F
BINARY      F       F        F       F        F          T
T indicates that the type conversion can be performed. F indicates that the type conversion cannot be performed.
Note
An unsupported implicit type conversion causes an exception.
If the conversion fails, an error is returned.
MaxCompute automatically performs implicit type conversion based on context. If the types do not match, we recommend that you perform explicit type conversion by using cast.
The rules of implicit type conversion apply to specific scopes. In certain scenarios, only some of the rules take effect. For more information, see the scope of implicit type conversions.
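Encoding the implicit conversion tables as data makes individual rules easy to check. The following Python sketch assumes that the columns follow the same type order as the rows, which the T values on the diagonal suggest. can_convert_implicitly is a hypothetical helper. The lookup does not account for which scopes apply a given rule.

```python
# Implicit conversion matrix from the tables above (T = allowed, F = not allowed).
TYPES = ["BOOLEAN", "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT",
         "DOUBLE", "DECIMAL", "STRING", "VARCHAR", "TIMESTAMP", "BINARY"]
ROWS = {
    # target order: BOOLEAN TINYINT SMALLINT INT BIGINT FLOAT  DOUBLE DECIMAL STRING VARCHAR TIMESTAMP BINARY
    "BOOLEAN":   "T F F F F F  F F F F F F",
    "TINYINT":   "F T T T T T  T T T T F F",
    "SMALLINT":  "F F T T T T  T T T T F F",
    "INT":       "F F F T T T  T T T T F F",
    "BIGINT":    "F F F F T T  T T T T F F",
    "FLOAT":     "F F F F F T  T T T T F F",
    "DOUBLE":    "F F F F F F  T T T T F F",
    "DECIMAL":   "F F F F F F  F T T T F F",
    "STRING":    "F F F F F F  T T T T F F",
    "VARCHAR":   "F F F F F F  T T T T F F",
    "TIMESTAMP": "F F F F F F  F F T T T F",
    "BINARY":    "F F F F F F  F F F F F T",
}


def can_convert_implicitly(src, dst):
    """Look up whether a src value can be implicitly converted to dst."""
    return ROWS[src].split()[TYPES.index(dst)] == "T"


print(can_convert_implicitly("TINYINT", "INT"))   # True
print(can_convert_implicitly("DECIMAL", "DOUBLE"))  # False
```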
Relational operators include equal to (=), not equal to (<>), less than (<), less than or equal to (<=), greater than (>), greater than or equal to (>=), IS NULL, IS NOT NULL, LIKE, RLIKE, and IN. The implicit conversion rules of LIKE, RLIKE, and IN differ from those of the other relational operators. These three operators are described in a separate section, and the rules in this section do not apply to them. The following table lists the implicit conversion rules when data of different types is involved in relational operations.
From \ To   BIGINT  DOUBLE  STRING  DATETIME  BOOLEAN  DECIMAL
BOOLEAN     N       N       N       N         –        N
Note
If implicit type conversion is not supported between the two values to be compared, the relational operation cannot be completed and an error is returned.
For more information about relational operators, see Relational operators.
The special relational operators are LIKE, RLIKE, and IN.
Note the following points about implicit type conversion for the LIKE and RLIKE operators:
The source and pattern parameters of LIKE and RLIKE must be of the string type. Other types are not supported by these operators and cannot be implicitly converted to the STRING type.
If the value of source or pattern is NULL, the operation returns NULL.
Note the following point about the IN operator:
The memory used by the compiler increases with the number of parameters of the IN operation. An IN operation with 5,000 parameters consumes 17 GB of memory with the GCC compiler. We recommend that you limit the number of parameters to about 1,024. With this number of parameters, memory consumption peaks at about 1 GB and compilation takes only 39 seconds.
Implicit type conversion with arithmetic operators
Arithmetic operators include addition (+), subtraction (-), multiplication (*), division (/), and modulo (%). The implicit conversion rules are as follows:
Only the STRING, BIGINT, DECIMAL, and DOUBLE types can be used in arithmetic operations.
Before an arithmetic operation, STRING values are implicitly converted to DOUBLE values.
When an arithmetic operation involves values of both the BIGINT and DOUBLE types, BIGINT values are implicitly converted to DOUBLE values.
The DATETIME and BOOLEAN types cannot be used in arithmetic operations.
Note For more information about arithmetic operators, see Arithmetic operators.
Note For more information about logical operators, see Logical operators.
When a function is called, if the data type of an input parameter is not consistent with the data type defined in the function, the input parameter is converted to the function-defined data type.
The parameters of each built-in SQL function in MaxCompute can have different requirements for implicit type conversion. For more information, see Built-in functions.
If the returned data types are only bigint and double, they are converted to the double type.
If data of the string type is also returned, all data types are converted to string. If a data type cannot be converted to string (for example, boolean), an error is returned.
Conversion between other types is not allowed.
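The three result-type rules above amount to a small type-unification function. The following Python sketch is illustrative only. The function name is hypothetical. The set of types treated as convertible to string is an assumption taken from the STRING column of the explicit conversion table earlier in this section.

```python
# Assumption: types that the explicit conversion table allows to be cast to STRING.
CONVERTIBLE_TO_STRING = {"BIGINT", "DOUBLE", "DATETIME", "DECIMAL", "STRING"}


def unify_result_types(types):
    """Sketch of the result-type rules above for a collection of returned types."""
    kinds = {t.upper() for t in types}
    if len(kinds) == 1:
        return next(iter(kinds))
    if kinds == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    if "STRING" in kinds:
        bad = sorted(kinds - CONVERTIBLE_TO_STRING)
        if bad:
            raise TypeError(f"cannot be converted to STRING: {bad}")
        return "STRING"
    raise TypeError(f"conversion between {sorted(kinds)} is not allowed")


print(unify_result_types(["bigint", "double"]))  # DOUBLE
```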
Time unit    Format  Valid values
Month        mm      01–12
Day          dd      01–28, 29, 30, or 31 (depending on the month)
Hour         hh      00–23
Minute       mi      00–59
Second       ss      00–59
Millisecond  ff3     00–999
Note
Leading zeros cannot be omitted. For example, 2017-1-9 12:12:12 is an invalid string and cannot be converted to datetime. It must be written as 2017-01-09 12:12:12.
Only strings that meet the preceding format requirements can be converted to datetime. For example, cast("2017-12-31 02:34:34" as datetime) converts the string "2017-12-31 02:34:34" to datetime. Similarly, when a datetime value is converted to a string, the default conversion format is yyyy-mm-dd hh:mi:ss. An attempt to convert a string that does not meet these requirements fails and causes an exception.
MaxCompute provides the to_date function, which converts a string that does not meet the datetime format to the datetime type. For more information, see TO_DATE.
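A strict check for the default yyyy-mm-dd hh:mi:ss format can be sketched in Python. Python's datetime.strptime alone accepts single-digit fields such as 2017-1-9. The sketch therefore first requires zero-padded fields with a regular expression and then lets strptime validate the value ranges. parse_default_datetime is a hypothetical helper, and milliseconds (ff3) are not handled.

```python
import re
from datetime import datetime

# yyyy-mm-dd hh:mi:ss with every field zero-padded
_DEFAULT_FORMAT = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}")


def parse_default_datetime(text):
    """Accept only strings in the default datetime format; raise ValueError otherwise."""
    if not _DEFAULT_FORMAT.fullmatch(text):
        raise ValueError(f"not in yyyy-mm-dd hh:mi:ss format: {text!r}")
    # strptime rejects out-of-range values such as month 13 or February 30.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


print(parse_default_datetime("2017-12-31 02:34:34"))  # 2017-12-31 02:34:34
```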
6.2. Operators
6.2.1. Relational operators
This topic describes the relational operators in MaxCompute SQL.
Relational operators
... like 'aab' = FALSE
'a%b' like 'a\%b' = TRUE
'axb' like 'a\%b' = FALSE
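The pattern semantics in these examples can be mirrored in Python to test expectations. The following sketch translates a LIKE pattern into a regular expression. In the translation, % matches any sequence of characters and a backslash makes the next character literal, as in 'a\%b'. The sketch also treats _ as matching exactly one character. That is standard LIKE behavior, assumed here because the examples above do not show it. NULL inputs, modeled as None, return None. sql_like is a hypothetical helper, not a MaxCompute function.

```python
import re


def sql_like(source, pattern, escape="\\"):
    """Sketch of LIKE matching: %, _ and an escape character; NULL in gives NULL out."""
    if source is None or pattern is None:
        return None
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any sequence, including empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), source, flags=re.DOTALL) is not None


print(sql_like("a%b", "a\\%b"))  # True
print(sql_like("axb", "a\\%b"))  # False
```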
Double values are floating-point numbers and can carry rounding errors. We recommend that you do not use the equal sign (=) to compare two double values. Instead, subtract one value from the other and take the absolute value of the result. If the absolute value is negligible, the two double values are considered equal.
Note
ABS is a built-in function provided by MaxCompute that returns the absolute value of its input. For more information, see ABS.
A value of the double type in MaxCompute can retain 16 significant digits.
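The following Python sketch shows the same tolerance-based comparison. Python's float is also a 64-bit double. The tolerance 1e-9 is an arbitrary, hypothetical choice, so pick a threshold that suits the magnitude of your data. In MaxCompute SQL, the equivalent check compares ABS(a - b) with such a threshold.

```python
def doubles_equal(a, b, tolerance=1e-9):
    """Treat two doubles as equal when the absolute value of their difference is below tolerance."""
    return abs(a - b) < tolerance


print(0.1 + 0.2 == 0.3)               # False: 0.1 + 0.2 is 0.30000000000000004
print(doubles_equal(0.1 + 0.2, 0.3))  # True
```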
Arithmetic operators
Operator  Description
A / B     If A or B is NULL, NULL is returned. Otherwise, the result of A / B is returned. If both A and B are of the bigint type, the result is of the double type.
+A        A is returned.
Note
Only values of the string, bigint, double, and decimal types can be used in arithmetic operations. Values of the datetime and boolean types are not allowed in these operations.
Before the operation, values of the string type are implicitly converted to the double type.
When values of the bigint and double types are involved in an operation, values of the bigint type are first implicitly converted to the double type, and the result is of the double type.
When both A and B are of the bigint type, the result of A / B is of the double type. The results of the other arithmetic operations are of the bigint type.
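The result-type rules in this note can be summarized as a lookup. The following Python sketch is hypothetical. arithmetic_result_type takes operand type names rather than values. DECIMAL combinations are rejected because this section does not state their result types.

```python
NUMERIC_INPUTS = {"STRING", "BIGINT", "DOUBLE", "DECIMAL"}
OPERATORS = {"+", "-", "*", "/", "%"}


def arithmetic_result_type(op, left, right):
    """Return the result type of `left op right` under the rules in the note above."""
    if op not in OPERATORS:
        raise ValueError(f"unknown arithmetic operator: {op}")
    operands = []
    for t in (left.upper(), right.upper()):
        if t not in NUMERIC_INPUTS:
            raise TypeError(f"{t} cannot be used in arithmetic operations")
        # STRING values are implicitly converted to DOUBLE before the operation.
        operands.append("DOUBLE" if t == "STRING" else t)
    if "DECIMAL" in operands:
        raise NotImplementedError("DECIMAL result types are not described in this section")
    if operands == ["BIGINT", "BIGINT"]:
        # bigint / bigint returns double; the other operators return bigint.
        return "DOUBLE" if op == "/" else "BIGINT"
    return "DOUBLE"  # at least one DOUBLE: BIGINT operands are converted to DOUBLE


print(arithmetic_result_type("/", "bigint", "bigint"))  # DOUBLE
```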
Bitwise operators
Operator Description
Returns the bitwise AND result of A and B. For example, 1 & 2 returns 0 and 1 & 3 returns 1. T he
A & B bitwise AND result of NULL in combination with another value is always NULL. A and B must be
of the bigint type.
Returns the bitwise OR result of A and B. For example, 1 | 2 returns 3 and 1 | 3 returns 3. T he
A | B bitwise OR result of NULL in combination with another value is always NULL. A and B must be of
the bigint type.
Notice Bitwise operators support only bigint data and do not support implicit type conversion.
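The examples and NULL behavior in the table can be reproduced with a short Python model. bit_and and bit_or are hypothetical helpers. The range check reflects the 64-bit signed bigint type. Non-integer inputs such as strings are rejected, which mirrors the absence of implicit type conversion.

```python
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1


def _require_bigint(*values):
    """Reject anything that is not a 64-bit integer; no implicit conversion is attempted."""
    for v in values:
        if isinstance(v, bool) or not isinstance(v, int) or not BIGINT_MIN <= v <= BIGINT_MAX:
            raise TypeError(f"bitwise operators accept only BIGINT values, got {v!r}")


def bit_and(a, b):
    """A & B; NULL (None) in either operand gives NULL."""
    if a is None or b is None:
        return None
    _require_bigint(a, b)
    return a & b


def bit_or(a, b):
    """A | B; NULL (None) in either operand gives NULL."""
    if a is None or b is None:
        return None
    _require_bigint(a, b)
    return a | b


print(bit_and(1, 2), bit_and(1, 3), bit_or(1, 2), bit_or(1, 3))  # 0 1 3 3
```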
Logical operators
Note Only data of the boolean type can be involved in logical operations. These operations do not support implicit type conversion.
Note
Table and column names are not case-sensitive.
If you do not specify the IF NOT EXISTS option and another table with the same name exists, an error is returned. If you specify this option, a success message is returned regardless of whether a table with the same name exists, even if the schema of the existing table differs from that of the table you want to create. In this case, the metadata of the existing table does not change.
A table can contain a maximum of 1,200 column definitions.
Supported data types are BIGINT, DOUBLE, BOOLEAN, DATETIME, DECIMAL, STRING, ARRAY<T>, and MAP<T1, T2>.
Note If you want to use the newly supported data types TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, add the set odps.sql.type.system.odps2=true; flag before the CREATE TABLE statement, and then commit both statements together for execution.
MaxCompute allows you to specify the default value of a column by using DEFAULT value. If the value of a column is not specified in an INSERT operation, the default value is used for this column.
A table or column name cannot contain special characters. It can contain only lowercase letters, uppercase letters, digits, and underscores (_). A name must start with a letter and can be up to 128 bytes in length.
The partitioned by option specifies the partition fields. Partition key columns can only be of the string type. The name of a partition key column cannot contain double-byte characters. It must start with a letter, in lowercase or uppercase, followed by letters or digits. The name can be up to 128 bytes in length and can contain the following special characters: ! _ : $ . # @ and spaces. Other characters, such as \t, \n, and /, are considered undefined characters. After you use partition fields to define partitions for a table, a full table scan is no longer triggered when you add partitions, update partition data, or read partition data. This improves processing efficiency.
A comment is a valid string of up to 1,024 bytes.
The lifecycle option specifies the lifecycle of the table, in days. The CREATE TABLE LIKE statement does not replicate the lifecycle attribute of the source table.
In theory, a table can have up to six levels of partitions. Use as few partitions as possible to avoid excessive expansion of table storage.
You can configure the maximum number of table partitions for a project. The default maximum is 60,000.
STORED AS specifies the storage format of the table. The default value is CFile2. AliORC, developed in C++ by the MaxCompute storage team, is now available. AliORC is fully compatible with the open source Optimized Row Columnar (ORC) format. Compared with CFile2, AliORC saves more than 10% of storage space and improves read performance by more than 20%.
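The naming rule for tables and columns can be checked on the client before a statement is submitted. The following Python sketch implements the rule as stated above: letters, digits, and underscores only, a leading letter, and at most 128 bytes. is_valid_name is a hypothetical helper. It does not check reserved words or the separate rules for partition key column names.

```python
import re

_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name):
    """Table or column name: a letter, then letters, digits, or underscores; at most 128 bytes."""
    return _NAME_PATTERN.fullmatch(name) is not None and len(name.encode("utf-8")) <= 128


print(is_valid_name("sale_detail"), is_valid_name("_c3"), is_valid_name("sale-detail"))  # True False False
```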
Examples
The following example shows how to create a table named sale_detail to store sales records. The sale_date and region columns of the table are used as partition key columns.
Use the following create table...as select... statement to create a table and replicate data to it:
create table sale_detail_ctas1 as select * from sale_detail;
Note If the sale_detail table contains data, all the data is replicated to sale_detail_ctas1.
The sale_detail table is a partitioned table. However, the table created by the create table...as select... statement does not inherit the partition attribute of sale_detail. The partition key columns of sale_detail become standard columns in sale_detail_ctas1. Therefore, sale_detail_ctas1 is a non-partitioned table that has five columns.
In the create table...as select... statement, if you use constants as column values in the SELECT clause, we recommend that you specify column aliases:
Note
If you do not specify column aliases, the constant columns, which are the fourth and fifth columns of sale_detail_ctas3, are automatically named _c3 and _c4.
In this case, to reference these columns of sale_detail_ctas3 again, you must enclose _c3 and _c4 in backticks (`). If you execute the select _c3, _c4 from sale_detail_ctas3 statement without backticks, an error is returned, because a column name in a MaxCompute SQL statement cannot start with an underscore (_). We recommend that you use aliases to avoid this issue.
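The automatic naming can be modeled as follows. This Python sketch is an illustration that rests on assumptions. It takes a select list as (expression, alias) pairs and keeps aliases and bare column references as names. Every other expression is named _c followed by its zero-based position, which reproduces the _c3 and _c4 names described above. The helper and the sample select list are hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def output_column_names(select_items):
    """select_items: list of (expression, alias or None) pairs, in select-list order."""
    names = []
    for position, (expression, alias) in enumerate(select_items):
        if alias:
            names.append(alias)
        elif _IDENTIFIER.fullmatch(expression):
            names.append(expression)       # a plain column reference keeps its name
        else:
            names.append(f"_c{position}")  # constants and other expressions
    return names


# Three columns plus two constants without aliases (hypothetical select list).
items = [("shop_name", None), ("customer_id", None), ("total_price", None),
         ("'2013'", None), ("'China'", None)]
print(output_column_names(items))
# ['shop_name', 'customer_id', 'total_price', '_c3', '_c4']
```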
To ensure that the destination table has the same schema as the source table, use the following create table...like statement:
Note The schema of sale_detail_like is exactly the same as that of sale_detail. Both tables have the same attributes, such as column names, column comments, and table comments, except for the lifecycle. However, data in sale_detail is not replicated to sale_detail_like.
MaxCompute allows you to execute the DESC statement to view table information.
desc <table_name>; -- View table information.
desc extended <table_name>; -- View table information and extended information.
MaxCompute allows you to use the SHOW CREATE TABLE statement to generate the DDL statement that creates a table. This makes it easier to rebuild the table schema by using SQL.
Command syntax:
Note If the command is run without the IF EXISTS option and the table does not exist, an exception is returned. With this option, a success message is returned regardless of whether the table exists.
Example:
Command syntax:
Note
The rename operation changes only the table name, not the table data.
If the table specified by new_table_name already exists, an error is returned.
If the table specified by table_name does not exist, an error is returned.
Example:
Command syntax:
Note
table_name must be an existing table.
A comment can contain a maximum of 1,024 bytes.
Example:
alter table sale_detail set comment 'new comments for table sale_detail';
You can run the desc command to view the modified comment of the table. For more information, see Obtain table information.
Command syntax:
Note
The days parameter specifies the lifecycle of the table, in days. It must be a positive integer.
If the table specified by table_name is a non-partitioned table and is not modified within the number of days specified by days since its last modification, MaxCompute automatically clears the table, which is similar to the DROP TABLE operation. In MaxCompute, the LastDataModifiedTime value of a table is updated each time data in the table is modified. MaxCompute determines whether to clear a table based on its LastDataModifiedTime value and lifecycle settings.
If the table specified by table_name is a partitioned table, MaxCompute determines whether to clear each partition based on the LastDataModifiedTime value of the partition. Unlike a non-partitioned table, a partitioned table is not deleted after its last partition is reclaimed.
You can configure a lifecycle for tables, but not for partitions.
You can specify a lifecycle when you create a table.
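The reclamation decision described in this note can be sketched in Python. should_reclaim and partitions_to_reclaim are hypothetical helpers. The sketch assumes that an object expires only when strictly more than the lifecycle number of days have passed. It does not model when MaxCompute actually runs reclamation.

```python
from datetime import datetime, timedelta


def should_reclaim(last_data_modified_time, lifecycle_days, now):
    """True if the data has not been modified for more than lifecycle_days days."""
    if isinstance(lifecycle_days, bool) or not isinstance(lifecycle_days, int) or lifecycle_days <= 0:
        raise ValueError("the lifecycle must be a positive integer number of days")
    return now - last_data_modified_time > timedelta(days=lifecycle_days)


def partitions_to_reclaim(partition_mtimes, lifecycle_days, now):
    """Partitioned tables: judge each partition by its own LastDataModifiedTime.
    The table itself is kept even if every partition is reclaimed."""
    return sorted(spec for spec, mtime in partition_mtimes.items()
                  if should_reclaim(mtime, lifecycle_days, now))


now = datetime(2024, 1, 31)
mtimes = {"ds=20240101": datetime(2024, 1, 1), "ds=20240130": datetime(2024, 1, 30)}
print(partitions_to_reclaim(mtimes, 7, now))  # ['ds=20240101']
```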
Example:
Syntax
Note
TABLE DISABLE LIFECYCLE
This option prevents the reclamation of a table and its partitions based on the lifecycle feature. It has a higher priority than partition_spec enable lifecycle.
The lifecycle settings and the partition_spec enable/disable flag of the table are retained.
You can still modify the lifecycle settings of a table and its partitions.
Example
Command syntax:
Note
If the specified table_name does not exist, an error is returned.
This operation changes the LastDataModifiedTime value of the table. MaxCompute then considers the table data to have changed and recalculates the lifecycle.
For more information about how to modify the LastDataModifiedTime value of a partition, see Modify the LastDataModifiedTime value of a partition.
Command syntax:
Note This statement clears data from a specified non-partitioned table. To clear data from a partitioned table, run the ALTER TABLE table_name DROP PARTITION (partition_spec) statement.
If a project does not have enough space, you can use the table archiving feature of MaxCompute to compress data by about 50%. The archiving feature uses a compression algorithm with a higher compression ratio and saves data as redundant array of independent disks (RAID) files. Data is no longer simply stored in three copies. Instead, it is stored as six data blocks plus three check blocks, which changes the ratio of logical to physical storage from 1:3 to 1:1.5. Archived data therefore consumes only half of the usual physical space.
However, this feature comes at a price. If a data block or machine is damaged, restoring the data takes longer, and read performance is affected. Therefore, this feature is suitable for compressing cold data for storage. For example, you can store large volumes of outdated log data as RAID files for a long time.
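The storage arithmetic can be checked with a few lines of Python. The sketch assumes three full replicas for ordinary data and a layout of six data blocks plus three check blocks for archived data. Under these assumptions, the physical sizes in the archive summary output of this section follow directly: 456 × 3 = 1368 before the merge and 512 × 1.5 = 768 after it. The helper names are hypothetical.

```python
from fractions import Fraction


def replicated_physical_size(logical_bytes, copies=3):
    """Ordinary storage: every byte is stored `copies` times."""
    return logical_bytes * copies


def raid_physical_size(logical_bytes, data_blocks=6, check_blocks=3):
    """RAID-style storage: (data + check) / data physical bytes per logical byte (9 / 6 = 1.5)."""
    return logical_bytes * Fraction(data_blocks + check_blocks, data_blocks)


print(replicated_physical_size(456))  # 1368
print(raid_physical_size(512))        # 768
```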
Command syntax:
Example:
Summary:
table name: test0128 /pt=a instance count: 1 run time: 21
before merge, file count: 1 file size: 456 file physical size: 1368
after merge, file count: 1 file size: 512 file physical size: 768
Note
The output shows the changes in logical size and physical size during the archiving process. During archiving, multiple small files are automatically merged. After the archive operation is complete, you can run the desc extended command to check whether the data in the partition has been archived and to view the physical space usage:
Command syntax:
Example:
Command syntax:
Note
To create a view, you must have read permissions on the tables referenced by the view.
Views in MaxCompute are not materialized views. Operations on a view access the data of the referenced tables. Changes to your permissions on a referenced table can therefore change your permissions on the view.
A view can contain only one valid SELECT statement.
A view can reference other views but cannot reference itself. Circular references are not supported.
You cannot write data to a view. For example, the INSERT INTO and INSERT OVERWRITE operations do not work on views.
If a table referenced by a view changes, you may no longer be able to access the view. For example, a view becomes inaccessible after the table it references is deleted. You must properly maintain the mappings between referenced tables and views.
If the CREATE VIEW statement is run without the IF NOT EXISTS option and the view already exists, an exception is returned. In this case, you can run the CREATE OR REPLACE VIEW statement to recreate the view. The permissions on the recreated view remain unchanged.
Example:
Command syntax:
Note If the command is run without the IF EXISTS option and the view does not exist, an error is returned.
Example:
Command syntax:
Note If a view with the same name already exists, an error is returned.
Example:
Syntax
alter table table_name add [if not exists] partition partition_spec; -- Add a partition.
alter table table_name add [if not exists] partition partition_spec [PARTITION partition_spec PARTITION partition_spec...]; -- Add multiple partitions at a time.
partition_spec: (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)
Note
If you do not specify the IF NOT EXISTS option and another partition with the same name exists, an error is returned.
A MaxCompute table can contain a maximum of 60,000 partitions.
To add a partition to a table that has multi-level partitions, you must specify the values of all partition columns.
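Every partition level must be given a value, so a client-side helper can build and check a partition_spec before it issues ALTER TABLE table_name ADD PARTITION. The following Python sketch is hypothetical. render_partition_spec and plan_add_partition are not MaxCompute APIs, and the caller supplies the set of existing partitions. Values that contain quotation marks are not escaped.

```python
MAX_PARTITIONS = 60_000  # per-table maximum stated in the note above


def render_partition_spec(partition_columns, values):
    """Build a partition_spec string; every partition level must have a value."""
    missing = [col for col in partition_columns if col not in values]
    if missing:
        raise ValueError(f"missing values for partition columns: {missing}")
    pairs = ", ".join(f"{col}='{values[col]}'" for col in partition_columns)
    return f"({pairs})"


def plan_add_partition(partition_columns, values, existing, if_not_exists=False):
    """Return the spec to add, or None when it exists and IF NOT EXISTS was given."""
    spec = render_partition_spec(partition_columns, values)
    if spec in existing:
        if if_not_exists:
            return None
        raise ValueError(f"partition {spec} already exists")
    if len(existing) >= MAX_PARTITIONS:
        raise ValueError("the table already has the maximum number of partitions")
    return spec


print(render_partition_spec(["sale_date", "region"], {"sale_date": "201312", "region": "hangzhou"}))
# (sale_date='201312', region='hangzhou')
```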
Examples
The following examples show how to add partitions to the sale_detail table:
Syntax
alter table table_name drop [if exists] PARTITION partition_spec; -- Delete a partition.
alter table table_name drop [if exists] PARTITION partition_spec, PARTITION partition_spec [, PARTITION partition_spec...]; -- Delete multiple partitions at a time.
partition_spec: (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)
Note If you do not specify the IF EXISTS option and the partition you want to delete does not exist, an error is returned.
Example
Execute the following statement to delete a partition from the sale_detail table:
Command syntax:
Note
A column can be only one of the following types: bigint, double, boolean, datetime, decimal, string, tinyint, smallint, int, float, varchar, binary, timestamp, array, map, or struct.
A single table in MaxCompute can contain up to 1,200 columns.
Command syntax:
Note
old_col_name must be an existing column.
new_col_name must not be the name of an existing column in the table.
Command syntax:
Note
The comment cannot exceed 1,024 bytes.
The data type and position of a column cannot be changed.
Command syntax:
Note
If the specified table_name or partition_col does not exist, an error is returned.
If the specified partition_col_value does not exist, an error is returned.
This operation changes the LastDataModifiedTime value of the table. MaxCompute then considers the table or partition data to have changed and recalculates the lifecycle.
For more information about how to modify the LastDataModifiedTime value of a table, see Modify the LastDataModifiedTime value of a table.
Command syntax:
Note
This command cannot modify the names of partition columns. It can modify only the values of partition columns.
To modify the values of one or more partitions in a table that has multi-level partitions, you must specify the partition values at each level.
Syntax
Note
If you do not specify the IF EXISTS option and the partition you want to merge does not exist, an error is returned.
If you specify the IF EXISTS option but no partitions meet the merge conditions, no new partition is generated.
If source data is concurrently modified by operations such as INSERT, RENAME, or DROP while you execute the preceding statement, an error is returned even if you have specified the IF EXISTS option.
If the PURGE attribute is specified, merged partitions cannot be restored by using Kunlunjing.
External tables, shard tables, and tables with extreme storage are not supported. Xlib or Algo tables that depend on the file order are not supported. If you merge partitions of a clustered table, the clustered attribute is removed from the merged partitions.
CatalogServer performs hash operations on tables to merge partitions. A capacity limit is imposed on merged partitions. A hard link in the Apsara Distributed File System can have a maximum of seven replicas.
You can merge a maximum of 4,000 partitions at a time.
Up to 10 million partitions can wait on CatalogServer to be merged.
If an error indicating that CatalogServer is busy occurs, try again later.
If a hard link in the Apsara Distributed File System is faulty, purge the recycle bin and then try again.
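The limits above can be checked on the client before a merge is submitted. The following Python sketch is hypothetical. It assumes that the caller has already listed the source partitions that match the merge condition. It does not model CatalogServer capacity, busy errors, or concurrent modifications.

```python
MAX_PARTITIONS_PER_MERGE = 4_000  # limit stated above


def check_merge_request(source_partitions, if_exists=False):
    """Validate a planned merge; return True if a merged partition would be produced."""
    count = len(source_partitions)
    if count > MAX_PARTITIONS_PER_MERGE:
        raise ValueError(f"{count} partitions requested; at most 4,000 can be merged at a time")
    if count == 0:
        if if_exists:
            return False  # IF EXISTS: no new partition is generated
        raise ValueError("no partition matches the merge condition")
    return True


print(check_merge_request(["ds=20181101/hh=00/mm=00", "ds=20181101/hh=00/mm=10"]))  # True
```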
Example
The following code shows the partitions and data of the tb_test table:
Execute the following statement to merge all partitions that meet the hh='00' condition into the ds=20181101/hh=00/mm=00 partition:
Execute the following statement to view the partitions of the table after they are merged:
Data in the two partitions that meet the hh='00' condition is merged into the ds=20181101/hh=00/mm=00 partition.
When you merge partitions, you can specify multiple predicate conditions. For example, you can execute the following statement to merge all the remaining partitions into the ds=20181101/hh=00/mm=00 partition:
The INSERT OVERWRITE and INSERT INTO statements are commonly used for data processing in MaxCompute SQL. They save computing results to a destination table for subsequent computing. The INSERT INTO statement appends data to a table or partition. The INSERT OVERWRITE statement clears the original data before it inserts data into a table or partition.
Command syntax:
Example:
Note When data is inserted by using an INSERT operation, the mapping between the source and destination tables depends on the column order in the SELECT clause, not on the column names of the two tables.
When data is inserted into a partitioned table, the partition columns cannot appear in the SELECT list.
MaxCompute SQL allows you to insert data into different result tables or partitions by using one SQL statement.
Command syntax:
from from_statement
insert overwrite | into table tablename1 [partition (partcol1=val1, partcol2=val2 ...)] select_statement1
[insert overwrite | into table tablename2 [partition ...] select_statement2]
Note
A single SQL statement supports up to 256 outputs. If more than 256 outputs are specified, a syntax error is returned.
In a MULTI INSERT statement, each destination partition of a partitioned table, or each non-partitioned table, can be specified only once.
INSERT OVERWRITE and INSERT INTO cannot both be performed on different partitions of the same partitioned table in one statement. Otherwise, an error is returned.
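The MULTI INSERT constraints above can be expressed as a small validation function. This Python sketch is illustrative only. validate_multi_insert is hypothetical and takes each output as a (mode, table, partition) triple. The partition is None for a non-partitioned table.

```python
MAX_OUTPUTS = 256  # outputs per SQL statement, as stated above


def validate_multi_insert(targets):
    """targets: list of (mode, table, partition_or_None); mode is 'overwrite' or 'into'."""
    if len(targets) > MAX_OUTPUTS:
        raise ValueError("more than 256 outputs in one statement")
    seen = set()
    modes_by_table = {}
    for mode, table, partition in targets:
        key = (table, partition)
        if key in seen:
            raise ValueError(f"target specified more than once: {key}")
        seen.add(key)
        modes_by_table.setdefault(table, set()).add(mode)
    mixed = sorted(t for t, modes in modes_by_table.items() if len(modes) > 1)
    if mixed:
        raise ValueError(f"INSERT OVERWRITE and INSERT INTO mixed on: {mixed}")


validate_multi_insert([("overwrite", "sale_detail_multi", "sale_date='2010'"),
                       ("overwrite", "sale_detail_multi", "sale_date='2011'")])  # passes
```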
Example:
When you run the INSERT OVERWRITE statement on a partitioned table, you can specify the partition values in the statement. A more flexible method is to specify partition column names instead of partition values and to provide the partition values in the corresponding columns of the SELECT clause.
Command syntax:
insert overwrite table tablename partition (partcol1, partcol2 ...) select_statement from from_statement;
Note
When you run an SQL dynamic partition statement in a distributed environment, a single process can output up to 512 dynamic partitions. If the number of dynamic partitions exceeds this limit, an exception is returned.
Currently, an SQL dynamic partition statement can generate up to 2,000 dynamic partitions. If the number of dynamic partitions exceeds this limit, an exception is returned.
Dynamic partition values cannot be NULL. Otherwise, an exception is returned.
If a destination table has multi-level partitions, you can specify some partitions as static partitions in an INSERT statement. However, the static partitions must be high-level partitions.
Example:
create table total_revenues (revenue bigint) partitioned by (region string);
insert overwrite table total_revenues partition(region)
select total_price as revenue, region from sale_detail;
Note In the preceding example, you do not know which partitions are generated before the SQL statement runs. The generated partitions are determined by the values of the region field returned by the SELECT statement. This is why the partitions are called dynamic partitions.
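The routing of rows into dynamic partitions can be modeled in Python. route_dynamic_partitions is a hypothetical helper. For readability, it groups rows by a named column, whereas MaxCompute maps the trailing SELECT columns to partition columns by position. The sketch checks the NULL rule and the 2,000-partition limit per statement. It does not check the 512-partition limit per process.

```python
from collections import defaultdict

MAX_DYNAMIC_PARTITIONS = 2_000  # per statement, as stated above


def route_dynamic_partitions(rows, partition_column):
    """Group rows (dicts) by partition value; the partition column is removed from each row."""
    partitions = defaultdict(list)
    for row in rows:
        value = row[partition_column]
        if value is None:
            raise ValueError("dynamic partition value cannot be NULL")
        partitions[value].append({k: v for k, v in row.items() if k != partition_column})
        if len(partitions) > MAX_DYNAMIC_PARTITIONS:
            raise ValueError("more than 2,000 dynamic partitions")
    return dict(partitions)


rows = [{"revenue": 10, "region": "hangzhou"}, {"revenue": 5, "region": "shanghai"},
        {"revenue": 7, "region": "hangzhou"}]
print(route_dynamic_partitions(rows, "region"))
# {'hangzhou': [{'revenue': 10}, {'revenue': 7}], 'shanghai': [{'revenue': 5}]}
```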
Other examples:
Syntax
The SELECT statement reads data from a table. You can specify the names of the columns you want to read or use an asterisk (*) to represent all columns.
Examples
Note The SELECT statement returns a maximum of 1,000 rows of results. However, this limit does not apply when SELECT is used as a clause. In that case, the clause returns all results to the upper-layer query. To obtain more than 1,000 rows of results by using the SELECT statement, you must use Tunnel to download the entire table or a temporary table returned by the SELECT operation. For more information, see MaxCompute Tunnel.
Examples
Filter conditions
like, rlike /
You can specify partitions in the WHERE clause of the SELECT statement to avoid a full table scan.
Examples
Notice To check whether partition pruning takes effect, execute the EXPLAIN SELECT statement. A common user-defined function (UDF), or specifying partition conditions in a JOIN operation, can prevent partition pruning from taking effect.
UDFs support partition pruning. Such UDFs are executed as small jobs, and the UDFs are then replaced with their execution results. To enable this, add the following annotation to the UDF class:
@com.aliyun.odps.udf.annotation.UdfProperty(isDeterministic=true)
Alternatively, add the set odps.sql.udf.ppr.deterministic = true; flag before SQL statements. Then, all UDFs in the SQL statements are considered deterministic.
Note This method has limits. It backfills partitions with execution results, and a maximum of 1,000 partitions can be backfilled. If the annotation is added to the UDF class, an error indicating that more than 1,000 partitions are backfilled may be returned. To ignore the error, add the set odps.sql.udf.ppr.to.subquery = false; flag to disable this feature globally. After this feature is disabled, UDF-based partition pruning no longer takes effect.
The WHERE clause in an SQL statement can include the BETWEEN...AND condition. Example:
Note The number of conditions that can be specified in the WHERE clause cannot exceed 256.
Examples
DISTINCT: If duplicate rows exist, add DISTINCT before the field names to remove duplicates so that only one copy of each duplicate row is returned. If you use ALL, all duplicate rows are returned. If you specify neither option, all duplicate rows are returned, which is the same result as specifying ALL.
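The difference between DISTINCT and ALL can be modeled in a few lines of Python. This is a sketch of the semantics only, with hypothetical function names; it keeps first-seen order for readability, whereas MaxCompute does not guarantee the order of DISTINCT results.

```python
def select_distinct(rows):
    """SELECT DISTINCT: keep one copy of each row (first occurrence wins here)."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

def select_all(rows):
    """SELECT ALL, or SELECT without DISTINCT: every row, duplicates included."""
    return list(rows)

regions = [("hangzhou",), ("shanghai",), ("hangzhou",)]
print(select_distinct(regions))
print(select_all(regions))
```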
Examples
GROUP BY: This clause is used to perform group-based queries. In most cases, this clause is used with aggregate functions. If a SELECT statement includes aggregate functions, the key of the GROUP BY clause can be the names of columns in the input table or an expression composed of input table columns. The key cannot be an alias of a column in the output of the SELECT operation.
-- The columns in the sale_detail table are in the format of key-value pairs.
select region, sum(total_price) from sale_detail group by 1;
-- Equivalent to the following statement:
select region, sum(total_price) from sale_detail group by region;
Examples
Note The GROUP BY operation is performed before the SELECT operation during the parsing of SQL statements. Therefore, GROUP BY can use only the column names or expressions of the input table as keys. For more information about aggregate functions, see Aggregate functions.
ORDER BY: This clause is used to globally sort records based on specific columns. To sort records in descending order, use the DESC keyword. Because records are globally sorted, the ORDER BY clause must be used with the LIMIT clause. In an ORDER BY operation, NULL is considered the lowest of all values. This rule is consistent with MySQL, but different from Oracle. Unlike the GROUP BY clause, the ORDER BY clause must use the aliases of the output columns of the SELECT operation. If no alias is specified for a column in the SELECT operation, the column name is used as its alias.
-- The columns in the sale_detail table are in the format of key-value pairs.
select region, sum(total_price) from sale_detail order by 2 limit 100;
-- Equivalent to the following statement:
select region, sum(total_price) from sale_detail order by sum(total_price) limit 100;
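Because NULL is treated as the lowest value, it comes first in an ascending sort and last in a descending (DESC) sort. The following Python sketch models that rule; the function order_by and the sample rows are hypothetical, with None standing for NULL.

```python
def order_by(rows, key, desc=False, limit=None):
    """Globally sort rows by rows[key], treating None (NULL) as the lowest value."""
    def sort_key(row):
        value = row[key]
        # (False, ...) sorts before (True, ...), so NULL comes first when ascending.
        # The 0 placeholder is never compared with real values, because the
        # first tuple element already differs.
        return (value is not None, value if value is not None else 0)
    ordered = sorted(rows, key=sort_key, reverse=desc)
    return ordered if limit is None else ordered[:limit]

rows = [{"v": 3}, {"v": None}, {"v": 1}]
print([r["v"] for r in order_by(rows, "v")])
print([r["v"] for r in order_by(rows, "v", desc=True)])
```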
Examples
Note The number in the LIMIT clause is a constant that limits the number of output rows. If a SELECT statement is executed without the LIMIT clause, it can return a maximum of 5,000 rows. The screen display limit may vary with projects and can be configured in the console.
The OFFSET clause can be used with the ORDER BY ... LIMIT clause to skip the number of rows specified by OFFSET.
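In terms of plain list operations, ORDER BY ... LIMIT with OFFSET sorts the rows, skips the first OFFSET rows, and then returns at most LIMIT rows. The Python sketch below, with a hypothetical function name, shows that arithmetic.

```python
def order_by_limit_offset(values, limit, offset=0):
    """Sort, skip `offset` rows, then return at most `limit` rows."""
    ordered = sorted(values)
    return ordered[offset:offset + limit]

print(order_by_limit_offset([50, 10, 40, 20, 30], limit=2, offset=1))
```

Here the sorted values are 10, 20, 30, 40, 50; skipping one row and taking two returns 20 and 30.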
Examples
DISTRIBUTE BY: This clause is used to shard data based on the hash values of specific columns. The DISTRIBUTE BY clause must be followed by the alias of an output column from the SELECT operation.
Examples
SORT BY: This clause is used for partial sorting. The DISTRIBUTE BY clause must be placed before the SORT BY clause. In practice, the SORT BY clause sorts the results of the DISTRIBUTE BY clause within each shard. The SORT BY clause must be followed by the alias of an output column from the SELECT operation.
Examples
select region from sale_detail distribute by region sort by region;
select region as r from sale_detail sort by region;
-- An error is returned because the SORT BY clause does not follow a DISTRIBUTE BY clause.
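The combined effect of DISTRIBUTE BY and SORT BY can be sketched in Python: rows are assigned to shards by a hash of the key, and each shard is then sorted on its own, so the output is sorted within each shard but not globally as with ORDER BY. The function name is hypothetical, and zlib.crc32 is only a stand-in; this sketch does not reproduce the hash function that MaxCompute actually uses.

```python
import zlib

def distribute_sort(rows, key, num_shards):
    """Shard rows by a hash of rows[key] (DISTRIBUTE BY), then sort each shard (SORT BY)."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        # Rows with the same key value always land in the same shard.
        shard = zlib.crc32(str(row[key]).encode("utf-8")) % num_shards
        shards[shard].append(row)
    # Partial sort: each shard is sorted independently.
    return [sorted(shard, key=lambda r: r[key]) for shard in shards]

rows = [{"region": r} for r in ["shanghai", "hangzhou", "beijing", "hangzhou"]]
for shard in distribute_sort(rows, "region", 2):
    print([r["region"] for r in shard])
```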
The ORDER BY and GROUP BY clauses cannot be used with the DISTRIBUTE BY and SORT BY clauses. The key of the ORDER BY clause must be the alias of an output column from the SELECT operation, whereas the key of the GROUP BY clause cannot be such an alias.
Note
The key of the ORDER BY, SORT BY, or DISTRIBUTE BY clause must be the alias of an output column from the SELECT operation.
The SELECT operation is performed before the ORDER BY, SORT BY, and DISTRIBUTE BY clauses during the parsing of SQL statements. Therefore, only the aliases of output columns from the SELECT operation can be used as keys.
6.4.2.2. Subquery
This topic describes how to use the SELECT statement for subquery operations.
A common SELECT statement reads data from tables, for example, select column_1, column_2 ... from table_name. The object of a query can also be another SELECT operation, which is called a subquery.
Command syntax:
Example:
Note In a FROM clause, a subquery can be used as a table and can be joined with other tables or subqueries.
Synt ax
Note The UNION ALL clause combines two or more datasets returned by SELECT operations into one dataset. If duplicate rows exist in the results, all rows that meet the conditions are returned, and duplicate rows are retained.
MaxCompute SQL does not support combining two top-level query results. To combine them, rewrite them as a subquery.
select * from (
select * from sale_detail where region = 'hangzhou' union all
select * from sale_detail where region = 'shanghai') t;
The syntax that uses a pair of parentheses to specify the priority of UNION ALL is supported.
Example:
SELECT * FROM src UNION ALL (SELECT * FROM src2 UNION ALL SELECT * FROM src3);
-- Execute the UNION ALL clause for the src2 and src3 tables. Then, execute the UNION ALL clause for the src table and the obtained result.
Notice
For a UNION ALL operation, all subqueries must have the same number of columns, column names, and column types. If the column names are inconsistent, use column aliases.
In most cases, MaxCompute allows a UNION ALL operation for a maximum of 256 subqueries. If the limit is exceeded, a syntax error is returned.
MaxCompute supports multiple JOIN operations in an SQL statement. JOIN does not support Cartesian products (JOIN without an ON clause).
Synt ax
join_table:
    table_reference join table_factor [join_condition]
  | table_reference {left outer|right outer|full outer|inner} join table_reference join_condition
table_reference:
    table_factor
  | join_table
table_factor:
    tbl_name [alias]
  | table_subquery alias
  | ( table_references )
join_condition:
    on equality_expression ( and equality_expression )*
Take note of the following points when you perform a JOIN operation:
LEFT OUTER JOIN: returns all rows in the left table, such as shop in the following example, including the rows that do not match any rows in the right table, such as sale_detail in the following example.
Example
select a.shop_name as ashop, b.shop_name as bshop from shop a left outer join sale_detail b on a.shop_name=b.shop_name;
-- Both the shop and sale_detail tables have the shop_name column. You must use aliases to distinguish the columns in the SELECT operation.
RIGHT OUTER JOIN: returns all rows in the right table, such as sale_detail in the following example, including the rows that do not match any rows in the left table, such as shop in the following example.
Example
select a.shop_name as ashop, b.shop_name as bshop from shop a right outer join sale_detail b on a.shop_name=b.shop_name;
-- Both the shop and sale_detail tables have the shop_name column. You must use aliases to distinguish the columns in the SELECT operation.
FULL OUTER JOIN: returns all rows in both the left and right tables.
Example
select a.shop_name as ashop, b.shop_name as bshop from shop a full outer join sale_detail
b on a.shop_name=b.shop_name;
INNER JOIN: returns only the rows that match in both tables. The INNER keyword can be omitted.
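The four join types differ only in which unmatched rows they keep. The following Python sketch models them for an equi-join; join_rows and the sample tables are hypothetical, None stands for the NULL-filled side of an unmatched row, and a NULL key never matches, as in SQL.

```python
def join_rows(left, right, key, how="inner"):
    """Equi-join two lists of dicts on `key`; how is inner, left, right, or full."""
    result = []
    matched_right = set()
    for l_row in left:
        matches = [i for i, r_row in enumerate(right)
                   if l_row[key] is not None and r_row[key] == l_row[key]]
        for i in matches:
            matched_right.add(i)
            result.append((l_row, right[i]))
        # LEFT and FULL OUTER JOIN keep left rows that have no match.
        if not matches and how in ("left", "full"):
            result.append((l_row, None))
    # RIGHT and FULL OUTER JOIN keep right rows that have no match.
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                result.append((None, r_row))
    return result

shop = [{"shop_name": "s1"}, {"shop_name": "s2"}]
sale_detail = [{"shop_name": "s1"}, {"shop_name": "s3"}]

def names(pairs):
    """Show each joined pair as (ashop, bshop), with None for the missing side."""
    return [(a and a["shop_name"], b and b["shop_name"]) for a, b in pairs]

for how in ("inner", "left", "right", "full"):
    print(how, names(join_rows(shop, sale_detail, "shop_name", how)))
```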
Example
Join condition: You must use equi-joins and combine conditions by using AND. A maximum of 128 JOIN operations are supported in an SQL statement. Non-equi joins and conditions combined by using OR are supported only in a MAPJOIN operation.
Example
select a.* from shop a full outer join sale_detail b on a.shop_name=b.shop_name full outer join sale_detail c on a.shop_name=c.shop_name;
-- A maximum of 128 JOIN operations are supported in an SQL statement.
select a.* from shop a join sale_detail b on a.shop_name <> b.shop_name;
-- An error is returned because MaxCompute does not support non-equi joins.
NATURAL JOIN: In a NATURAL JOIN operation, the conditions used to join two tables are automatically determined based on the fields that the two tables have in common. MaxCompute supports OUTER NATURAL JOIN. You can use the USING clause so that the JOIN operation returns common fields only once.
Example
-- To join the src table that contains the key1, key2, a1, and a2 columns and the src2 table that contains the key1, key2, b1, and b2 columns, execute the following statement:
SELECT * FROM src NATURAL JOIN src2;
-- Both the src and src2 tables include the key1 and key2 fields. In this case, the preceding statement is equivalent to the following statement:
SELECT src.key1 as key1, src.key2 as key2, src.a1, src.a2, src2.b1, src2.b2 FROM src INNER JOIN src2 ON src.key1 = src2.key1 AND src.key2 = src2.key2;
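The rewrite above can be modeled in Python: find the column names the two tables share, join on equality of all of them, and output each shared column once, followed by the remaining columns of each side. natural_join and the sample rows are hypothetical; this is a sketch of the semantics, not of how MaxCompute executes the join.

```python
def natural_join(left_cols, left_rows, right_cols, right_rows):
    """Inner-join two tables on every column name they share."""
    common = [c for c in left_cols if c in right_cols]
    out_cols = (common
                + [c for c in left_cols if c not in common]
                + [c for c in right_cols if c not in common])
    out_rows = []
    for l in left_rows:
        l_row = dict(zip(left_cols, l))
        for r in right_rows:
            r_row = dict(zip(right_cols, r))
            # NULL never equals NULL in a join condition.
            if all(l_row[c] is not None and l_row[c] == r_row[c] for c in common):
                merged = {**r_row, **l_row}
                out_rows.append(tuple(merged[c] for c in out_cols))
    return out_cols, out_rows

src = [(1, 1, "x", "y"), (2, 2, "p", "q")]
src2 = [(1, 1, "m", "n"), (3, 3, "u", "v")]
print(natural_join(["key1", "key2", "a1", "a2"], src,
                   ["key1", "key2", "b1", "b2"], src2))
```

The output columns come out as key1, key2, a1, a2, b1, b2, matching the column list of the equivalent INNER JOIN statement.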
The syntax that uses a pair of parentheses to specify the priorities of JOIN operations is supported.
Example
When the volume of data is small, MAPJOIN accelerates execution by loading all the specified small tables into the memory of the program that performs the JOIN operation.
Example
Notice
The limit here refers to the size of the original data. If you run the desc command to obtain the compressed size, you must multiply it by the compression ratio.
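Conceptually, once the small table is held in memory, every row of the large table can be compared with every in-memory row, so the join condition does not have to be an equality. The Python sketch below illustrates that idea only; map_join and its predicate argument are hypothetical and do not describe MaxCompute's internal implementation.

```python
def map_join(big_rows, small_rows, predicate):
    """Pair each big-table row with every small-table row that satisfies predicate.

    The small table is materialized in memory once, and the big table is streamed.
    Because every pair is tested, predicate may be a non-equi condition or use OR.
    """
    small = list(small_rows)  # "load the small table into memory"
    for b in big_rows:
        for s in small:
            if predicate(b, s):
                yield (b, s)

# Non-equi condition: big value greater than small value.
print(list(map_join([1, 5, 10], [3, 7], lambda b, s: b > s)))
# Condition combined with OR.
print(list(map_join([1, 5, 10], [3, 7], lambda b, s: b == s + 2 or b == s * 10)))
```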
In MaxCompute SQL, you cannot use non-equi joins or OR logic in the ON condition. However, you can do this in MAPJOIN. Example:
MaxCompute SQL provides the EXPLAIN operation, which displays the final execution plan structure of DML statements. An execution plan is the program that is ultimately used to execute the SQL semantics.
Command syntax:
EXPLAIN <DMLquery>;
Note
Example:
EXPLAIN
SELECT abs(a.key), b.value FROM src a JOIN src1 b ON a.value = b.value;
Note Because this query needs only one job (job0), the output contains only one line of job information.
In Job job0:
root Tasks: M1_Stg1, M2_Stg1
J3_1_2_Stg1 depends on: M1_Stg1, M2_Stg1
Note
Job0 contains three tasks. M1_Stg1 and M2_Stg1 are executed first, and J3_1_2_Stg1 is executed after they are finished.
Naming rules for tasks: MaxCompute provides four task types: MapTask, ReduceTask, JoinTask, and LocalWork. The first letter of a task name indicates the type of the task. For example, M2_Stg1 is a MapTask. The number immediately following the first letter is the task ID, which is unique among all tasks in the current query. The numbers separated by underscores (_) after the task ID represent the immediate dependencies of the task. For example, J3_1_2_Stg1 indicates that the current task (ID 3) depends on the tasks with IDs 1 and 2.
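The naming rule can be applied mechanically. The following Python sketch, with a hypothetical function parse_task_name, splits a task name into its type, ID, and dependency IDs. The letters M and J come from the examples above; mapping R to ReduceTask and L to LocalWork is an assumption based on the task type names, not something this topic confirms.

```python
import re

# M and J are confirmed by the examples above; R and L are assumed.
TASK_TYPES = {"M": "MapTask", "R": "ReduceTask", "J": "JoinTask", "L": "LocalWork"}

def parse_task_name(name):
    """Split a task name such as 'J3_1_2_Stg1' into type, task ID, and dependencies."""
    m = re.fullmatch(r"([A-Z])(\d+)((?:_\d+)*)_Stg\d+", name)
    if m is None:
        raise ValueError("unrecognized task name: " + name)
    letter, task_id, deps = m.groups()
    return {
        "type": TASK_TYPES.get(letter, letter),
        "id": int(task_id),
        # '_1_2' -> ['', '1', '2'] -> [1, 2]
        "depends_on": [int(d) for d in deps.split("_") if d],
    }

for name in ("M1_Stg1", "M2_Stg1", "J3_1_2_Stg1"):
    print(name, parse_task_name(name))
```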
The third part of the output is the operator structure of each task, where each operator string describes the execution semantics of the task.
In Task M1_Stg1:
    Data source: yudi_2.src        #### "Data source" describes the input content of the current task
    TS: alias: a                   #### TableScanOperator
        RS: order: +               #### ReduceSinkOperator
            keys:
                a.value
            values:
                a.key
            partitions:
                a.value
In Task J3_1_2_Stg1:
    JOIN: a INNER JOIN b           #### JoinOperator
        SEL: Abs(UDFToDouble(a._col0)), b._col5    #### SelectOperator
            FS: output: None       #### FileSinkOperator
In Task M2_Stg1:
    Data source: yudi_2.src1
    TS: alias: b
        RS: order: +
            keys:
                b.value
            values:
                b.value
            partitions:
                b.value
Operators
Operator Description
Note
If a query is complex and produces too many EXPLAIN results, an API limit is triggered and incomplete results are displayed. In this case, split the query and perform the EXPLAIN operation on each part to show the structure of the job.
The maximum number of partitions in a query is 10,000. Inputting too many partitions makes the Data source content too long. To work around this limit, add query filters that filter out most partitions.
Notice Many examples in this topic are demonstrated by using MaxCompute Studio. We recommend that you install MaxCompute Studio before you proceed with subsequent operations.
6.4.6.2. Example
T he following example is for your reference.
1. Prepare data.
Note You can also execute multiple SELECT statements to obtain the same result.
Notice Expressions not used in GROUPING SETS use NULL as placeholders. You can execute UNION operations on grouping sets.
Example
GROUP BY CUBE(a, b, c)
-- Equivalent to the following statement:
GROUPING SETS((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),())
GROUP BY ROLLUP(a, b, c)
-- Equivalent to the following statement:
GROUPING SETS((a,b,c),(a,b),(a), ())
GROUP BY CUBE ( (a, b), (c, d) )
-- Equivalent to the following statement:
GROUPING SETS (
( a, b, c, d ),
( a, b ),
( c, d ),
( )
)
GROUP BY ROLLUP ( a, (b, c), d )
-- Equivalent to the following statement:
GROUPING SETS (
( a, b, c, d ),
( a, b, c ),
( a ),
( )
)
GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))
-- Equivalent to the following statement:
GROUP BY GROUPING SETS (
(a, b, c, d), (a, b, c, e),
(a, b, d), (a, b, e),
(a, c, d), (a, c, e),
(a, d), (a, e)
)
GROUP BY grouping sets((b), (c),rollup(a,b,c))
-- Equivalent to the following statement:
GROUP BY GROUPING SETS (
(b), (c),
(a,b,c), (a,b), (a), ()
)
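The equivalences above follow simple expansion rules: CUBE produces every subset of its arguments, ROLLUP produces the successive prefixes, a parenthesized group such as (a, b) acts as a single unit, and several grouping elements in one GROUP BY combine as a cross product. The Python sketch below (hypothetical helper names) reproduces exactly the grouping-set lists shown above.

```python
from itertools import combinations, product

def _flatten(units):
    """Flatten grouping units; a tuple such as ("a", "b") counts as one composite unit."""
    cols = []
    for unit in units:
        cols.extend(unit if isinstance(unit, tuple) else (unit,))
    return tuple(cols)

def rollup(*units):
    """ROLLUP(u1, ..., un) -> (u1..un), (u1..un-1), ..., (u1), ()."""
    return [_flatten(units[:i]) for i in range(len(units), -1, -1)]

def cube(*units):
    """CUBE(u1, ..., un) -> every subset of the units, largest subsets first."""
    return [_flatten(combo)
            for size in range(len(units), -1, -1)
            for combo in combinations(units, size)]

def cross(*set_lists):
    """Grouping elements listed together in GROUP BY combine as a cross product."""
    return [tuple(col for grouping_set in combo for col in grouping_set)
            for combo in product(*set_lists)]

print(cube("a", "b", "c"))
print(rollup("a", ("b", "c"), "d"))
# GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))
print(cross([("a",)], cube("b", "c"), [("d",), ("e",)]))
```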
GROUPING takes the name of a column as a parameter. If the rows are aggregated by that column, that is, the column is part of the current grouping set, 0 is returned, indicating that a NULL in the column is an actual input value. Otherwise, 1 is returned, indicating that the NULL is a placeholder.
GROUPING_ID takes the names of one or more columns as parameters and combines the GROUPING results of these columns into an integer bitmap.
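The bitmap can be computed by hand. The Python sketch below uses hypothetical helper names and assumes, as in standard SQL, that the first argument of GROUPING_ID maps to the most significant bit. Verify the bit order against your MaxCompute version before relying on specific values.

```python
def grouping(col, grouping_set):
    """GROUPING(col): 0 if col is part of the current grouping set, otherwise 1."""
    return 0 if col in grouping_set else 1

def grouping_id(cols, grouping_set):
    """GROUPING_ID(c1, ..., cn): GROUPING(c1)..GROUPING(cn) packed into an integer.

    Assumption: c1 is the most significant bit, as in standard SQL.
    """
    value = 0
    for col in cols:
        value = (value << 1) | grouping(col, grouping_set)
    return value

# Grouping sets produced by CUBE(a, b), with GROUPING_ID(a, b) for each.
for gs in [("a", "b"), ("a",), ("b",), ()]:
    print(gs, grouping_id(["a", "b"], gs))
```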
Example:
6.4.7. IF statement
MaxCompute SQL supports the IF-ELSE statement.
You can use the IF-ELSE statement to execute SQL scripts based on specific conditions. The condition in an IF-ELSE statement can be a standard variable or a scalar subquery that returns a single column value from a single row.
The IF statement allows the system to automatically select the execution logic based on the specified conditions. MaxCompute supports the following IF syntax:
IF (condition) BEGIN
statement 1
statement 2
...
END
IF (condition) BEGIN
statements
END ELSE IF (condition2) BEGIN
statements
END ELSE BEGIN
statements
END
Note If a branch contains only one statement, BEGIN and END can be omitted, similar to '{ }' in Java.
The IF statement can contain two types of conditions: expressions and scalar subqueries. Both must be of the BOOLEAN type.
Expressions: A BOOLEAN-type expression in an IF-ELSE statement determines which branch is executed at the compilation stage. Example:
@date := '20190101';
@row TABLE(id STRING); -- Declare the row variable. Its type is TABLE and its schema is (id STRING).
IF ( cast(@date as bigint) % 2 == 0 ) BEGIN
@row := SELECT id from src1;
END ELSE BEGIN
@row := SELECT id from src2;
END
INSERT OVERWRITE TABLE dest SELECT * FROM @row;
Scalar subqueries: A BOOLEAN-type scalar subquery in an IF-ELSE statement determines which branch is executed at runtime. Therefore, multiple jobs must be submitted. Example:
@i bigint;
@t table(id bigint, value bigint);
IF ((SELECT count(*) FROM src WHERE a = '5') > 1) BEGIN
@i := 1;
@t := select @i, @i*2;
END ELSE
BEGIN
@i := 2;
@t := select @i, @i*2;
END
select id, value from @t;
Description:
SELECT TRANSFORM: The SELECT TRANSFORM keyword can be replaced with the MAP or REDUCE keyword without changing the semantics. However, we recommend that you use SELECT TRANSFORM because its syntax is simpler.
(arg1, arg2 ...): arguments in the TRANSFORM clause. Their format is similar to that of the items in the SELECT clause. In the default format, the result of the expression for each argument is implicitly converted into a string, and the strings are joined by \t. The arguments are then passed to the specified child process.
Note The default format is configurable. For more information, see ROW FORMAT.
USING: specifies the command used to start a child process. Take note of the following points about the USING clause:
In most MaxCompute SQL statements, the USING clause can only specify resources. However, in the SELECT TRANSFORM statement, the USING clause can specify commands to ensure compatibility with Hive syntax.
The format of the USING clause is similar to the syntax of a shell script. However, no shell is actually started to run the command. Instead, the child process is created directly from the command input. As a result, many shell features, such as input and output redirection, pipes, and loops, are unavailable. If necessary, you can specify a shell as the command that starts the child process.
RESOURCES: specifies the resources that the specified child process can access. You can use one of the following methods to specify resources:
Use the RESOURCES clause. Example: using 'sh foo.sh bar.txt' resources 'foo.sh','bar.txt'.
Add the set odps.sql.session.resources=foo.sh,bar.txt; clause before SQL statements.
Notice This clause takes effect globally once it is specified. All SELECT TRANSFORM statements can access the resources specified by this clause.
ROW FORMAT: specifies the input or output format. Two ROW FORMAT clauses are used in the syntax: the first one specifies the input format, and the second one specifies the output format. By default, \t is used to separate columns, \n is used to separate rows, and NULL is represented by \N.
Notice
For field_delimiter, character_escape, and line_separator, only one character is accepted. If you specify a string, only the first character in the string is used.
MaxCompute supports a variety of Hive syntaxes for specifying formats, such as input RecordReader, output RecordReader, and SerDe. To use these formats, you must enable Hive compatibility by adding the set odps.sql.hive.compatible=true; clause before SQL statements. If you specify a Hive-supported syntax such as input RecordReader or output RecordReader, statements may be executed at lower speeds.
Note
You can specify data types in the AS clause, as in as(col1:bigint, col2:boolean). If you do not specify data types, as in as(col1, col2), strings are returned by default.
The output is obtained by parsing the stdout of the child process. If the specified data types are not STRING, the system implicitly calls the CAST function, and runtime exceptions may occur during the conversion.
You cannot specify data types for only some of the columns, as in as(col1, col2:bigint).
If you omit the AS clause, the field preceding the first \t in the stdout is the key, and all the following parts are the value. This is equivalent to as(key, value).
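On the child-process side, the default format described above means each input line is tab-separated text with \N for NULL, and each output line is parsed back, with a missing AS clause splitting it into key and value at the first \t. The Python sketch below models both directions. The helper names are hypothetical, str() formatting of values is a simplification of MaxCompute's own string conversion, and mapping \N back to NULL in the key/value case is an assumption.

```python
def encode_row(values):
    """Default input format: values as strings, NULL as \\N, joined by \\t, ending in \\n."""
    return "\t".join("\\N" if v is None else str(v) for v in values) + "\n"

def decode_line(line, columns=None):
    """Parse one stdout line of the child process into named fields.

    With columns=None (no AS clause), the text before the first \\t is the key and
    the rest is the value, which matches as(key, value). \\N is read back as None.
    """
    line = line.rstrip("\n")
    if columns is None:
        key, _, value = line.partition("\t")
        columns, fields = ["key", "value"], [key, value]
    else:
        fields = line.split("\t")
    return {col: (None if f == "\\N" else f) for col, f in zip(columns, fields)}

print(repr(encode_row([1, None, "x"])))
print(decode_line("k\tv1\tv2\n"))
print(decode_line("1\t\\N\n", ["col1", "col2"]))
```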
Note In addition to language extensions, SELECT TRANSFORM lets you write simple AWK, Python, Perl, or shell scripts directly in the command. You do not need to write script files or upload resources separately.
For complex cases, you can upload script files, as in the following example that calls a Python script.
1. Write a Python script file. In this example, the file name is myplus.py.
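#!/usr/bin/env python
import sys

line = sys.stdin.readline()
while line:
    # Strip the trailing newline so that a \N in the last column is recognized.
    token = line.rstrip('\n').split('\t')
    if (token[0] == '\\N') or (token[1] == '\\N'):
        print('\\N')  # Propagate NULL if either input is NULL.
    else:
        print(int(token[0]) + int(token[1]))
    line = sys.stdin.readline()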
Note You can also add resources in the DataWorks console.
+-----+
| cnt |
+-----+
| 5 |
| 7 |
| 9 |
+-----+
Python scripts are not subject to any format requirements and do not require a Python framework to run in MaxCompute. In MaxCompute, Python commands can be used as the input of the TRANSFORM clause. For example, you can call shell scripts by running Python commands.
1. Write a Java program and export it as a JAR package. In this example, the name of the JAR package is Sum.jar.
package com.aliyun.odps.test;

import java.util.Scanner;

public class Sum {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        while (sc.hasNextLine()) {
            String s = sc.nextLine();
            // Each input line holds the TRANSFORM arguments separated by \t.
            String[] tokens = s.split("\t");
            if (tokens.length < 2) {
                throw new RuntimeException("illegal input");
            }
            if (tokens[0].equals("\\N") || tokens[1].equals("\\N")) {
                // Propagate NULL and skip the addition for this row.
                System.out.println("\\N");
                continue;
            }
            System.out.println(Long.parseLong(tokens[0]) + Long.parseLong(tokens[1]));
        }
    }
}
+-----+
| cnt |
+-----+
| 5 |
| 7 |
| 9 |
+-----+
You can use the preceding method to run most Java utilities.
Although UDTF frameworks are provided for Java and Python, it is easier to write code by using SELECT TRANSFORM. SELECT TRANSFORM is simpler because it is not subject to any format requirements and can be called offline. The paths for Java and Python offline scripts can be obtained from the JAVA_HOME and PYTHON_HOME environment variables.
Notice PHP and Ruby are not deployed in the MaxCompute cluster and cannot be called.
You can also use the MAP or REDUCE keyword instead of SELECT TRANSFORM to produce the same results.
The advantages of UDTFs and SELECT TRANSFORM are listed in the following sections.
Advantages of UDTFs
Output and input follow the specified data types and do not require conversion.
Processes are not suspended if the operating system pipe is empty or full. The operating system pipe has a 4 KB buffer.
Constant parameters do not need to be transmitted.
Purpose: These operations return the union of two datasets, the intersection of two datasets, or the complement of the second dataset in the first dataset.
Description:
UNION: returns the union of two datasets. It combines the two datasets into one dataset.
INTERSECT: returns the intersection of two datasets. It outputs the records contained in both datasets.
EXCEPT: returns the complement of the second dataset in the first dataset. It outputs the records that are contained in the first dataset but not in the second dataset.
MINUS: equivalent to EXCEPT.
Examples:
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 4 |
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
+------------+------------+
Returned result: equivalent to SELECT DISTINCT * FROM (<the result of UNION ALL>) t;.
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 4 |
| 3 | 4 |
+------------+------------+
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b)
INTERSECT ALL
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
Returned result: deduplication is skipped in INTERSECT ALL. It is as if each identical row carries a hidden serial number, so each copy is matched and displayed separately.
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
+------------+------------+
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b)
INTERSECT
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
Returned result: equivalent to SELECT DISTINCT * FROM (<the result of INTERSECT ALL>) t;.
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 3 | 4 |
+------------+------------+
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b)
EXCEPT ALL
SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
Returned result: deduplication is skipped in EXCEPT ALL. It is as if each identical row carries a hidden serial number, so each copy is matched and displayed separately.
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
| 7 | 8 |
+------------+------------+
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b)
EXCEPT
SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
Returned result: equivalent to SELECT DISTINCT * FROM left_branch EXCEPT ALL SELECT DISTINCT * FROM right_branch;.
+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 7 | 8 |
+------------+------------+
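The ALL variants behave like multiset operations: each row of the right branch cancels at most one identical row of the left branch, and the non-ALL variants deduplicate. The following Python sketch uses hypothetical function names and treats rows as tuples. It reproduces the results shown above, although MaxCompute does not guarantee the order of returned rows.

```python
from collections import Counter

def _distinct(rows):
    """Deduplicate while keeping first-seen order."""
    return list(dict.fromkeys(rows))

def union_all(left, right):
    return left + right

def union(left, right):
    return _distinct(left + right)

def intersect_all(left, right):
    remaining = Counter(right)
    out = []
    for row in left:
        if remaining[row] > 0:  # this copy has a partner in the right branch
            remaining[row] -= 1
            out.append(row)
    return out

def intersect(left, right):
    return _distinct(intersect_all(left, right))

def except_all(left, right):
    remaining = Counter(right)
    out = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # cancelled by one identical right-branch row
        else:
            out.append(row)
    return out

def except_(left, right):
    # EXCEPT (MINUS): deduplicate both branches, then apply EXCEPT ALL.
    return except_all(_distinct(left), _distinct(right))

left_e = [(1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8)]
right_e = [(3, 4), (5, 6), (5, 6), (9, 10)]
print(except_all(left_e, right_e))
print(except_(left_e, right_e))
```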
Note
Sorting may be skipped in the preceding operations.
The left and right branches in the preceding operations must have the same number of columns. If the data types in the left and right branches are inconsistent, they may be implicitly converted. Due to compatibility issues, implicit conversion is not performed between STRING and non-STRING types for these operations.
Up to 256 branches are allowed in the preceding operations. An error is returned if more branches are used.
If the UNION statement is followed by a CLUSTER BY, DISTRIBUTE BY, SORT BY, ORDER BY, or LIMIT clause and you add set odps.sql.type.system.odps2=false;, the clause applies only to the last select_statement of the UNION statement. If you add set odps.sql.type.system.odps2=true;, the clause applies to all select_statements of the UNION statement. Example:
set odps.sql.type.system.odps2=true;
SELECT explode(array(3, 1)) AS (a) UNION ALL SELECT explode(array(0, 4, 2)) AS (a) ORDER BY a LIMIT 3;
+------+
| a |
+------+
| 0 |
| 1 |
| 2 |
+------+
Description:
number: double, bigint, or decimal type. When the input is of the bigint type, a value of the bigint type is returned. When the input is of the double type, a value of the double type is returned. If the input is of the string type, it is implicitly converted into a value of the double type before the computation. If the input is of any other type, an error is returned.
Returned value: double, bigint, or decimal type, depending on the type of the input. If the input is NULL, NULL is returned.
Note When the input is of the bigint type and exceeds the maximum value of the bigint type, the returned value is of the double type. In this case, precision may be lost.
Example:
abs(null) = null
abs(-1) = 1
abs(-1.2) = 1.2
abs("-2") = 2.0
abs(122320837456298376592387456923748) = 1.2232083745629837e32
The following example shows the complete usage of the ABS function in SQL. Other built-in functions (except window functions and aggregate functions) are used in a similar way and are not shown here.
6.7.1.2. ACOS
Function declaration:
Description:
number: double or decimal type. Value range: -1 to 1. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before the computation. For all other input types, an error is returned.
Returned value: double or decimal type. Value range: 0 to π. If number is NULL, NULL is returned.
Example:
acos("0.87") = 0.5155940062460905
acos(0) = 1.5707963267948966
6.7.1.3. ASIN
Function declaration:
Description:
number: double or decimal type. Value range: -1 to 1. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before the computation. For all other input types, an error is returned.
Returned value: double or decimal type. Value range: -π/2 to π/2. If number is NULL, NULL is returned.
Example:
asin(1) = 1.5707963267948966
asin(-1) = -1.5707963267948966
6.7.1.4. ATAN
Function declaration:
Description:
number: double type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before the computation. For all other input types, an error is returned.
Returned value: double type. Value range: -π/2 to π/2. If number is NULL, NULL is returned.
Example:
atan(1) = 0.7853981633974483
atan(-1) = -0.7853981633974483
6.7.1.5. CEIL
Command syntax:
Purpose: It is used to return the smallest integer that is equal to or greater than the input value.
Description:
Returned value: bigint type. If the input is NULL, NULL is returned.
Example:
ceil(1.1) = 2
ceil(-1.1) = -1
6.7.1.6. CONV
Command syntax:
Purpose: It is used to convert a number from one base to another.
Description:
input: the integer to be converted, of the string type. Values of the bigint and double types are accepted through implicit conversion.
from_base, to_base: the source and target bases, written in decimal. Valid values: 2, 8, 10, and 16. Values of the string and double types are accepted through implicit conversion.
Returned value: string type. If any input is NULL, NULL is returned. The conversion runs at 64-bit precision, and an error is returned when an overflow occurs. If the input is a negative value (beginning with '-'), an error is returned. If the input is a decimal number, it is converted into an integer before the base conversion, and the decimal part is discarded.
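The rules above can be sketched in Python to show how a base conversion proceeds. This is an illustration, not the MaxCompute implementation: the function name conv_sketch is hypothetical, the 64-bit check assumes an unsigned 64-bit range, and the lowercase hexadecimal digits in the output are an assumption.

```python
DIGITS = "0123456789abcdef"  # lowercase output digits are an assumption
VALID_BASES = (2, 8, 10, 16)

def conv_sketch(value, from_base, to_base):
    """Convert an integer between bases 2, 8, 10, and 16, following the rules above."""
    if value is None or from_base is None or to_base is None:
        return None
    # String and double bases are accepted through implicit conversion.
    from_base, to_base = int(float(from_base)), int(float(to_base))
    if from_base not in VALID_BASES or to_base not in VALID_BASES:
        raise ValueError("base must be 2, 8, 10, or 16")
    text = str(value)
    if text.startswith("-"):
        raise ValueError("negative input is not supported")
    if "." in text:
        text = text.split(".")[0]  # the decimal part is discarded
    n = int(text, from_base)
    if n >= 2 ** 64:  # assumed unsigned 64-bit precision
        raise OverflowError("value exceeds 64-bit precision")
    if n == 0:
        return "0"
    out = ""
    while n:
        n, r = divmod(n, to_base)
        out = DIGITS[r] + out
    return out

print(conv_sketch("1100", 2, 10))
print(conv_sketch("ab", 16, 10))
print(conv_sketch(10.7, 10, 2))
```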
Example:
6.7.1.7. COS
Command syntax:
Purpose: It is used to return the cosine of a number. The input must be a radian value.
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type. For all other input types, an error is returned.
Returned value: double or decimal type. If the input is NULL, NULL is returned.
Example:
cos(3.1415926/2) = 2.6794896585028633e-8
cos(3.1415926) = -0.9999999999999986
6.7.1.8. COSH
Command syntax:
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type. For all other input types, an error is returned.
Returned value: double or decimal type. If the input is NULL, NULL is returned.
6.7.1.9. COT
Function declaration:
Purpose: It is used to return the cotangent of a number. The input must be a radian value.
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type. For all other input types, an error is returned.
Returned value: double or decimal type. If the input is NULL, NULL is returned.
6.7.1.10. EXP
Function declaration:
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before the computation. For all other input types, an error is returned.
Returned value: double or decimal type. If number is NULL, NULL is returned.
6.7.1.11. FLOOR
Function declaration:
Purpose: It is used to round number down and return the largest integer that is less than or equal to number.
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: bigint type. If number is NULL, NULL is returned.
Example:
floor(1.2) = 1
floor(1.9) = 1
floor(0.1) = 0
floor(-1.2) = -2
floor(-0.1) = -1
floor(0.0) = 0
floor(-0.0) = 0
6.7.1.12. LN
Function declaration:
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: double or decimal type. If the input is NULL, negative, or zero, NULL is returned.
6.7.1.13. LOG
Function declaration:
base: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
x: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: the logarithm of x to the given base, of the double or decimal type. If either base or x is NULL, negative, or zero, NULL is returned. If base is 1 (which would cause a division by zero), NULL is returned.
6.7.1.14. POW
Command syntax:
Description:
x: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
y: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: double or decimal type. If x or y is NULL, NULL is returned.
6.7.1.15. RAND
Command syntax:
Purpose: It is used to return a random number of the double type in the range from 0 to 1, based on the seed.
Description:
seed: optional, bigint type. It is the seed of the random number and determines the start value of the random number sequence.
Example:
6.7.1.16. ROUND
Function declaration:
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
decimal_places: a constant of the bigint type. It indicates the decimal place to which the result is rounded. For all other input types, an error is returned. If it is omitted, the number is rounded to the ones place; that is, the default value is 0.
Returned value: double or decimal type. If number or decimal_places is NULL, NULL is returned.
Note decimal_places can be negative. A negative value counts places to the left of the decimal point, and the fractional part is discarded. If the absolute value of a negative decimal_places is greater than the number of digits in the integer part, 0 is returned.
Example:
round(125.315) = 125.0
round(125.315, 0) = 125.0
round(125.315, 1) = 125.3
round(125.315, 2) = 125.32
round(125.315, 3) = 125.315
round(-125.315, 2) = -125.32
round(123.345, -2) = 100.0
round(null) = null
round(123.345, 4) = 123.345
round(123.345, -4) = 0.0
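Python's built-in `round` works on the binary double and rounds halves to even, so its results can differ from the examples above. The following sketch reproduces those examples by rounding the decimal text of the number half away from zero with the standard `decimal` module. The helper name `sql_round` is hypothetical, and the sketch is an illustration rather than MaxCompute's implementation.

```python
from decimal import Decimal, ROUND_HALF_UP


def sql_round(number, decimal_places=0):
    """Sketch of ROUND: round half away from zero at decimal_places."""
    if number is None or decimal_places is None:
        return None  # NULL in, NULL out
    # str() yields the shortest text for the double, e.g. "125.315".
    exact = Decimal(str(number))
    # Quantum 10 ** -decimal_places; a negative value rounds left of the point.
    quantum = Decimal(1).scaleb(-decimal_places)
    return float(exact.quantize(quantum, rounding=ROUND_HALF_UP))


print(sql_round(125.315, 2))   # 125.32
print(sql_round(123.345, -2))  # 100.0
```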
6.7.1.17. SIN
Function declaration:
Purpose: It is used to return the sine of number. The input must be a radian value.
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: double or decimal type. If number is NULL, NULL is returned.
6.7.1.18. SINH
Function declaration:
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: double or decimal type. If number is NULL, NULL is returned.
6.7.1.19. SQRT
Function declaration:
Description:
number: double or decimal type. It must be greater than or equal to 0. If it is less than 0, an error is returned. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: double or decimal type. If number is NULL, NULL is returned.
6.7.1.20. TAN
Function declaration:
Purpose: It is used to return the tangent of number. The input must be a radian value.
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: double or decimal type. If number is NULL, NULL is returned.
6.7.1.21. TANH
Function declaration:
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
Returned value: double or decimal type. If number is NULL, NULL is returned.
6.7.1.22. TRUNC
Function declaration:
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned.
decimal_places: a constant of the bigint type. It indicates the decimal place at which number is truncated. Inputs of other numeric types are implicitly converted into values of the bigint type. If it is omitted, number is truncated to the ones place by default.
Returned value: double or decimal type. If number or decimal_places is NULL, NULL is returned.
Note
The truncated digits are filled with 0.
decimal_places can be negative. A negative value truncates digits to the left of the decimal point, and the fractional part is discarded. If the absolute value of a negative decimal_places is greater than the number of digits in the integer part, 0 is returned.
Example:
trunc(125.815) = 125.0
trunc(125.815, 0) = 125.0
trunc(125.815, 1) = 125.80000000000001
trunc(125.815, 2) = 125.81
trunc(125.815, 3) = 125.815
trunc(-125.815, 2) = -125.81
trunc(125.815, -1) = 120.0
trunc(125.815, -2) = 100.0
trunc(125.815, -3) = 0.0
trunc(123.345, 4) = 123.345
trunc(123.345, -4) = 0.0
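A similar sketch models TRUNC as truncation toward zero, using `ROUND_DOWN` from Python's `decimal` module. It works on the decimal text of the number instead of on binary doubles, so it returns 125.8 for `trunc(125.815, 1)` where the example above shows 125.80000000000001, a difference that appears to come from double-precision arithmetic. The name `sql_trunc` is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def sql_trunc(number, decimal_places=0):
    """Sketch of TRUNC: drop digits beyond decimal_places, toward zero."""
    if number is None or decimal_places is None:
        return None  # NULL in, NULL out
    exact = Decimal(str(number))
    quantum = Decimal(1).scaleb(-decimal_places)  # 10 ** -decimal_places
    # Digits removed to the left of the decimal point become zeros.
    return float(exact.quantize(quantum, rounding=ROUND_DOWN))


print(sql_trunc(125.815, 1))   # 125.8
print(sql_trunc(125.815, -1))  # 120.0
```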
set odps.sql.type.system.odps2=true;
Note You must submit and execute the SET statement together with the SQL statements that use the new functions.
The mathematical functions described in the following topics are new in MaxCompute 2.0.
6.7.1.24. LOG2
Function declaration:
Description:
Example:
log2(null) = null
log2(0) = null
log2(8) = 3.0
6.7.1.25. LOG10
Function declaration:
Description:
number: double or decimal type.
Returned value: double type. If the input is 0 or NULL, NULL is returned.
Example:
log10(null) = null
log10(0) = null
log10(8) = 0.9030899869919435
log10('abc') = null
6.7.1.26. BIN
Command syntax:
number: bigint type.
Returned value: string type, the binary representation of number. If the input is 0, '0' is returned. If the input is NULL, NULL is returned.
Example:
bin(0) = '0'
bin(null) = null
bin(12) = '1100'
6.7.1.27. HEX
Function declaration:
Description:
number: If the value is of the bigint type, the hexadecimal representation of the number is returned. If the value is of the string type, the hexadecimal representation of the bytes of the string is returned.
Returned value: string type. If the input is 0, '0' is returned. If the input is NULL, NULL is returned.
Example:
hex(0) = '0'
hex('abc') = '616263'
hex(17) = '11'
hex('17') = '3137'
hex(null) = null
6.7.1.28. UNHEX
Function declaration:
Purpose: It is used to convert a hexadecimal string back into the characters it represents.
Description:
Returned value: binary type. If the input is 0, an error is returned. If the input is NULL, NULL is returned.
Example:
unhex('616263') = 'abc'
unhex(616263) = 'abc'
6.7.1.29. RADIANS
Command syntax:
Description:
Returned value: double type. If the input is NULL, NULL is returned.
Example:
radians(90) = 1.5707963267948966
radians(0) = 0.0
radians(null) = null
6.7.1.30. DEGREES
Function declaration:
Description:
Returned value: double type. If the input is NULL, NULL is returned.
Example:
degrees(1.5707963267948966) = 90.0
degrees(0) = 0.0
degrees(null) = null
6.7.1.31. SIGN
Function declaration:
Purpose: It is used to return the sign of the input. 1.0 indicates a positive value, -1.0 indicates a negative value, and 0.0 indicates 0.
Description:
Returned value: double type. If the input is 0, 0.0 is returned. If the input is NULL, NULL is returned.
Example:
sign(-2.5) = -1.0
sign(2.5) = 1.0
sign(0) = 0.0
sign(null) = null
6.7.1.32. E
Function declaration:
DOUBLE e()
Example:
e() = 2.718281828459045
6.7.1.33. PI
Function declaration:
DOUBLE pi()
Example:
pi() = 3.141592653589793
6.7.1.34. FACTORIAL
Function declaration:
Description:
Returned value: bigint type. If the input is 0, 1 is returned. If the input is NULL or any value outside the range of 0 to 20, NULL is returned.
Example:
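The upper bound of 20 matches the largest factorial that fits in a signed 64-bit bigint: 20! = 2432902008176640000, while 21! exceeds 9223372036854775807. A minimal Python sketch of the documented range rule, with `None` standing in for NULL, might look like this:

```python
import math


def factorial(n):
    """Sketch of FACTORIAL: n! for 0 <= n <= 20, otherwise NULL (None)."""
    if n is None or not 0 <= n <= 20:
        return None  # NULL input or out of the bigint-safe range
    return math.factorial(n)  # 0! is 1


print(factorial(5))   # 120
print(factorial(21))  # None
```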
6.7.1.35. CBRT
Command syntax:
Description:
Returned value: double type. If the input is NULL, NULL is returned.
Example:
cbrt(8) = 2.0
cbrt(null) = null
6.7.1.36. SHIFTLEFT
Function declaration:
Description:
Example:
shiftleft(1,2) = 4
-- Shift left the binary value of 1 by two places (1<<2, 0001 changed to 0100)
shiftleft(4,3) = 32
-- Shift left the binary value of 4 by three places (4<<3, 0100 changed to 100000)
6.7.1.37. SHIFTRIGHT
Function declaration:
Description:
Example:
shiftright(4,2) = 1
-- Shift right the binary value of 4 by two places (4>>2, 0100 changed to 0001)
shiftright(32,3) = 4
-- Shift right the binary value of 32 by three places (32>>3, 100000 changed to 000100)
6.7.1.38. SHIFTRIGHTUNSIGNED
Function declaration:
Purpose: It is used to shift an unsigned value right by a given number of places (>>>).
Description:
Example:
shiftrightunsigned(8,2) = 2
-- Shift right the unsigned binary value of 8 (1000 in binary) by two places and return 2 (0010 in binary).
shiftrightunsigned(-14,2) = 1073741820
-- Shift right the unsigned binary value of -14 by two places (-14>>>2, 11111111 11111111 11111111 11110010 changed to 00111111 11111111 11111111 11111100)
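Python integers have arbitrary precision and no `>>>` operator, so the shift examples can be reproduced by masking to a fixed width. The sketch below assumes 32-bit two's-complement operands, which is what the -14 example implies. The helper `to_int32` is hypothetical, and the other function names only mirror the SQL functions.

```python
MASK32 = 0xFFFFFFFF


def to_int32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def shiftleft(x, n):
    # Bits shifted past bit 31 are discarded.
    return to_int32(x << n)


def shiftright(x, n):
    # Python's >> is an arithmetic shift, so the sign bit is preserved.
    return to_int32(x) >> n


def shiftrightunsigned(x, n):
    # Treat the 32-bit pattern as unsigned so zeros are shifted in.
    return (x & MASK32) >> n


print(shiftrightunsigned(-14, 2))  # 1073741820
```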
6.7.2.1. CHAR_MATCHCOUNT
Purpose: It is used to return the number of characters in str1 that appear in str2 (repeated characters are not counted).
Description:
str1 and str2: string type. Both must be valid UTF-8 strings. If invalid characters are found during matching, a negative value is returned.
Returned value: bigint type. If any input is NULL, NULL is returned.
Example:
char_matchcount('abd', 'aabc') = 2
-- The a and b characters in str1 appear in str2.
6.7.2.2. CHR
Command syntax:
Description:
ascii: an ASCII value of the bigint type. If the input is of the string, double, or decimal type, it is implicitly converted into a value of the bigint type before this computation. For all other input types, an error is returned.
Returned value: string type. The value of ascii must range from 0 to 255; a value outside this range causes an error. If the input is NULL, NULL is returned.
6.7.2.3. CONCAT
Command syntax:
a, b...: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type. For all other input types, an error is returned.
Returned value: string type. If there is no input or if any input is NULL, NULL is returned.
Example:
6.7.2.4. INSTR
Function declaration:
Description:
str1: string type. It indicates the string to be searched. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
str2: string type. It indicates the substring to search for. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
start_position: bigint type. It indicates the character in str1 at which the search starts. The default start position is the first character, marked as 1. For all other input types, an error is returned.
nth_appearance: bigint type. The position of the nth_appearance-th match of str2 in str1 is returned. If it is of another type or is less than or equal to 0, an error is returned.
Note
If str2 is not found in str1, 0 is returned.
If any input is NULL, NULL is returned.
If str2 is an empty string, the match always succeeds. Therefore, instr('abc', '') returns 1.
Example:
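The search rules above can be sketched in Python with `str.find`. This is an illustration only. It assumes that start_position is positive and that the search for the next appearance resumes one character after the previous match, so overlapping matches are counted; neither point is stated above.

```python
def instr(str1, str2, start_position=1, nth_appearance=1):
    """Sketch of INSTR: 1-based position of the nth match, or 0."""
    if None in (str1, str2, start_position, nth_appearance):
        return None  # any NULL input yields NULL
    if nth_appearance <= 0:
        raise ValueError("nth_appearance must be greater than 0")
    index = start_position - 1  # convert to a 0-based search offset
    for _ in range(nth_appearance):
        found = str1.find(str2, index)
        if found < 0:
            return 0  # str2 not found (enough times)
        index = found + 1  # assumption: resume one character later
    return found + 1  # back to a 1-based position


print(instr('Tech on the net', 'e', 1, 2))  # 11
```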
6.7.2.5. IS_ENCODING
Function declaration:
Purpose: It is used to determine whether an input string can be converted from a specified character set (from_encoding) to another character set (to_encoding). It can be used to determine whether the input is garbled. Typically, from_encoding is set to utf-8 and to_encoding is set to gbk.
Description:
str: string type. If the input is NULL, NULL is returned. An empty string is considered to belong to any character set.
from_encoding, to_encoding: string type. They indicate the source and destination character sets respectively. If either is NULL, NULL is returned.
Returned value: boolean type. If the string can be converted successfully, true is returned. Otherwise, false is returned.
Example:
6.7.2.6. KEYVALUE
Function declaration:
Purpose: It is used to split the source string into key-value pairs by split1, split each pair into a key and a value by split2, and return the value that corresponds to key.
Description:
srcStr: the source string to be split.
key: string type. After the source string is split by split1 and split2, the value that corresponds to key is returned.
split1 and split2: strings used as separators. The source string is split by these two separators. If they are not specified in the expression, split1 defaults to a semicolon (;) and split2 defaults to a colon (:). If a pair produced by split1 contains multiple split2 separators, the returned result is undefined.
Example:
keyvalue('0:1\;1:2', 1) = '2'
-- The source string is "0:1\;1:2". Because split1 and split2 are not specified, split1 defaults to a semicolon (;) and split2 defaults to a colon (:). After the string is split by split1, the key-value pairs are:
0:1, 1:2
After each pair is split by split2, the keys and values are:
0 1
1 2
The value 2, which corresponds to key 1, is returned.
keyvalue("\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;", "\;",":","tf") = "21910"
-- The source string is "\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;". After the source string is split by split1 ("\;"), the key-value pairs are as follows:
decreaseStore:1, xcard:1, isB2C:1, tf:21910, cart:1, shipping:2, pf:0, market:shoes, instPayAmount:0
After each pair is split by split2 (":"), the keys and values are:
decreaseStore 1
xcard 1
isB2C 1
tf 21910
cart 1
shipping 2
pf 0
market shoes
instPayAmount 0
For the key "tf", the corresponding value 21910 is returned.
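The two-level split can be sketched in Python as follows. The sketch assumes that `\;` in the SQL examples is an escaped plain semicolon, so it uses `;` directly. It also assumes that a missing key yields NULL (`None`) and that empty segments, such as the one produced by a leading separator, are skipped. Where the description calls multiple split2 separators undefined, this sketch simply splits at the first one.

```python
def keyvalue(src, *args):
    """Sketch of KEYVALUE(src, key) or KEYVALUE(src, split1, split2, key)."""
    if len(args) == 1:
        split1, split2, key = ";", ":", args[0]  # documented defaults
    elif len(args) == 3:
        split1, split2, key = args
    else:
        raise TypeError("expected (src, key) or (src, split1, split2, key)")
    if src is None or split1 is None or split2 is None or key is None:
        return None
    key = str(key)  # a bigint key such as 1 is compared as '1'
    for pair in src.split(split1):
        if not pair:
            continue  # skip empty segments
        k, sep, v = pair.partition(split2)  # split at the first split2
        if sep and k == key:
            return v
    return None  # assumption: a missing key yields NULL


print(keyvalue('0:1;1:2', 1))  # 2
```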
6.7.2.7. LENGTH
Function declaration:
Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: bigint type, the number of characters in str. If str is NULL, NULL is returned. If str is not UTF-8 encoded, -1 is returned.
Example:
length('hi! 中国') = 6
-- 中国 ("China") counts as two characters.
6.7.2.8. LENGTHB
Function declaration:
Description:
str: string type. If the input is of the bigint, double, decimal, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: bigint type, the length of str in bytes. If the input is NULL, NULL is returned.
Example:
lengthb('hi! 中国') = 10
-- Each of the two Chinese characters occupies 3 bytes in UTF-8: 4 + 2 * 3 = 10.
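The difference between LENGTH and LENGTHB is character count versus UTF-8 byte count. It matters for a string such as 'hi! 中国', which has 6 characters and 10 bytes. A minimal Python sketch, assuming that MaxCompute strings are UTF-8 encoded (the -1 result for non-UTF-8 input is not modeled, because Python strings are always valid text):

```python
def length(s):
    """Sketch of LENGTH: number of characters (Unicode code points)."""
    return None if s is None else len(s)


def lengthb(s):
    """Sketch of LENGTHB: number of bytes in the UTF-8 encoding."""
    return None if s is None else len(s.encode("utf-8"))


print(length('hi! 中国'), lengthb('hi! 中国'))  # 6 10
```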
6.7.2.9. MD5
Function declaration:
Description:
value: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.
6.7.2.10. PARSE_URL
Function declaration:
Purpose: It is used to parse a URL and extract the information specified by part and, for QUERY, by key.
Description:
If url or part is NULL, NULL is returned. If url is invalid, an error is returned.
part: string type. Valid values are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO, and they are case-insensitive. If part is none of these values, an error is returned.
If part is QUERY, the value that corresponds to key in the query string is extracted. Otherwise, the key parameter is ignored.
Example:
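The part names above correspond closely to the components that Python's `urllib.parse.urlsplit` exposes, so PARSE_URL can be sketched as below. The mapping is an assumption of this sketch. In particular, it assumes that FILE means the path plus the query string, that QUERY without key returns the whole query string, and that a missing query key yields NULL. Python's `hostname` is also lowercased, which MaxCompute may not do.

```python
from urllib.parse import urlsplit, parse_qs

PARTS = {"HOST", "PATH", "QUERY", "REF", "PROTOCOL",
         "AUTHORITY", "FILE", "USERINFO"}


def parse_url(url, part, key=None):
    """Sketch of PARSE_URL built on urllib.parse."""
    if url is None or part is None:
        return None
    part = part.upper()  # part is case-insensitive
    if part not in PARTS:
        raise ValueError("unsupported part: " + part)
    u = urlsplit(url)
    if part == "HOST":
        return u.hostname
    if part == "PATH":
        return u.path
    if part == "QUERY":
        if key is None:
            return u.query
        values = parse_qs(u.query).get(key)
        return values[0] if values else None
    if part == "REF":
        return u.fragment
    if part == "PROTOCOL":
        return u.scheme
    if part == "AUTHORITY":
        return u.netloc
    if part == "FILE":
        return u.path + ("?" + u.query if u.query else "")
    # USERINFO: the text before '@' in the authority, if present.
    userinfo, at, _ = u.netloc.rpartition("@")
    return userinfo if at else None


url = "http://user:pass@example.com:8080/over/there/index.dtb?type=animal&name=narwhal#nose"
print(parse_url(url, "query", "name"))  # narwhal
```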
6.7.2.11. REGEXP_EXTRACT
Command syntax:
Purpose: It is used to return the part of the source string that matches the regular expression for the specified occurrence.
Description:
Returned value: string type. If any input is NULL, NULL is returned.
Example:
6.7.2.12. REGEXP_INSTR
Function declaration:
Purpose: It is used to return the start or end position of the substring in source that matches pattern for the nth_occurrence time, starting from start_position.
Description:
Returned value: bigint type. It is the start or end position of the matched substring in source, as specified by return_option. If any input is NULL, NULL is returned.
Example:
6.7.2.13. REGEXP_SUBSTR
Function declaration:
Purpose: It is used to return the substring of source that matches pattern for the nth_occurrence time, starting from start_position.
Description:
pattern: a constant of the string type. It indicates the pattern to be matched. If pattern is NULL, an error is returned.
start_position: a constant of the bigint type. It must be greater than 0. If it is of another type or is less than or equal to 0, an error is returned. If it is not specified, the default value is 1, and matching starts from the first character of source.
nth_occurrence: a constant of the bigint type. It must be greater than 0. If it is of another type or is less than or equal to 0, an error is returned. If it is not specified, the default value is 1, which means that the string from the first match is returned.
Returned value: string type. If any input is NULL, NULL is returned. If there is no match, NULL is returned.
Example:
6.7.2.14. REGEXP_COUNT
Command syntax:
Purpose: It is used to return the number of times that pattern appears in source, starting from start_position.
Description:
source: string type. It indicates the string to be searched. For all other input types, an error is returned.
pattern: string type. It indicates the pattern to be matched. If pattern is NULL or of another type, an error is returned.
start_position: bigint type. It must be greater than 0; otherwise, an error is returned. If it is not specified, the default value is 1, which means that the search starts from the first character of source.
Returned value: bigint type. If any input is NULL, NULL is returned. If there is no match, 0 is returned.
Example:
regexp_count('abababc', 'a.c') = 1
regexp_count('abcde', '[[:alpha:]]{2}', 3) = 1
6.7.2.15. SPLIT_PART
Function declaration:
Purpose: It is used to split a string by the specified delimiter and return the segments from the specified start segment to the end segment (inclusive).
Description:
Returned value: string type. If any input is NULL, NULL is returned. If delimiter is NULL, the original string is returned.
Note
If delimiter does not exist in str and start is set to 1, the entire str is returned. If the input is NULL, NULL is returned.
If start is greater than the number of segments (for example, the string has 6 segments but start is greater than 6), NULL is returned.
If end is greater than the number of segments, the string from start through the last segment is returned.
Example:
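A Python sketch of the segment rules in the note above follows. It assumes that segments are numbered from 1, that an omitted end means only the start segment is returned, and that the returned segments are joined with the same delimiter. For simplicity it treats every NULL input, including delimiter, as producing NULL, and does not model the separate rule for a NULL delimiter.

```python
def split_part(s, delimiter, start, end=None):
    """Sketch of SPLIT_PART: segments start..end (1-based, inclusive)."""
    if s is None or delimiter is None or start is None:
        return None
    if end is None:
        end = start  # assumption: omitted end means a single segment
    segments = s.split(delimiter)  # no delimiter found -> one segment
    if start > len(segments):
        return None  # start beyond the last segment
    # Slicing stops at the last segment when end is too large.
    return delimiter.join(segments[start - 1:end])


print(split_part('a,b,c,d', ',', 1, 2))  # a,b
```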
6.7.2.16. REGEXP_REPLACE
Function declaration:
Purpose: It is used to search source for substrings that match pattern, replace them with replace_string, and return the result.
Description:
occurrence: if the value is less than 0, an error is returned. It can be omitted; the default value is 0.
Returned value: string type. When the referenced group does not exist, the replace operation is not performed. If source, pattern, or occurrence is NULL, NULL is returned. If replace_string is NULL and pattern is matched, NULL is returned. If replace_string is NULL but pattern is not matched, the original string is returned.
Note When the referenced group does not exist, the behavior is undefined.
Example:
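The NULL rules and the occurrence parameter can be sketched with Python's `re` module. The sketch assumes, as the default of 0 suggests, that occurrence 0 replaces every match and that a positive value replaces only that match. It also assumes that group references are written as `\1`, `\2`, and so on. Python's regular expression dialect may differ from MaxCompute's.

```python
import re


def regexp_replace(source, pattern, replace_string, occurrence=0):
    """Sketch of REGEXP_REPLACE: 0 replaces all matches, n > 0 only the nth."""
    if source is None or pattern is None or occurrence is None:
        return None
    if occurrence < 0:
        raise ValueError("occurrence must not be negative")
    matches = list(re.finditer(pattern, source))
    if occurrence > 0:
        matches = matches[occurrence - 1:occurrence]
    if not matches:
        return source  # nothing matched: original string
    if replace_string is None:
        return None  # NULL replacement and a match: NULL
    pieces, last = [], 0
    for m in matches:
        pieces.append(source[last:m.start()])
        pieces.append(m.expand(replace_string))  # expands \1-style groups
        last = m.end()
    pieces.append(source[last:])
    return "".join(pieces)


print(regexp_replace('2023-01', r'(\d+)-(\d+)', r'\2/\1'))  # 01/2023
```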
6.7.2.17. SUBSTR
Function declaration:
Purpose: It is used to return a substring of str that starts at start_position and is length characters long.
Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
start_position: bigint type. Positions start at 1. If start_position is negative, counting starts from the end of the string, and the last character is -1. For all other input types, an error is returned.
length: bigint type. It indicates the length of the substring and must be greater than 0. If it is of another type or is less than or equal to 0, an error is returned.
Returned value: string type. If any input is NULL, NULL is returned.
Note If length is omitted, the substring from start_position to the end of str is returned.
Example:
substr("abc", 2) = "bc"
substr("abc", 2, 1) = "b"
substr("abc",-2,2) = "bc"
substr("abc",-3) = "abc"
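The 1-based and end-relative positions map onto Python slicing as in the sketch below. A start_position of 0 is not described above, so this sketch rejects it, and a negative start that reaches past the beginning of the string is clamped to the first character; both choices are assumptions. Because Python cannot tell an omitted argument from `None` here, a `None` length means "omitted" in this sketch.

```python
def substr(s, start_position, length=None):
    """Sketch of SUBSTR with 1-based or negative (end-relative) start."""
    if s is None or start_position is None:
        return None
    if start_position > 0:
        begin = start_position - 1  # 1 is the first character
    elif start_position < 0:
        begin = max(len(s) + start_position, 0)  # -1 is the last character
    else:
        raise ValueError("start_position 0 is not modeled in this sketch")
    if length is None:
        return s[begin:]  # omitted length: to the end of the string
    if length <= 0:
        raise ValueError("length must be greater than 0")
    return s[begin:begin + length]


print(substr("abc", -2, 2))  # bc
```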
6.7.2.18. TOLOWER
Function declaration:
Purpose: It is used to convert source into a lowercase string and return the result.
Description:
source: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.
Example:
tolower("aBcd") = "abcd"
tolower("Haha Cd") = "haha cd"
6.7.2.19. TOUPPER
Function declaration:
Purpose: It is used to convert source into an uppercase string and return the result.
Description:
source: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.
Example:
toupper("aBcd") = "ABCD"
toupper("HahaCd") = "HAHACD"
6.7.2.20. TO_CHAR
Function declaration:
Purpose: It is used to convert an input of the boolean, bigint, decimal, or double type into a value of the string type.
Description:
value: boolean, bigint, decimal, or double type. For all other input types, an error is returned. For information about formatting values of the datetime type, see TO_CHAR in Date processing functions.
Returned value: string type. If the input is NULL, NULL is returned.
Example:
to_char(123) = '123'
to_char(true) = 'TRUE'
to_char(1.23) = '1.23'
to_char(null) = null
6.7.2.21. TRIM
Function declaration:
Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.
6.7.2.22. LTRIM
Function declaration:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.
Example:
6.7.2.23. RTRIM
Function declaration:
Purpose: It is used to remove the trailing spaces from the input string str.
Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.
Example:
6.7.2.24. REVERSE
Function declaration:
Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.
Example:
6.7.2.25. SPACE
Function declaration:
STRING SPACE(bigint n)
Purpose: It is used to return a string of n consecutive space characters.
Description:
n: bigint type. The length cannot exceed 2 MB. If the input is NULL, an error is returned.
Returned value: string type.
Example:
6.7.2.26. REPEAT
Function declaration:
Purpose: It is used to return str repeated n times.
Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
n: bigint type. The length of the result cannot exceed 2 MB. If n is NULL, an error is returned.
Example:
6.7.2.27. ASCII
Function declaration:
Purpose: It is used to return the ASCII code of the first character of str.
Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
6.7.2.28. URL_ENCODE
Function declaration:
Returned value: string type. If the input is NULL, NULL is returned.
Example:
6.7.2.29. URL_DECODE
Function declaration:
Description:
Example:
set odps.sql.type.system.odps2=true;
Note You must submit and execute the SET statement together with the SQL statements that use the new functions.
The string processing functions described in the following topics are new in MaxCompute 2.0.
6.7.2.31. CONCAT_WS
Command syntax:
Purpose: It is used to join the input strings, starting from the second, using the first string as the separator.
Description:
Returned value: string type. If there is no input or if any input is NULL, NULL is returned.
Example:
concat_ws(':','name','bob') = 'name:bob'
concat_ws(':','avg',null,'34') = null
6.7.2.32. LPAD
Function declaration:
Purpose: It is used to pad the left side of string a with string b until the padded string is len characters long.
Description:
Example:
lpad('abcdefgh',10,'12') = '12abcdefgh'
lpad('abcdefgh',5,'12') = 'abcde'
lpad('abcdefgh',0,'12')
-- NULL is returned.
6.7.2.33. RPAD
Function declaration:
Purpose: It is used to pad the right side of string a with string b until the padded string is len characters long.
Description:
Returned value: string type. If len is smaller than the number of characters in a, a is truncated to its first len characters. If len is 0, NULL is returned.
Example:
rpad('abcdefgh',10,'12') = 'abcdefgh12'
rpad('abcdefgh',5,'12') = 'abcde'
rpad('abcdefgh',0,'12')
-- NULL is returned.
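LPAD and RPAD share the same logic and differ only in which side the fill goes on. In the Python sketch below, the fill is b repeated and cut to the missing number of characters. That cutting rule, the treatment of NULL inputs, and the helper name `_pad` are assumptions; the truncation and len = 0 rules follow the description above.

```python
def _pad(a, length, b, left):
    """Shared sketch for LPAD/RPAD."""
    if a is None or length is None or b is None:
        return None
    if length == 0:
        return None  # documented: len 0 returns NULL
    if length <= len(a):
        return a[:length]  # too long: keep the first len characters
    fill = (b * length)[:length - len(a)]  # repeat b, cut to fit
    return fill + a if left else a + fill


def lpad(a, length, b):
    return _pad(a, length, b, left=True)


def rpad(a, length, b):
    return _pad(a, length, b, left=False)


print(lpad('abcdefgh', 10, '12'), rpad('abcdefgh', 10, '12'))
```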
6.7.2.34. REPLACE
Function declaration:
Purpose: It is used to replace every part of string a that exactly matches string OLD with string NEW and return the result.
Description:
Returned value: string type. If any input is NULL, NULL is returned.
Example:
replace('ababab','abab','12') = '12ab'
replace('ababab','cdf','123') = 'ababab'
replace('123abab456ab',null,'abab') = null
6.7.2.35. SOUNDEX
Function declaration:
string soundex(string a)
Description:
All parameters are of the string type.
Returned value: string type. If the input is NULL, NULL is returned.
Example:
soundex('hello') = 'H400'
6.7.2.36. SUBSTRING_INDEX
Function declaration:
Purpose: It is used to return the substring of a that precedes the count-th occurrence of the delimiter SEP. If count is positive, counting starts from the left of the string. If count is negative, counting starts from the right of the string, and the substring that follows that delimiter is returned.
Description:
Returned value: string type. If the input is NULL, NULL is returned.
Example:
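A Python sketch of SUBSTRING_INDEX splits on the delimiter and keeps the first count pieces, or the last |count| pieces when count is negative. It assumes that a count of 0 yields an empty string and that a count larger than the number of delimiters returns the whole string; neither case is described above.

```python
def substring_index(a, sep, count):
    """Sketch of SUBSTRING_INDEX(a, SEP, count)."""
    if a is None or sep is None or count is None:
        return None
    parts = a.split(sep)
    if count > 0:
        return sep.join(parts[:count])  # text before the count-th SEP
    if count < 0:
        return sep.join(parts[count:])  # text after the |count|-th SEP from the right
    return ""  # assumption: count 0 yields an empty string


print(substring_index('https://help.aliyun.com', '.', 2))  # https://help.aliyun
```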
6.7.2.37. TRANSLATE
Function declaration:
Returned value: STRING type. If any input is NULL, NULL is returned.
Example:
translate('MaxComputer','puter','pute')='MaxCompute'
translate('aaa','b','c')='aaa'
translate('MaxComputer','puter',null)=null
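The TRANSLATE examples are consistent with a character-by-character mapping. Each character of the second argument is replaced by the character at the same position in the third argument, and characters with no counterpart are deleted, like the trailing r in the first example. A Python sketch under that reading, which also assumes that the first mapping wins when a character repeats in the second argument:

```python
def translate(s, from_chars, to_chars):
    """Sketch of TRANSLATE built on str.translate."""
    if s is None or from_chars is None or to_chars is None:
        return None
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # assumption: the first mapping for a character wins
        # Characters beyond the end of to_chars map to None (deleted).
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return s.translate(table)


print(translate('MaxComputer', 'puter', 'pute'))  # MaxCompute
```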
6.7.2.38. JSON_TUPLE
Function declaration:
Description: This function extracts specific strings from a standard JSON string based on a set of input keys, such as key1 and key2.
Parameters:
Note
If the json parameter is empty or invalid, NULL is returned.
If the key parameter is empty or invalid, NULL is returned. A key that does not exist in the JSON string is considered invalid.
If the json parameter is valid and the key exists, the corresponding string is returned.
This function parses a JSON string the same way as the GET_JSON_OBJECT function does when set odps.sql.udf.getjsonobj.new=true; is specified. To extract multiple keys with GET_JSON_OBJECT, you must call the function multiple times, and the JSON string is parsed on every call. The JSON_TUPLE function accepts multiple keys at a time and parses the JSON string only once, which improves parsing efficiency.
JSON_TUPLE is a user-defined table-valued function (UDTF). To select other columns together with its output, use JSON_TUPLE with LATERAL VIEW.
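The efficiency point in the note above is "parse once, look up many keys". The following Python sketch illustrates that idea and the NULL rules listed in the note. Returning nested objects, arrays, and numbers as JSON text is an assumption about the output format, not documented behavior.

```python
import json


def json_tuple(doc, *keys):
    """Parse doc once, then look up every key; None stands for NULL."""
    try:
        obj = json.loads(doc) if doc else None
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        obj = None
    if not isinstance(obj, dict):
        return tuple(None for _ in keys)  # empty or invalid JSON: all NULL
    values = []
    for key in keys:
        if not key or key not in obj or obj[key] is None:
            values.append(None)  # empty or missing key: NULL
        elif isinstance(obj[key], str):
            values.append(obj[key])
        else:
            # Assumption: non-string values come back as JSON text.
            values.append(json.dumps(obj[key], ensure_ascii=False))
    return tuple(values)
```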
Example:
Table: school
+------------+------------+
| Id | json |
+------------+------------+
| 1 | {
"School name": "湖畔大学",
"Location":"杭州",
"SchoolRank": "00",
"Class1":{
"Student":[{
"studentId":1,
"scoreRankIn3Year":[1,2,[3,2,6]]
}, {
"studentId":2,
"scoreRankIn3Year":[2,3,[4,3,1]]
}]}
} |
+------------+------------+
6.7.3.1. DATEADD
Description:
date: datetime type. If the input is of the string type, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
delta: bigint type. It specifies the amount of the modification. If the input is of the string or double type, it is implicitly converted into a value of the bigint type before this computation. For all other input types, an error is returned. If delta is greater than 0, delta is added to date. If delta is less than 0, the corresponding amount is subtracted from date.
datepart: a constant of the string type. This parameter follows the string-datetime conversion convention, in which yyyy indicates the year and mm indicates the month. For the conversion rules, see Conversion between the string type and datetime type. The extended date format is also supported: year, month or mon, day, and hour. If the value is not a constant, uses an unsupported format, or is of another type, an error is returned.
Returned value: datetime type. If any input is NULL, NULL is returned.
Note
When delta is added to or subtracted from date, carrying and borrowing are base-10 for year, base-12 for month, base-24 for hour, and base-60 for minute and second. If delta is measured in months, the following rule applies: if the day of the month is still valid in the resulting month, the day value is kept. Otherwise, the day value is set to the last day of the resulting month.
The datepart parameter follows the string-datetime conversion convention, in which yyyy indicates the year and mm indicates the month. Unless otherwise specified, all built-in functions related to the datetime type follow this convention, and their datepart parameters also support the extended date format: year, month or mon, day, and hour.
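The month rule in the note above (keep the day if it is still valid, otherwise use the last day of the resulting month) can be sketched in Python as follows. The function name dateadd_months is hypothetical and covers only the month unit. Counting in a single month index makes the base-12 carry and borrow across years explicit.

```python
import calendar
from datetime import datetime


def dateadd_months(d: datetime, delta: int) -> datetime:
    """Add delta months to d, clamping the day to the end of the target month."""
    total = d.year * 12 + (d.month - 1) + delta  # one counter: base-12 carry/borrow
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    # Time-of-day fields are left untouched.
    return d.replace(year=year, month=month, day=min(d.day, last_day))
```

For example, adding one month to 2005-01-31 yields 2005-02-28, because February 31 does not exist.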
Example:
The values of trans_date are used only as examples. For simplicity, the datetime values in this document are written in simple formats. In MaxCompute SQL, a constant cannot be of the datetime type. The following syntax is incorrect:
If you must use a constant of the datetime type, use the following method:
6.7.3.2. DATEDIFF
Function declaration:
Purpose: It is used to calculate the difference between date1 and date2 in the unit specified by datepart.
Description:
date1 and date2: the minuend and the subtrahend of the datetime type, respectively. If an input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
datepart: a constant of the string type. It supports the extended date format. If datepart is not in the specified format or is of another type, an error is returned.
Returned value: bigint type. If any input is NULL, NULL is returned. If date1 is earlier than date2, the returned value may be negative.
Note During the computation, the parts of date1 and date2 in units lower than datepart are truncated first, and then the difference is calculated.
Example:
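The following Python sketch illustrates the truncate-then-subtract rule from the note above. Only yyyy, mm, and the extended names year, month, mon, day, and hour come from this document. The short codes dd, hh, mi, and ss are assumed.

```python
from datetime import datetime, timedelta

_EPOCH = datetime(1970, 1, 1)
# Extended names from the text mapped to short codes (dd, hh, mi, ss are assumed).
_ALIASES = {"year": "yyyy", "month": "mm", "mon": "mm", "day": "dd", "hour": "hh"}
_UNITS = {"dd": timedelta(days=1), "hh": timedelta(hours=1),
          "mi": timedelta(minutes=1), "ss": timedelta(seconds=1)}


def datediff(date1: datetime, date2: datetime, datepart: str) -> int:
    """Truncate both values to whole datepart units, then subtract."""
    part = _ALIASES.get(datepart, datepart)
    if part == "yyyy":
        return date1.year - date2.year
    if part == "mm":
        return (date1.year * 12 + date1.month) - (date2.year * 12 + date2.month)
    unit = _UNITS[part]  # KeyError for an unsupported datepart
    # timedelta // timedelta floors, which truncates each value to its unit.
    return (date1 - _EPOCH) // unit - (date2 - _EPOCH) // unit
```

Because of the truncation, 2005-12-31 23:59:59 and 2006-01-01 00:00:00 are one day, one month, and one year apart, although only one second separates them.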
6.7.3.3. DATEPART
Function declaration:
date: datetime type. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
datepart: a constant of the string type. It supports the extended date format. If datepart is not in the specified format or is of another type, an error is returned.
Returned value: bigint type. If any input is NULL, NULL is returned.
Example:
6.7.3.4. DATETRUNC
Function declaration:
Purpose: It is used to return the value of date truncated to the unit specified by datepart. All parts in lower units are reset.
Description:
date: datetime type. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
datepart: a constant of the string type. It supports the extended date format. If datepart is not in the specified format or is of another type, an error is returned.
Returned value: datetime type. If any input is NULL, NULL is returned.
Example:
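A Python sketch of truncation: every field below the requested unit is reset to its smallest value. As in the other sketches, the codes dd, hh, mi, and ss are assumed. Only yyyy, mm, and the extended names are given in this document.

```python
from datetime import datetime

# Extended names from the text mapped to short codes (dd, hh, mi, ss are assumed).
_ALIASES = {"year": "yyyy", "month": "mm", "mon": "mm", "day": "dd", "hour": "hh"}


def datetrunc(d: datetime, datepart: str) -> datetime:
    """Reset every field below datepart to its smallest value."""
    part = _ALIASES.get(datepart, datepart)
    if part == "yyyy":
        return d.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if part == "mm":
        return d.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if part == "dd":
        return d.replace(hour=0, minute=0, second=0, microsecond=0)
    if part == "hh":
        return d.replace(minute=0, second=0, microsecond=0)
    if part == "mi":
        return d.replace(second=0, microsecond=0)
    if part == "ss":
        return d.replace(microsecond=0)
    raise ValueError(f"unsupported datepart: {datepart}")
```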
6.7.3.5. GETDATE
Function declaration:
datetime getdate()
Purpose: It is used to obtain the current system time. MaxCompute uses UTC+8 as its standard time.
Returned value: the current date and time of the datetime type.
Note In a MaxCompute SQL task, which is executed in a distributed manner, getdate always returns a fixed value. The returned value is an arbitrary point in time during the execution of the task. The returned time is precise to the second. In later versions, it will be precise to the millisecond.
6.7.3.6. ISDATE
Function declaration:
Purpose: It is used to determine whether a date string can be converted into a date value based on the specified format string. If the conversion can be performed, true is returned. Otherwise, false is returned.
Description:
date: This value must be a date of the string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
format: a constant of the string type. The extended date format is not supported. If format is of another type or uses an unsupported format, an error is returned. If format contains redundant format strings, the date value corresponding to the first format string is used, and the other strings are treated as delimiters. For example, isdate("1234-yyyy", "yyyy-yyyy") returns true.
Returned value: boolean type. If any input is NULL, NULL is returned.
6.7.3.7. LASTDAY
Purpose: It is used to return the last day of the month to which date belongs. The value is accurate to the day, and the hour, minute, and second part is 00:00:00.
Description:
date: datetime type. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
Returned value: datetime type. If the input is NULL, NULL is returned.
6.7.3.8. TO_DATE
Function declaration:
date: string type. It indicates the date value of the string type to be converted. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other types of inputs or NULL, an error is returned.
format: a constant of the string type that specifies the date format. For all other types of inputs and non-constant values, an error is returned. The extended date format is not supported. Other characters are ignored as invalid characters during parsing. The format parameter must contain yyyy. Otherwise, an error is returned. If format contains redundant format strings, the date value corresponding to the first format string is used, and the rest are processed as separators. For example, to_date('1234-2234', 'yyyy-yyyy') returns '1234-01-01 00:00:00'.
Returned value: datetime type, in the yyyy-mm-dd hh:mi:ss format. If any input is NULL, NULL is returned.
Example:
6.7.3.9. TO_CHAR
Function declaration:
Purpose: It is used to convert a value of the datetime type into a string based on the specified format.
Description:
date: the value of the datetime type to be converted. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other types of inputs, an error is returned.
format: a constant of the string type. If it is not a constant or is of another type, an error is returned. In format, the date format parts are replaced with the corresponding data, and other characters are output directly.
Returned value: string type. If any input is NULL, NULL is returned.
Example:
Note For more information about conversion from other types into the string type, see TO_CHAR in String functions.
6.7.3.10. UNIX_TIMESTAMP
Function declaration:
Purpose: It is used to convert a date into a UNIX timestamp of the integer type.
Description:
date: datetime type. It indicates the date. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
Returned value: bigint type. It indicates the date value in UNIX format. If date is NULL, NULL is returned.
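A UNIX timestamp counts seconds since 1970-01-01 00:00:00 UTC, so the result depends on the time zone in which the datetime value is interpreted. The following Python sketch assumes UTC+8, the standard time named in the GETDATE topic. The actual zone depends on the MaxCompute configuration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: naive datetime values are read as UTC+8.
UTC8 = timezone(timedelta(hours=8))


def unix_timestamp(d):
    """Seconds since 1970-01-01 00:00:00 UTC; None for a NULL input."""
    if d is None:
        return None
    return int(d.replace(tzinfo=UTC8).timestamp())
```

Under this assumption, 2009-03-20 11:11:00 maps to 1237518660, and 1970-01-01 08:00:00 maps to 0.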
6.7.3.11. FROM_UNIXTIME
Function declaration:
Purpose: It is used to convert a UNIX date value of the BIGINT type into a value of the DATETIME type.
Description:
unixtime: BIGINT type. It is a date value in the UNIX format. If the input is of the STRING, DECIMAL, or DOUBLE type, it is implicitly converted into a value of the BIGINT type before computation.
Returned value: DATETIME type. If unixtime is NULL, NULL is returned.
Example:
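The following Python sketch performs the inverse conversion. It makes the same UTC+8 assumption as a display time zone and uses int() to stand in for the implicit conversion to BIGINT. How MaxCompute rounds DECIMAL or DOUBLE input is not covered here.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # assumption: UTC+8 display time zone


def from_unixtime(unixtime):
    """UNIX seconds -> naive datetime in UTC+8; None for a NULL input."""
    if unixtime is None:
        return None
    # int() stands in for the implicit conversion of STRING input to BIGINT.
    return datetime.fromtimestamp(int(unixtime), UTC8).replace(tzinfo=None)
```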
6.7.3.12. WEEKDAY
Function declaration:
Description:
date: datetime type. If the input is of the string type, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
Returned value: bigint type. If the input is NULL, NULL is returned. Monday is the first day of a week, and the returned value for Monday is 0. Days are numbered in ascending order starting from 0, so the returned value for Sunday is 6.
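Python's datetime.weekday() uses the same numbering, with Monday as 0 and Sunday as 6. This makes it a convenient cross-check. In the following sketch, the string layout used for the implicit conversion is an assumption.

```python
from datetime import datetime


def weekday(d):
    """Monday -> 0 ... Sunday -> 6; None for a NULL input."""
    if d is None:
        return None
    if isinstance(d, str):
        # Assumed layout for the implicit string-to-datetime conversion.
        d = datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    return d.weekday()
```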
6.7.3.13. WEEKOFYEAR
Function declaration:
Purpose: It is used to return the calendar week of the year that the specified date falls in. Monday is used as the first day of the week.
Note If a week spans two years, the week belongs to the year that contains four or more of its days. If more days fall in the first year, the week is the last week of the first year. If more days fall in the second year, the week is the first week of the second year.
Description:
date: the date of the datetime type. If the input is of the string type, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
Returned value: bigint type. If the input is NULL, NULL is returned.
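The rule in the note above (weeks start on Monday, and a week that spans two years belongs to the year holding four or more of its days) is the ISO 8601 week numbering. That WEEKOFYEAR matches ISO 8601 in every case is an assumption. The following Python sketch uses date.isocalendar() to show how the boundary weeks fall.

```python
from datetime import date


def weekofyear(d):
    """ISO 8601 week number of d; None for a NULL input."""
    if d is None:
        return None
    return d.isocalendar()[1]  # (ISO year, ISO week, ISO weekday)[1]
```

Under this rule, 2014-12-31 falls in week 1 of 2015, and 2016-01-01 falls in week 53 of 2015.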
Example:
set odps.sql.type.system.odps2=true;
Note You must submit and execute the SET statement together with the SQL statements that use the new functions.
Example:
set odps.sql.type.system.odps2=true;
select year('2017-01-01 12:30:00') = 2017 from dual;
The date functions described in subsequent topics are new in MaxCompute 2.0.
6.7.3.15. YEAR
Function declaration:
date: a date of the string type. The date format must include yyyy-mm-dd and must not contain redundant strings. Otherwise, NULL is returned.
Example:
6.7.3.16. QUARTER
Command syntax:
Description:
date: datetime, timestamp, or string type. The date format must include yyyy-mm-dd and must not contain redundant strings. Otherwise, NULL is returned.
Returned value: int type. If the input is NULL, NULL is returned.
Example:
quarter('2017-11-12 10:00:00') = 4
quarter('2017-11-12') = 4
6.7.3.17. MONTH
Function declaration:
Description:
date: This value must be a date of the string type. For all other input types, an error is returned.
Example:
month('2017-09-01') = 9
month('20170901') = null
6.7.3.18. DAY
Function declaration:
Description:
date: This value must be a date of the string type. For all other input types, an error is returned.
Returned value: int type.
Example:
day('2017-09-01') = 1
day('20170901') = null
6.7.3.19. DAYOFMONTH
Function declaration:
INT dayofmonth(date)
Description:
date: This value must be a date of the string type. For all other input types, an error is returned.
Example:
dayofmonth('2017-09-01') = 1
dayofmonth('20170901') = null
6.7.3.20. HOUR
Function declaration:
date: This value must be a date of the string type. For all other input types, an error is returned.
Example:
hour('2017-09-01 12:00:00') = 12
hour('12:00:00') = 12
hour('20170901120000') = null
6.7.3.21. MINUTE
Function declaration:
Description:
date: This value must be a date of the string type. For all other input types, an error is returned.
Returned value: int type.
Example:
minute('2017-09-01 12:30:00') = 30
minute('12:30:00') = 30
minute('20170901120000') = null
6.7.3.22. SECOND
Function declaration:
Description:
date: This value must be a date of the string type. For all other input types, an error is returned.
Example:
second('2017-09-01 12:30:45') = 45
second('12:30:45') = 45
second('20170901123045') = null
6.7.3.23. FROM_UTC_TIMESTAMP
Function declaration:
Purpose: It is used to convert a UTC timestamp into a timestamp in the specified time zone.
Description:
{any primitive type}*: the timestamp. The type can be TIMESTAMP, DATETIME, TINYINT, SMALLINT, INT, or BIGINT.
timezone: the destination time zone, such as PST.
6.7.3.24. CURRENT_TIMESTAMP
Function declaration:
timestamp current_timestamp()
Purpose: It is used to return the current timestamp as a value of the TIMESTAMP type. The value is not fixed.
Example:
6.7.3.25. ADD_MONTHS
Function declaration:
Purpose: It is used to return the date that is num_months months later than startdate.
Description:
startdate: This value must be a date of the string type. The date format must contain yyyy-mm-dd. Otherwise, NULL is returned.
num_months: int type.
Returned value: a date of the string type, in the yyyy-mm-dd format.
Example:
6.7.3.26. LAST_DAY
Function declaration:
Description:
Example:
last_day('2017-03-04') = '2017-03-31'
last_day('2017-07-04 11:40:00') = '2017-07-31'
last_day('20170304') = null
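The following Python sketch models the examples above: the input must begin with a yyyy-mm-dd date, anything after it (such as a time part) is ignored, and an input in another layout yields NULL. The prefix check with a regular expression is an assumption about how strictly the format is validated.

```python
import calendar
import re

_DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def last_day(s):
    """Last day of the month as 'yyyy-mm-dd'; None for NULL or a non yyyy-mm-dd input."""
    if s is None:
        return None
    m = _DATE_PREFIX.match(s)
    if not m:
        return None
    year, month = int(m.group(1)), int(m.group(2))
    if not 1 <= month <= 12:
        return None
    return f"{year:04d}-{month:02d}-{calendar.monthrange(year, month)[1]:02d}"
```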
6.7.3.27. NEXT_DAY
Function declaration:
Purpose: It is used to return the first date that is later than startdate and falls on the specified day of the week, that is, the next occurrence of that day after startdate.
Description:
Returned value: a date of the string type, in the yyyy-mm-dd format.
Example:
next_day('2017-08-01','TU') = '2017-08-08'
next_day('2017-08-01 23:34:00','TU') = '2017-08-08'
next_day('20170801','TU') = null
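2017-08-01 is itself a Tuesday, so next_day(..., 'TU') skips ahead a full week: the result must be strictly later than startdate. The following Python sketch illustrates this behavior. Only the 'TU' code appears in this document. The other two-letter codes, and reading only the first two letters of the argument, are assumptions.

```python
from datetime import date, timedelta

# Two-letter day codes; only 'TU' appears in this document, the rest are assumed.
_DAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def next_day(startdate: date, week: str) -> date:
    """First date strictly later than startdate that falls on the given weekday."""
    target = _DAYS[week.upper()[:2]]
    ahead = (target - startdate.weekday()) % 7
    return startdate + timedelta(days=ahead or 7)  # same weekday: jump a full week
```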
6.7.3.28. MONTHS_BETWEEN
Function declaration:
Purpose: It is used to return the number of months between date1 and date2.
Description:
date1: datetime, timestamp, or string type, in the yyyy-MM-dd HH:mi:ss or yyyy-mm-dd format.
date2: datetime, timestamp, or string type, in the yyyy-MM-dd HH:mi:ss or yyyy-mm-dd format.
If date1 is later than date2, the returned value is positive. If date2 is later than date1, the returned value is negative.
If date1 and date2 are both the last days of their months, the returned value is an integer that represents the number of months. Otherwise, the formula is (date1 - date2)/31.
Example:
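The following Python sketch shows one reading of the rule above. It is modeled on Hive's months_between, and this interpretation is an assumption. If both dates are month ends, the result is the whole-month difference. Otherwise, the remaining day-and-time difference is divided by 31 days and added to the whole-month difference.

```python
import calendar
from datetime import datetime


def _is_month_end(d: datetime) -> bool:
    return d.day == calendar.monthrange(d.year, d.month)[1]


def _seconds_into_month(d: datetime) -> int:
    return (d.day - 1) * 86400 + d.hour * 3600 + d.minute * 60 + d.second


def months_between(date1: datetime, date2: datetime) -> float:
    whole = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    if _is_month_end(date1) and _is_month_end(date2):
        return float(whole)  # both are month ends: an integer number of months
    # Assumption: the remaining day/time difference divided by a 31-day month.
    frac = (_seconds_into_month(date1) - _seconds_into_month(date2)) / (31 * 86400)
    return whole + frac
```

Under this reading, months_between(1997-02-28 10:30:00, 1996-10-30) is about 3.9496 (4 whole months minus 1.5625/31).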
6.7.3.29. EXTRACT
Function declaration:
Description: This function extracts the part specified by datepart from the time specified by timestamp.
Parameters:
datepart: a time unit, such as YEAR, MONTH, DAY, HOUR, or MINUTE.
timestamp: a value of the TIMESTAMP type.
SET odps.sql.type.system.odps2=true;
SELECT extract(YEAR FROM '2019-05-01 11:21:00') year
,extract(MONTH FROM '2019-05-01 11:21:00') month
,extract(DAY FROM '2019-05-01 11:21:00') day
,extract(HOUR FROM '2019-05-01 11:21:00') hour
,extract(MINUTE FROM '2019-05-01 11:21:00') minute;
-- The following result is returned:
+------+-------+------+------+--------+
| year | month | day | hour | minute |
+------+-------+------+------+--------+
| 2019 | 5 | 1 | 11 | 21 |
+------+-------+------+------+--------+
If a time value specified in the SQL statement is invalid or exceeds the valid range, the return value is the remainder of the specified time value divided by the maximum value of that time unit.
Example:
SET odps.sql.type.system.odps2=true;
SELECT extract(HOUR FROM '2019-05-01 31:01:01') hour
,extract(MINUTE FROM '2019-05-01 23:61:01') minute;
-- The following result is returned:
+------+-------+
| hour | minute|
+------+-------+
| 7 | 1 |
+------+-------+
-- The maximum value of hour is 24, and the specified time value is 31. The return value is 7 (31 % 24).
-- The maximum value of minute is 60, and the specified time value is 61. The return value is 1 (61 % 60).
6.7.4.1. Overview
In MaxCompute SQL statements, you can use window functions to analyze and process data flexibly. Window functions can appear only in SELECT clauses. A window function cannot contain nested window functions or aggregate functions, and it cannot be used together with aggregate functions at the same level.
Syntax:
Description:
PARTITION BY specifies the partition columns. Rows that have the same partition column values are considered to be in the same window. A window can contain up to 100 million rows of data, and we recommend that a window contain no more than 5 million rows. If a window exceeds the limit, an error is returned.
ORDER BY specifies the rule for sorting data in a window.
You can use ROWS in windowing_clause to specify the window frame. Two forms are supported:
rows between x preceding|following and y preceding|following indicates a window range from the xth row preceding or following the current row to the yth row preceding or following the current row.
rows x preceding|following indicates a window range from the xth row preceding or following the current row to the current row.
Note
x and y must be integer constants greater than or equal to 0. Their values range from 0 to 10,000. The value 0 indicates the current row.
You must specify ORDER BY before you can use ROWS to specify a window range.
Not all window functions support window ranges specified by ROWS. Only the following functions support them: AVG, COUNT, MAX, MIN, STDDEV, and SUM.
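The following Python sketch shows how a ROWS frame selects rows for SUM over one already-ordered window. The signed-offset parameters are a hypothetical encoding: a negative value means PRECEDING, a positive value means FOLLOWING, and 0 means the current row. For example, rows between 1 preceding and 1 following becomes (-1, 1), and rows 2 preceding becomes (-2, 0). Returning None for an empty frame mirrors SQL's NULL for SUM over no rows and is an assumption.

```python
from typing import List, Optional


def rows_frame_sum(values: List[float], start: int, end: int) -> List[Optional[float]]:
    """SUM over a ROWS frame for every row of one ordered window.

    start/end are signed offsets from the current row: negative = PRECEDING,
    positive = FOLLOWING, 0 = CURRENT ROW.
    """
    n = len(values)
    result: List[Optional[float]] = []
    for i in range(n):
        lo = max(0, i + start)     # the frame is clipped at the window edges
        hi = min(n - 1, i + end)
        if lo > hi:
            result.append(None)    # empty frame: no rows to sum
        else:
            result.append(sum(values[lo:hi + 1]))
    return result
```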
6.7.4.2. COUNT
Command syntax:
expr: any type. If it is NULL, the row is not involved in the computation. If the distinct keyword is specified, only distinct values are counted.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the count of expr in the current window is returned. If ORDER BY is specified, the results are sorted in the specified order, and the value is the cumulative count from the start row to the current row in the current window.
Example:
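The following Python sketch contrasts the two ORDER BY behaviors described above for one window. Without ORDER BY, every row gets the total count. With ORDER BY, every row gets a running count from the start row. It follows the text literally, row by row. How rows with tied ORDER BY values are counted is not modeled.

```python
def window_count(values, order_by):
    """COUNT(expr) for every row of one window; None (NULL) is not counted.

    With order_by=True, values must already be in ORDER BY order.
    """
    if not order_by:
        total = sum(1 for v in values if v is not None)
        return [total] * len(values)  # the same whole-window count on every row
    counts, running = [], 0
    for v in values:
        if v is not None:
            running += 1
        counts.append(running)  # count from the start row to this row
    return counts
```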
6.7.4.3. AVG
Function declaration:
distinct: If the distinct keyword is specified, the average of distinct values is calculated.
expr: double type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before computation. For all other input types, an error is returned. If the input is NULL, the row is not used in the computation. The input cannot be of the boolean type.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the average value of expr in the current window is returned. If ORDER BY is specified, the results are sorted in the specified order, and the value is the average from the start row to the current row in the current window.
6.7.4.4. MAX
Function declaration:
Description:
expr: any type except boolean. If the value is NULL, the row is not involved in the operation. If the distinct keyword is specified, the maximum of distinct values is taken. Specifying distinct does not affect the result.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the maximum value in the current window is returned. If ORDER BY is specified, the results are sorted in the specified order, and the value is the maximum from the start row to the current row in the current window.
Returned value: the same type as expr.
6.7.4.5. MIN
Function declaration:
Description:
expr: any type except boolean. If the value is NULL, the row is not involved in the operation. If the distinct keyword is specified, the minimum of distinct values is taken. Specifying distinct does not affect the result.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the minimum value in the current window is returned. If ORDER BY is specified, the results are sorted in the specified order, and the value is the minimum from the start row to the current row in the current window.
6.7.4.6. MEDIAN
Function declaration:
Description:
number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other input types, an error is returned. If the input is NULL, NULL is returned.
partition by col1[, col2…]: specifies the partitions used in the computation.
6.7.4.7. STDDEV
Function declaration:
Description:
expr: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other types of inputs, an error is returned. If the input value is NULL, NULL is returned. If the distinct keyword is specified, the population standard deviation of distinct values is calculated.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the population standard deviation of the current window is returned. If ORDER BY is specified, the results are sorted in the specified order, and the values are the population standard deviation from the start row to the current row in the current window.
Returned value: If the input is of the decimal type, a value of the decimal type is returned. Otherwise, a value of the double type is returned.
6.7.4.8. STDDEV_SAMP
Description:
expr: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before this computation. For all other types of inputs, an error is returned. If the input is NULL, NULL is returned. If the distinct keyword is specified, the sample standard deviation of distinct values is calculated.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the sample standard deviation of the current window is returned. If ORDER BY is specified, the results are sorted in the specified order, and the values are the sample standard deviation from the start row to the current row in the current window.
Returned value: If the input is of the decimal type, a value of the decimal type is returned. Otherwise, a value of the double type is returned.
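STDDEV and STDDEV_SAMP differ only in the divisor: the population form divides the sum of squared deviations by n, and the sample form divides it by n - 1. The following Python sketch uses the statistics module to show the difference on one window. Returning None when there are too few rows is an assumption.

```python
import statistics


def window_stddev(values, sample=False):
    """Population (STDDEV) or sample (STDDEV_SAMP) standard deviation of one window."""
    xs = [float(v) for v in values]
    if not xs or (sample and len(xs) < 2):
        return None  # assumption: too few rows yields NULL
    return statistics.stdev(xs) if sample else statistics.pstdev(xs)
```

For [2, 4, 4, 4, 5, 5, 7, 9], the squared deviations sum to 32. The population result is therefore sqrt(32/8) = 2.0, and the sample result is sqrt(32/7), about 2.138.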
6.7.4.9. SUM
Function declaration:
Description:
expr: double, decimal, or bigint type. If the input is of the string type, it is implicitly converted into a value of the double type before computation. For all other input types, an error is returned. If the value is NULL, the row is not included in the calculation. If the distinct keyword is specified, the sum of distinct values is calculated.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the sum of expr in the current window is returned. If ORDER BY is specified, the results are sorted in the specified order, and the values are the cumulative sum from the start row to the current row in the current window.
Returned value: If the input is of the bigint type, a value of the bigint type is returned. If the input is of the double or string type, a value of the double type is returned.
6.7.4.10. DENSE_RANK
Function declaration:
Purpose: It is used to calculate a consecutive ranking of values. Rows with the same ORDER BY values receive the same rank, and no rank numbers are skipped after ties.
Description:
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the values that determine the ranking.
Example:
Group the employees by department, and sort the employees in each group in descending order of SAL to obtain each employee's rank within the group.
SELECT deptno
, ename
, sal
, DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
-- DEPTNO (department) is the partition used in the computation, and SAL (salary) is used as the basis for sorting the returned results.
FROM emp;
-- Returned result:
+------------+-------+------------+------------+
| deptno | ename | sal | nums |
+------------+-------+------------+------------+
| 10 | JACCKA | 5000.0 | 1 |
| 10 | King | 5000.0 | 1 |
| 10 | CLARK | 2450.0 | 2 |
| 10 | WELAN | 2450.0 | 2 |
| 10 | TEBAGE | 1300.0 | 3 |
| 10 | Miller | 1300.0 | 3 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | Ford | 3000.0 | 1 |
| 20 | JONES | 2975.0 | 2 |
| 20 | ADAMS | 1100.0 | 3 |
| 20 | SMITH | 800.0 | 4 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1600.0 | 2 |
| 30 | TURNER | 1500.0 | 3 |
| 30 | MARTIN | 1250.0 | 4 |
| 30 | WARD | 1250.0 | 4 |
| 30 | JAMES | 950.0 | 5 |
+------------+-------+------------+------------+
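The following Python sketch reproduces the DENSE_RANK logic shown in the result above. It groups rows by the partition key, sorts each group in descending order, and increments the rank only when the value changes. The (partition, key) tuple layout is a hypothetical input format.

```python
from collections import defaultdict


def dense_rank_desc(rows):
    """DENSE_RANK() OVER (PARTITION BY part ORDER BY key DESC) for (part, key) rows.

    Returns {part: [rank, ...]} with ranks listed in descending key order.
    """
    partitions = defaultdict(list)
    for part, key in rows:
        partitions[part].append(key)
    result = {}
    for part, keys in partitions.items():
        keys.sort(reverse=True)
        ranks, current = [], 0
        for i, key in enumerate(keys):
            if i == 0 or key != keys[i - 1]:
                current += 1  # a new distinct value gets the next consecutive rank
            ranks.append(current)
        result[part] = ranks
    return result
```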
6.7.4.11. RANK
Command syntax:
Purpose: It is used to return a ranking value. Rows with the same ORDER BY values receive the same rank, and the following ranks are skipped accordingly, so the ranking can contain gaps.
Description:
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the rule that determines the ranking.
Example:
Group the employees by department, and sort the employees in each group in descending order of salary. Each employee obtains a number that represents their rank within the group.
SELECT deptno
, ename
, sal
, RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
-- DEPTNO (department) is the partitioning column. The sal column is sorted to generate the ranking value for each employee.
FROM emp;
-- Returned result:
+------------+-------+------------+------------+
| deptno | ename | sal | nums |
+------------+-------+------------+------------+
| 10 | JACCKA | 5000.0 | 1 |
| 10 | KING | 5000.0 | 1 |
| 10 | CLARK | 2450.0 | 3 |
| 10 | WELAN | 2450.0 | 3 |
| 10 | TEBAGE | 1300.0 | 5 |
| 10 | MILLER | 1300.0 | 5 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | FORD | 3000.0 | 1 |
| 20 | JONES | 2975.0 | 3 |
| 20 | ADAMS | 1100.0 | 4 |
| 20 | SMITH | 800.0 | 5 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1600.0 | 2 |
| 30 | TURNER | 1500.0 | 3 |
| 30 | MARTIN | 1250.0 | 4 |
| 30 | WARD | 1250.0 | 4 |
| 30 | JAMES | 950.0 | 6 |
+------------+-------+------------+------------+
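Unlike DENSE_RANK, RANK equals one plus the number of rows that come before the first row with the same value. This is why department 10 jumps from 1 to 3 to 5 in the result above. The following Python sketch implements this rule for one partition that is already sorted in ORDER BY order.

```python
def rank_sorted(keys):
    """RANK() for one partition whose keys are already in ORDER BY order."""
    ranks = []
    for i, key in enumerate(keys):
        if i == 0 or key != keys[i - 1]:
            ranks.append(i + 1)       # 1 + number of rows before this value
        else:
            ranks.append(ranks[-1])   # ties share the rank of the first tied row
    return ranks
```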
6.7.4.12. LAG
Function declaration:
lag(expr, bigint offset, default) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]])
Purpose: It is used to retrieve the value from the row that is offset rows before the current row. For example, if the current row is rn, the value is retrieved from row rn - offset.
Description:
6.7.4.13. LEAD
Function declaration:
Purpose: It is used to retrieve the value from the row that is offset rows after the current row. For example, if the current row is rn, the value is retrieved from row rn + offset.
Description:
Example:
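LAG reads row rn - offset, and LEAD reads row rn + offset, within one ordered partition. When that row does not exist, the default value is returned. The following Python sketch implements both functions. The defaults of offset=1 and default=None in the signatures are conveniences of the sketch and are not documented MaxCompute defaults.

```python
def lag(values, offset=1, default=None):
    """Value from row rn - offset of one ordered partition, else default."""
    return [values[i - offset] if 0 <= i - offset < len(values) else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """Value from row rn + offset of one ordered partition, else default."""
    return [values[i + offset] if 0 <= i + offset < len(values) else default
            for i in range(len(values))]
```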
6.7.4.14. PERCENT_RANK
Function declaration:
Purpose: It is used to return the relative rank of a row within a group of data.
Description:
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the values used for the ranking.
Returned value: double type. Value range: 0 to 1. The relative rank is calculated using the following formula: (rank - 1)/(number of rows - 1).
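As a worked example, department 10 in the RANK example has six rows with ranks 1, 1, 3, 3, 5, 5. The formula above then gives 0, 0, 2/5, 2/5, 4/5, 4/5. The following Python sketch applies the formula to one partition's RANK values. Returning 0 for a one-row partition, where the denominator would be zero, is an assumption.

```python
def percent_rank(ranks):
    """(rank - 1) / (number of rows - 1) for one partition's RANK() values."""
    n = len(ranks)
    if n == 1:
        return [0.0]  # assumption: a single-row partition yields 0
    return [(r - 1) / (n - 1) for r in ranks]
```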
6.7.4.15. ROW_NUMBER
Function declaration:
Description:
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the values used to sort the returned results.
Example:
Group all employees by department, and sort each group in descending order of SAL to obtain each employee's serial number within the group.
SELECT deptno
, ename
, sal
, ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
-- DEPTNO (department) is the partition used in the computation, and SAL (salary) is used as the basis for sorting the results.
FROM emp;
-- Returned result:
+------------+-------+------------+------------+
| deptno | ename | sal | nums |
+------------+-------+------------+------------+
| 10 | JACCKA | 5000.0 | 1 |
| 10 | KING | 5000.0 | 2 |
| 10 | CLARK | 2450.0 | 3 |
| 10 | WELAN | 2450.0 | 4 |
| 10 | TEBAGE | 1300.0 | 5 |
| 10 | MILLER | 1300.0 | 6 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | FORD | 3000.0 | 2 |
| 20 | JONES | 2975.0 | 3 |
| 20 | ADAMS | 1100.0 | 4 |
| 20 | SMITH | 800.0 | 5 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1600.0 | 2 |
| 30 | TURNER | 1500.0 | 3 |
| 30 | MARTIN | 1250.0 | 4 |
| 30 | WARD | 1250.0 | 5 |
| 30 | JAMES | 950.0 | 6 |
+------------+-------+------------+------------+
6.7.4.16. CLUSTER_SAMPLE
Command syntax:
Description:
x: bigint type. x>=1. If y is specified, x indicates that a window is divided into x parts. Otherwise, x indicates that x rows are sampled from a window, that is, the returned value is true for x rows. If x is NULL, NULL is returned.
y: a constant of the bigint type. y>=1 and y<=x. This parameter indicates that y of the x parts into which a window is divided are sampled, that is, the returned value is true for the rows in those y parts. If y is NULL, NULL is returned.
partition by col1[, col2]: specifies the partitions used in the computation.
Example:
The test_tbl table has two columns: key and value. The key column stores the group name of each value, and the group names are groupa and groupb. The value column stores the values. The table content is as follows:
+------------+--------------------+
| key | value |
+------------+--------------------+
| groupa | -1.34764165478145 |
| groupa | 0.740212609046718 |
| groupa | 0.167537127858695 |
| groupa | 0.630314566185241 |
| groupa | 0.0112401388646925 |
| groupa | 0.199165745875297 |
| groupa | -0.320543343353587 |
| groupa | -0.273930924365012 |
| groupa | 0.386177958942063 |
| groupa | -1.09209976687047 |
| groupb | -1.10847690938643 |
| groupb | -0.725703978381499 |
| groupb | 1.05064697475759 |
| groupb | 0.135751224393789 |
| groupb | 2.13313102040396 |
| groupb | -1.11828960785008 |
| groupb | -0.849235511508911 |
| groupb | 1.27913806620453 |
| groupb | -0.330817716670401 |
| groupb | -0.300156896191195 |
| groupb | 2.4704244205196 |
| groupb | -1.28051882084434 |
+------------+--------------------+
Run the following SQL statement to sample 10% of the values in each group:
select key, value from (select key, value, cluster_sample(10, 1) over(partition by key) as flag from test_tbl) sub where flag = true;
-- Returned result:
+--------+--------------------+
| key | value |
+--------+--------------------+
| groupa | -0.273930924365012 |
| groupb | -1.11828960785008 |
+--------+--------------------+
6.7.4.17. NTILE
Function declaration:
Purpose: Splits the sorted rows of each group into n slices and returns the number of the slice to which the current row belongs. If the rows cannot be divided evenly, the extra rows are assigned to the first slices.
Description:
n: a value of the BIGINT type.
Returned value: a value of the BIGINT type.
Example:
Group all employees by department, sort each group in descending order by salary, and then obtain the slice number of each employee within the group.
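The following plain-Java sketch (not MaxCompute code) shows one way to assign slice numbers to a group whose rows are already sorted. The ntile helper is hypothetical, and it assumes the common convention that the leftover rows go one each to the lowest-numbered slices.

```java
final class NtileSketch {
    /**
     * Returns the NTILE(n) slice number (starting from 1) of each of rowCount
     * rows that are already sorted. When rowCount is not a multiple of n, the
     * first (rowCount % n) slices each receive one extra row.
     */
    static long[] ntile(int rowCount, long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        long[] slices = new long[rowCount];
        long base = rowCount / n;  // minimum number of rows per slice
        long extra = rowCount % n; // number of slices that get one extra row
        int row = 0;
        for (long slice = 1; slice <= n && row < rowCount; slice++) {
            long size = base + (slice <= extra ? 1 : 0);
            for (long k = 0; k < size; k++) {
                slices[row++] = slice;
            }
        }
        return slices;
    }
}
```

For example, ntile(6, 3) yields 1, 1, 2, 2, 3, 3 for a six-row group, and ntile(5, 3) yields 1, 1, 2, 2, 3.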
6.7.4.18. NTH_VALUE
Function declaration:
Purpose: Returns the nth value in the partition used in the computation.
Description:
Note: If skipNulls is set to true, the nth non-NULL value is returned. If the nth non-NULL value does not exist, NULL is returned.
Example:
select a, nth_value(a + 1, 1) over (partition by a order by a) from values (3), (1), (2) as t(a);
-- If n is 1, NTH_VALUE is equivalent to FIRST_VALUE.
-- Returned results:
-- 1 2
-- 2 3
-- 3 4
6.7.4.19. CUME_DIST
Function declaration:
Purpose: Returns the cumulative distribution, which is the ratio of the number of rows in the group that precede or tie with the current row in the sort order to the total number of rows in the group.
Returned value: the ratio of the number of rows whose values come before or are equal to the current value in the sort order of the group to the total number of rows in the group.
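Because the ORDER BY clause decides which rows count as "less than or equal to" the current row, a descending sort reverses the usual numeric sense. The following plain-Java sketch (not MaxCompute code; the cumeDist helper is hypothetical) computes the ratio for a group whose values are already in sort order, treating rows with equal values as peers that share one result.

```java
final class CumeDistSketch {
    /**
     * Computes CUME_DIST for values that are already sorted in the window's
     * ORDER BY order: for each row, the number of rows that precede it or tie
     * with it, divided by the total number of rows.
     */
    static double[] cumeDist(double[] sortedValues) {
        int n = sortedValues.length;
        double[] result = new double[n];
        int i = 0;
        while (i < n) {
            // Find the last peer (equal value) of the row at position i.
            int j = i;
            while (j + 1 < n && sortedValues[j + 1] == sortedValues[i]) {
                j++;
            }
            double dist = (double) (j + 1) / n;
            for (int k = i; k <= j; k++) {
                result[k] = dist; // peers share the same cumulative distribution
            }
            i = j + 1;
        }
        return result;
    }
}
```

For the department 10 salaries sorted in descending order (5000, 5000, 2450, 2450, 1300, 1300), the sketch returns 2/6, 2/6, 4/6, 4/6, 1, and 1.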
Example:
Group all employees by department, and then obtain the cumulative distribution of salaries in each group.
SELECT deptno
     , ename
     , sal
     , concat(round(cume_dist() OVER(PARTITION BY deptno ORDER BY sal desc)*100,2),'%') as cume_dist
FROM emp;
Returned result
6.7.4.20. FIRST_VALUE
Function declaration:
Purpose: Sorts the rows of each partition and returns the first value in the range from the beginning of the partition to the current row.
Description:
Returned value: the first value of expr in the partition used in the computation.
Example:
Group all employees by department, sort each group in descending order by salary, and then obtain the name of the first employee in each group.
SELECT deptno
     , ename
     , sal
     , FIRST_VALUE(ename) OVER(PARTITION BY deptno ORDER BY sal desc) AS first1 -- Obtain the name of the first employee in each group after sorting by salary in descending order.
FROM emp;
Returned result
6.7.4.21. LAST_VALUE
Function declaration:
Purpose: Sorts the rows of each partition and returns the last value in the range from the beginning of the partition to the current row.
Description:
Returned value: the last value of expr in the partition used in the computation.
Example:
Group all employees by department, and then obtain the name of the last employee in each group.
SELECT deptno
, ename
, sal
, LAST_VALUE(ename) OVER(PARTITION BY deptno ) AS last1
FROM emp;
Returned result
6.7.5.2. COUNT
Command syntax:
count([distinct|all] value)
Description:
distinct|all: specifies whether duplicate values are removed before counting. The default value is all, which counts all values. If distinct is specified, only distinct values are counted.
value: a value of any type. Rows in which value is NULL are not counted. value can be *: count(*) returns the number of all rows.
Example:
Table tbla has one column, col1, which contains the following data:
+------+
| col1 |
+------+
| 1    |
| 2    |
| NULL |
+------+
select count(*) from tbla;
-- 3 is returned.
select count(col1) from tbla;
-- 2 is returned.
Aggregate functions can be used with the GROUP BY clause. For example, table test_src contains two columns: key (STRING) and value (DOUBLE).
+-----+-------+
| key | value |
+-----+-------+
| a   | 2.0   |
| a   | 4.0   |
| b   | 1.0   |
| b   | 3.0   |
+-----+-------+
select key, count(value) as count from test_src group by key;
-- Run the preceding SQL statement. The output is:
+-----+-------+
| key | count |
+-----+-------+
| a   | 2     |
| b   | 2     |
+-----+-------+
Aggregate functions aggregate the values that share the same key. The following aggregate functions are used in the same way as COUNT and are not described in detail in this document.
6.7.5.3. AVG
Function declaration:
avg(value)
Description:
value: a value of the DOUBLE or DECIMAL type. If the input is of the STRING or BIGINT type, it is implicitly converted to DOUBLE before the computation. For all other input types, including BOOLEAN, an error is returned. Rows in which value is NULL are not used in the calculation.
Returned value: If the input is of the DECIMAL type, a value of the DECIMAL type is returned. For all other valid input types, a value of the DOUBLE type is returned.
Example:
+-------+
| value |
+-------+
| 1 |
| 2 |
| NULL |
+-------+
select avg(value) as avg from tbla;
+------+
| avg |
+------+
| 1.5 |
+------+
-- The NULL row is ignored, so the result is (1 + 2) / 2 = 1.5.
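A minimal plain-Java sketch (not MaxCompute code) of the NULL handling described above: NULL rows are dropped from both the sum and the count. The avg helper is hypothetical, and returning null when every row is NULL is an assumption rather than documented behavior.

```java
import java.util.List;

final class AvgSketch {
    /**
     * Mimics AVG over a nullable BIGINT column: NULL rows are skipped and the
     * values are averaged as DOUBLE. Returning null when every row is NULL is
     * an assumption of this sketch.
     */
    static Double avg(List<Long> values) {
        double sum = 0;
        long count = 0;
        for (Long v : values) {
            if (v == null) {
                continue; // NULL rows are not used in the calculation
            }
            sum += v; // BIGINT input is treated as DOUBLE
            count++;
        }
        return count == 0 ? null : sum / count;
    }
}
```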
6.7.5.4. MAX
Function declaration:
max(value)
Description:
value: a value of any data type except BOOLEAN. Rows in which value is NULL are not involved in the operation.
Example:
+-------+
| value |
+-------+
| 1     |
| 2     |
| NULL  |
+-------+
select max(value) from tbla;
-- 2 is returned.
6.7.5.5. MIN
Function declaration:
MIN(value)
Description:
value: a column of any data type except BOOLEAN. Rows in which value is NULL are not involved in the operation.
Example:
In the tbla table, the value column is of the BIGINT type.
+-------+
| value |
+-------+
| 1     |
| 2     |
| NULL  |
+-------+
select min(value) from tbla;
-- 1 is returned.
6.7.5.6. MEDIAN
Function declaration:
median(number)
Description:
number: a value of the DOUBLE or DECIMAL type. If the input is of the STRING or BIGINT type, it is implicitly converted to DOUBLE before the computation. For all other input types, an error is returned. If the input is NULL, a failure is returned.
6.7.5.7. STDDEV
Function declaration:
stddev(number)
Description:
number: a value of the DOUBLE or DECIMAL type. If the input is of the STRING or BIGINT type, it is implicitly converted to DOUBLE before the computation. For all other input types, an error is returned. If the input is NULL, a failure is returned.
6.7.5.8. STDDEV_SAMP
Function declaration:
stddev_samp(number)
Description:
number: a value of the DOUBLE or DECIMAL type. If the input is of the STRING or BIGINT type, it is implicitly converted to DOUBLE before the computation. For all other input types, an error is returned. If the input is NULL, a failure is returned.
6.7.5.9. SUM
Function declaration:
sum(value)
Description:
value: a value of the DOUBLE, DECIMAL, or BIGINT type. If the input is of the STRING type, it is implicitly converted to DOUBLE before the computation. Rows in which value is NULL are not used in the calculation. Values of the BOOLEAN type cannot be used in the calculation.
Returned value: If the input is of the BIGINT type, a value of the BIGINT type is returned. If the input is of the DOUBLE or STRING type, a value of the DOUBLE type is returned.
Example:
+-------+
| value |
+-------+
| 1     |
| 2     |
| NULL  |
+-------+
select sum(value) from tbla;
-- 3 is returned.
6.7.5.10. WM_CONCAT
Function declaration:
wm_concat(separator, str)
Description:
separator: the delimiter, which must be a constant of the STRING type. If it is of another type or is not a constant, an error is returned.
str: a value of the STRING type. If the input is of the BIGINT, DOUBLE, or DATETIME type, it is implicitly converted to STRING before the computation. For all other input types, an error is returned.
Note: If test_src in the statement select wm_concat(',', name) from test_src; is an empty set, NULL is returned.
6.7.5.11. PERCENTILE
Function declaration:
Purpose: Returns the pth percentile of the specified column. p must be between 0 and 1.
Notice: You can calculate true percentiles only for integer values.
Description:
+------------+
| c1 |
+------------+
| 8 |
| 9 |
| 10 |
| 11 |
+------------+
set odps.sql.type.system.odps2=true;
Note: You must submit and execute the SET statement together with the SQL statements that use the new functions.
The aggregate functions described in the subsequent topics are new in MaxCompute 2.0.
6.7.5.13. COLLECT_LIST
Command syntax:
ARRAY collect_list(col)
Description:
6.7.5.14. COLLECT_SET
Command syntax:
ARRAY collect_set(col)
Purpose: Converts the values of the col column into an array with duplicates removed.
Description:
col: a table column of any data type.
6.7.5.15. VARIANCE/VAR_POP
Function declaration:
DOUBLE variance(col)
DOUBLE var_pop(col)
col: a column of a numeric type. NULL is returned for other types.
Example:
+------------+
| c1 |
+------------+
| 8 |
| 9 |
| 10 |
| 11 |
+------------+
6.7.5.16. VAR_SAMP
Function declaration:
DOUBLE var_samp(col)
Description:
col: a column of a numeric type. NULL is returned for other types.
+------------+
| c1 |
+------------+
| 8 |
| 9 |
| 10 |
| 11 |
+------------+
6.7.5.17. COVAR_POP
Function declaration:
covar_pop(col1, col2)
Description:
col1 and col2: columns of numeric types. NULL is returned for other types.
Example:
+------------+------------+
| c1 | c2 |
+------------+------------+
| 3 | 2 |
| 14 | 5 |
| 50 | 14 |
| 26 | 75 |
+------------+------------+
6.7.5.18. COVAR_SAMP
Function declaration:
covar_samp(col1, col2)
Description:
col1 and col2: columns of numeric types. NULL is returned for other types.
Example:
+------------+------------+
| c1 | c2 |
+------------+------------+
| 3 | 2 |
| 14 | 5 |
| 50 | 14 |
| 26 | 75 |
+------------+------------+
array(value1,value2, ...)
Description:
Example:
6.7.6.2. ARRAY_CONTAINS
Function declaration:
array_contains(ARRAY<T> a, value v)
Description:
a: a value of the ARRAY type.
v: the value to search for, which must be of the same type as the elements of the array.
Example:
6.7.6.3. CAST
Command syntax:
cast(expr as <type>)
Purpose: Converts an expression from one data type to another. For example, cast('1' as bigint) converts '1' of the STRING type to 1 of the BIGINT type. If the conversion fails, an error is returned.
Note:
cast(double as bigint) converts a value of the DOUBLE type to a value of the BIGINT type.
cast(string as bigint) converts a value of the STRING type to a value of the BIGINT type. If the string consists of digits in integer form, it is converted directly to BIGINT. If the string consists of digits in floating-point or exponential form, it is first converted to DOUBLE and then to BIGINT.
For cast(string as datetime) and cast(datetime as string), the default datetime format is yyyy-mm-dd hh:mi:ss.
6.7.6.4. COALESCE
Command syntax:
coalesce(expr1, expr2, ...)
Purpose: Returns the first non-NULL value in the list. If all values in the list are NULL, NULL is returned.
Description:
expr: a value to be tested. All values must be of the same type or be NULL. Otherwise, an error is returned.
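The COALESCE rule can be sketched in plain Java (not MaxCompute code). The coalesce helper is hypothetical; Java generics stand in for the requirement that all arguments share one type.

```java
final class CoalesceSketch {
    /** Returns the first non-null argument, or null if every argument is null. */
    @SafeVarargs
    static <T> T coalesce(T... exprs) {
        for (T e : exprs) {
            if (e != null) {
                return e;
            }
        }
        return null;
    }
}
```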
6.7.6.5. DECODE
Function declaration:
decode(expression, search, result[, search, result]...[, default])
Purpose: Implements if-then-else conditional branching.
Description:
Returned value: the result that corresponds to the matched search. If no search matches, default is returned. If default is not specified, NULL is returned.
Note:
At least three parameters must be specified.
All results must be of the same type or be NULL. Inconsistent data types cause an error.
All values of search and expression must be of the same type. Otherwise, an error is returned.
If the same search value appears more than once and matches expression, the result of the first matching search is returned.
Example:
select decode(customer_id,
1, 'Taobao',
2, 'Alipay',
3, 'Aliyun',
NULL, 'N/A',
'Others') as result from sale_detail;
The preceding DECODE function implements the logic of the following if-then-else statement:
Notice: In MaxCompute SQL, NULL = NULL evaluates to NULL. However, the DECODE function treats two NULL values as equal. In the preceding example, if customer_id is NULL, the DECODE function returns N/A.
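The following plain-Java sketch (not MaxCompute code) models the matching rules of DECODE, including the NULL-equals-NULL behavior that differs from the = operator. The decode helper and its flat argument list are hypothetical; Objects.equals is used because it treats two null references as equal.

```java
import java.util.Objects;

final class DecodeSketch {
    /**
     * Mimics DECODE(expression, search1, result1, search2, result2, ..., [default]).
     * The first matching search wins. If nothing matches, the trailing default
     * (present when the argument count after expression is odd) or null is returned.
     */
    static Object decode(Object expression, Object... searchResultPairsAndDefault) {
        int n = searchResultPairsAndDefault.length;
        int pairs = n / 2;
        for (int i = 0; i < pairs; i++) {
            // Unlike SQL "=", Objects.equals(null, null) is true, so NULL matches NULL.
            if (Objects.equals(expression, searchResultPairsAndDefault[2 * i])) {
                return searchResultPairsAndDefault[2 * i + 1];
            }
        }
        return (n % 2 == 1) ? searchResultPairsAndDefault[n - 1] : null;
    }
}
```

For example, decode(null, 1L, "Taobao", 2L, "Alipay", 3L, "Aliyun", null, "N/A", "Others") returns "N/A", mirroring the customer_id example.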
6.7.6.6. EXPLODE
Function declaration:
explode (var)
Purpose: A UDTF that converts one row of data into multiple rows. If var is of the ARRAY type, the array stored in the column is converted into multiple rows. If var is of the MAP type, each key-value pair of the map stored in the column is converted into a row with two columns: one for the key and one for the value.
Description:
var: a value of the array<T> or map<K,V> type.
Note: Only one UDTF is allowed in a SELECT statement, and no other columns can appear in that statement.
Example:
6.7.6.7. GET_IDCARD_AGE
Function declaration:
get_idcard_age(idcardno)
Purpose: Returns the current age based on an ID card number. The current age is the current year minus the birth year on the ID card.
Description:
idcardno: a value of the STRING type, which is a 15-digit or 18-digit ID card number. During the calculation, the validity of the ID card number is verified based on the province code and the last check digit. If the verification fails, NULL is returned.
Returned value: a value of the BIGINT type. If the input is NULL, NULL is returned. If the current year minus the birth year is greater than 100, NULL is returned.
6.7.6.8. GET_IDCARD_BIRTHDAY
Function declaration:
get_idcard_birthday(idcardno)
Description:
idcardno: a value of the STRING type, which is a 15-digit or 18-digit ID card number. During the computation, the validity of the ID card number is verified based on the province code and the last check digit. If the verification fails, NULL is returned.
Returned value: a value of the DATETIME type. If the input is NULL, NULL is returned.
6.7.6.9. GET_IDCARD_SEX
Function declaration:
get_idcard_sex(idcardno)
Purpose: Returns the gender based on an ID card number. The returned value is M (male) or F (female).
Description:
idcardno: a value of the STRING type, which is a 15-digit or 18-digit ID card number. During the computation, the validity of the ID card number is verified based on the province code and the last check digit. If the verification fails, NULL is returned.
Returned value: a value of the STRING type. If the input is NULL, NULL is returned.
6.7.6.10. GREATEST
Function declaration:
greatest(var1, var2, ...)
Description:
var: a value of the BIGINT, DOUBLE, DATETIME, or STRING type. If all values are NULL, NULL is returned.
Returned value: the greatest of the input values. If no implicit conversion is needed, the return type is the same as the input type.
NULL is treated as the minimum value.
If the input parameters are of different types, values of the DOUBLE, BIGINT, and STRING types are converted to DOUBLE for comparison, and values of the STRING and DATETIME types are converted to DATETIME for comparison. Implicit conversion of other types is not allowed.
6.7.6.11. INDEX
Function declaration:
index(var1[var2])
Purpose: Returns the specified element of an array, or the value of the specified key in a map.
Description:
var1: a value of the array<T> or map<K,V> type.
var2: If var1 is of the array<T> type, var2 must be of the BIGINT type and greater than or equal to 0. If var1 is of the map<K,V> type, var2 must be of the K type.
Returned value:
If var1 is of the array<T> type, a value of the T type is returned. If var2 is out of the range of the array elements, NULL is returned.
If var1 is of the map<K,V> type, a value of the V type is returned. If the map does not contain the key var2, NULL is returned.
Example:
Notice:
To use this function in an SQL statement, omit index and write var1[var2] directly. Otherwise, a syntax error is returned.
If var1 is NULL, NULL is returned.
6.7.6.12. MAX_PT
Function declaration:
max_pt(table_full_name)
Purpose: For a partitioned table, returns the maximum value, in alphabetical order, of the first-level partitions that have data files.
Description:
table_full_name: a value of the STRING type that specifies the table name, including the project name, for example, prj.src. You must have the read permission on the table.
Returned value: the maximum value of the first-level partitions.
Example:
Partitioned table tbl has the following partitions with data files: pt='20170901' and pt='20170902'. In the following statement, max_pt returns '20170902', and the MaxCompute SQL statement reads data from the '20170902' partition.
Note: If a partition is added by using ALTER TABLE but contains no data files, this partition is not returned.
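The selection rule can be sketched in plain Java (not MaxCompute code). The maxPt helper is hypothetical and models each first-level partition as a map entry from its value to its number of data files; returning null when no partition has data files is an assumption of the sketch.

```java
import java.util.Map;

final class MaxPtSketch {
    /**
     * Returns the largest first-level partition value, compared as strings,
     * among partitions that contain at least one data file; null if none do.
     * partitionFileCounts maps a partition value (for example "20170902")
     * to its number of data files.
     */
    static String maxPt(Map<String, Integer> partitionFileCounts) {
        String max = null;
        for (Map.Entry<String, Integer> e : partitionFileCounts.entrySet()) {
            if (e.getValue() == null || e.getValue() == 0) {
                continue; // partitions without data files are ignored
            }
            if (max == null || e.getKey().compareTo(max) > 0) {
                max = e.getKey(); // alphabetical (lexicographic) comparison
            }
        }
        return max;
    }
}
```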
6.7.6.13. ORDINAL
Function declaration:
ordinal(bigint nth, var1, var2, ...)
Purpose: Sorts the input variables in ascending order and returns the value at position nth.
Description:
nth: a value of the BIGINT type that specifies the position of the value to return. If it is NULL, NULL is returned.
var: a value of the BIGINT, DOUBLE, DATETIME, or STRING type.
Returned value: the value at position nth. If no implicit conversion is needed, the return type is the same as the input type.
If type conversion is performed, values of the DOUBLE, BIGINT, and STRING types are converted to DOUBLE, and values of the STRING and DATETIME types are converted to DATETIME. Implicit conversion of other types is not allowed.
NULL is treated as the minimum value.
Example:
ordinal(3, 1, 3, 2, 5, 2, 4, 6) = 2
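The sort-then-pick behavior can be sketched in plain Java (not MaxCompute code) for BIGINT arguments. The ordinal helper is hypothetical; Comparator.nullsFirst reproduces the rule that NULL is the least value. In the example above, the sorted values are 1, 2, 2, 3, 4, 5, 6, so the third value is 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrdinalSketch {
    /**
     * Mimics ORDINAL(nth, var1, var2, ...) for BIGINT arguments: the values are
     * sorted in ascending order with NULL treated as the smallest value, and the
     * nth value (1-based) is returned. A NULL nth yields NULL. An out-of-range
     * nth is not modeled; this sketch throws IndexOutOfBoundsException.
     */
    static Long ordinal(Long nth, Long... vars) {
        if (nth == null) {
            return null;
        }
        List<Long> sorted = new ArrayList<>(Arrays.asList(vars));
        sorted.sort(Comparator.nullsFirst(Comparator.<Long>naturalOrder()));
        return sorted.get((int) (nth - 1)); // positions are 1-based
    }
}
```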
6.7.6.14. LEAST
Function declaration:
least(var1, var2, ...)
Description:
var: a value of the BIGINT, DOUBLE, DATETIME, or STRING type. If all values are NULL, NULL is returned.
Returned value: the least of the input values. If no implicit conversion is needed, the return type is the same as the input type.
If type conversion is performed, values of the DOUBLE, BIGINT, and STRING types are converted to DOUBLE, and values of the STRING and DATETIME types are converted to DATETIME. Implicit conversion of other types is not allowed.
NULL is treated as the minimum value.
6.7.6.15. SIZE
Function declaration:
size(map<K, V>)
size(array<T>)
Purpose: size(map) returns the number of key-value pairs in the given map, and size(array) returns the number of elements in the given array.
Description:
Example:
6.7.6.16. SPLIT
Function declaration:
split(str, pat)
Description:
Example:
6.7.6.17. STR_TO_MAP
Function declaration:
str_to_map(text [, delimiter1 [, delimiter2]])
Purpose: Splits text into key-value pairs by using delimiter1, and then splits each key-value pair into a key and a value by using delimiter2.
Description:
delimiter1: a value of the STRING type that separates key-value pairs. If it is not specified, the default value ',' is used.
delimiter2: a value of the STRING type that separates the key from the value in each pair. If it is not specified, the default value '=' is used.
Returned value: a value of the map<string, string> type, whose elements are the key-value pairs obtained by splitting text with delimiter1 and delimiter2.
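The following plain-Java sketch (not MaxCompute code) illustrates how the two delimiters interact. The strToMap helper is hypothetical; it matches delimiters literally and splits each pair on the first occurrence of delimiter2, which are assumptions rather than documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class StrToMapSketch {
    /**
     * Splits text into pairs on delimiter1, then splits each pair on the first
     * occurrence of delimiter2 into a key and a value. NULL text yields null,
     * and a pair without delimiter2 gets a null value; both are assumptions.
     */
    static Map<String, String> strToMap(String text, String delimiter1, String delimiter2) {
        if (text == null) {
            return null;
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : text.split(Pattern.quote(delimiter1))) { // literal match
            int pos = pair.indexOf(delimiter2);
            if (pos < 0) {
                result.put(pair, null);
            } else {
                result.put(pair.substring(0, pos), pair.substring(pos + delimiter2.length()));
            }
        }
        return result;
    }

    /** Uses the documented defaults: ',' between pairs and '=' between key and value. */
    static Map<String, String> strToMap(String text) {
        return strToMap(text, ",", "=");
    }
}
```

With the defaults, strToMap("test1=1,test2=2") yields {test1=1, test2=2}.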
Example:
6.7.6.18. UNIQUE_ID
Function declaration:
STRING UNIQUE_ID()
Purpose: Returns a random, unique ID, for example, 29347a88-1e57-41ae-bb68-a9edbdd94212_1. This function is more efficient than UUID.
6.7.6.19. UUID
Function declaration:
string uuid()
6.7.6.20. SAMPLE
Function declaration:
Purpose: Samples all values read from the specified column based on the given settings and filters out the rows that do not meet the sampling condition.
Description:
x, y: values of the BIGINT type. The data is hashed into x portions, and the yth portion is taken. y can be omitted. If y is omitted, the first portion is taken and column_name must also be omitted. x and y must be integer constants greater than 0. If they are of another type or are less than or equal to 0, an error is returned. If y is greater than x, an error is returned. If either x or y is NULL,
Note: To avoid data skew caused by NULL values, NULL values in column_name are hashed uniformly into the x portions. If column_name is not specified and the data volume is small, the output may not be uniform. Therefore, we recommend that you specify column_name to obtain a better result.
Example:
case value
when (_condition1) then result1
when (_condition2) then result2
...
else resultn
end
case
when (_condition1) then result1
when (_condition2) then result2
when (_condition3) then result3
...
else resultn
end
CASE WHEN returns different values based on the evaluation result of the expression. MaxCompute supports the two forms of CASE WHEN expressions shown above.
Example:
select
case
when shop_name is null then 'default_region'
when shop_name like 'hang%' then 'zj_region'
end as region
From sale_detail;
Note:
If the results contain values of only the BIGINT and DOUBLE types, all results are converted to DOUBLE.
If the results contain a value of the STRING type, all results are converted to STRING. If a result cannot be converted (for example, a value of the BOOLEAN type), an error is returned.
Conversion between other types is not allowed.
6.7.6.22. IF
Function declaration:
if(testCondition, valueTrue, valueFalseOrNull)
Purpose: Determines whether testCondition is true. If it is true, valueTrue is returned. Otherwise, valueFalseOrNull is returned.
Description:
testCondition: a value of the BOOLEAN type. The expression to evaluate.
valueTrue: the value returned when testCondition is true.
valueFalseOrNull: the value returned when testCondition is not true. It can be set to NULL.
Example:
6.7.6.24. MAP
Function declaration:
map(K key1, V value1, K key2, V value2, ...)
Description:
key/value: All keys must be of the same type, which must be a basic type. All values must be of the same type, which can be any type.
Example:
6.7.6.25. MAP_KEYS
Function declaration:
map_keys(map<K, V>)
Description:
Returned value: a value of the ARRAY type. If the input is NULL, NULL is returned.
Example:
6.7.6.26. MAP_VALUES
Function declaration:
map_values(map<K, V>)
Description:
Returned value: a value of the ARRAY type. If the input is NULL, NULL is returned.
Example:
6.7.6.27. SORT_ARRAY
Function declaration:
sort_array(ARRAY<T>)
Description:
Example:
select sort_array(array('a','c','f','b')), sort_array(array(4,5,7,2,5,8)), sort_array(array('You','Me','He')) from dual;
-- Returned result:
[a, b, c, f] [2, 4, 5, 5, 7, 8] [He, Me, You]
6.7.6.28. POSEXPLODE
Command syntax:
posexplode(ARRAY<T>)
Purpose: Expands the given array into rows. Each element becomes a row with two columns: the subscript of the element, starting from 0, and the element itself.
Description:
Example:
6.7.6.29. STRUCT
Function declaration:
struct(value1,value2, ...)
Description:
Returned value: a value of the STRUCT type. The fields of the created struct are named col1, col2, and so on.
Example:
6.7.6.30. NAMED_STRUCT
Function declaration:
named_struct(string name1, T1 value1, string name2, T2 value2, ...)
Description:
Returned value: a value of the STRUCT type. The fields of the created struct are named name1, name2, and so on.
Example:
6.7.6.31. INLINE
Function declaration:
inline(array<struct<f1:T1, f2:T2, ...>>)
Purpose: Expands an array of structs. Each element of the array becomes a row, and each field of the struct becomes a column.
Description:
Example:
select inline(array(named_struct('user_id',10001,'user_name','bob','married','F','weight',63.50))) from dual;
-- Returned result:
+------------+-----------+---------+------------+
| user_id | user_name | married | weight |
+------------+-----------+---------+------------+
| 10001 | bob | F | 63.5 |
+------------+-----------+---------+------------+
If A, B, or C is NULL, NULL is returned. If A is greater than or equal to B and less than or equal to C, true is returned. Otherwise, false is returned.
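The three-valued result can be sketched in plain Java (not MaxCompute code) for BIGINT operands. The between helper is hypothetical and uses a null Boolean to stand for SQL NULL.

```java
final class BetweenSketch {
    /** Mimics "a BETWEEN b AND c": null if any operand is null, else b <= a <= c. */
    static Boolean between(Long a, Long b, Long c) {
        if (a == null || b == null || c == null) {
            return null; // any NULL operand makes the result NULL
        }
        return b <= a && a <= c; // both bounds are inclusive
    }
}
```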
Example:
Run the following command to query data in which sal is greater than or equal to 1,000 and less than or equal to 1,500:
6.7.6.33. NVL
Function declaration:
nvl(value, default_value)
Purpose: Returns default_value if value is NULL, and returns value otherwise.
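NVL can be sketched in plain Java (not MaxCompute code). The nvl helper is hypothetical; Java generics require value and default_value to share a type, which is an assumption of this sketch rather than a documented rule.

```java
final class NvlSketch {
    /** Mimics NVL(value, default_value): returns defaultValue only when value is null. */
    static <T> T nvl(T value, T defaultValue) {
        return value != null ? value : defaultValue;
    }
}
```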
Example:
Table t_data has three columns: c1 (STRING), c2 (BIGINT), and c3 (DATETIME). The table contains the following data:
+------+------+---------------------+
| c1   | c2   | c3                  |
+------+------+---------------------+
| NULL | 20   | 2017-11-13 05:00:00 |
| ddd  | 25   | NULL                |
| bbb  | NULL | 2017-11-12 08:00:00 |
| aaa  | 23   | 2017-11-11 00:00:00 |
+------+------+---------------------+
Use the NVL function to replace the NULL values in c1 with 00000, the NULL values in c2 with 0, and the NULL values in c3 with "-".
6.7.6.34. TABLE_EXISTS
Description: This function checks whether a specific table exists.
Parameters:
table_name: the table name, of the STRING type. The value can include the project name, such as my_proj.my_table. If no project name is specified, the current project is used.
Return value: a value of the BOOLEAN type. If the specified table exists, True is returned. Otherwise, False is returned.
Example:
6.7.6.35. PARTITION_EXISTS
Function declaration:
Description: This function checks whether a specific partition exists.
Parameters:
table_name: the table name, of the STRING type. The value can include the project name, such as my_proj.my_table. If no project name is specified, the current project is used.
partitions: the partition names, of the STRING type. Specify the partition values in the order of the partition key columns. The number of partition values must be the same as the number of partition key columns.
Return value: a value of the BOOLEAN type. If the specified partition exists, True is returned. Otherwise, False is returned.
Example:
6.8. UDFs
Document Version: 20220928
MaxCompute User Guide · MaxCompute SQL
6.8.1. Overview
UDF is short for user-defined function. MaxCompute provides a variety of built-in functions. You can also create UDFs to meet specific computing requirements and use them in the same way as built-in functions. This topic briefly describes how to use SQL UDFs. For more information about SQL UDFs, see the official documentation on UDFs.
UDF category
Note: In a broad sense, UDFs refer to all user-defined functions: UDFs, UDAFs, and UDTFs. In a narrow sense, UDFs refer only to user-defined scalar functions. This document uses the term in both senses; determine the exact meaning from the context.
Basic data types: BIGINT, DOUBLE, BOOLEAN, DATETIME, DECIMAL, STRING, TINYINT, SMALLINT, INT, FLOAT, VARCHAR, BINARY, and TIMESTAMP.
Complex data types: ARRAY, MAP, and STRUCT.
Note: In UDFs, you can define the Writable attribute of parameters.
Some basic data types (such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, BINARY, and TIMESTAMP) are used in Java UDFs as follows:
UDAFs and UDTFs use the @Resolve annotation to obtain signatures. Example: @Resolve("smallint->varchar(10)").
UDFs obtain signatures by reflecting on the evaluate() method. In this case, MaxCompute built-in types map one-to-one to Java types.
To use complex data types (ARRAY, MAP, and STRUCT) in Java UDFs, follow these rules:
UDTFs use the @Resolve annotation to specify signatures. Example: @Resolve("array<string>,struct<a1:bigint,b1:string>,string->map<string,bigint>,struct<b1:bigint>").
UDFs use the signature of the evaluate() method to map input and output types. For more information, see the mappings between MaxCompute types and Java types. In the preceding example, ARRAY corresponds to java.util.List, MAP corresponds to java.util.Map, and STRUCT corresponds to com.aliyun.odps.data.Struct.
Notice:
You can use type,* to accept any number of parameters. Example: @Resolve("string,*->array<string>"). Note that you must specify a subtype after array.
The field names and field types of com.aliyun.odps.data.Struct cannot be obtained through reflection. Therefore, if you want to use STRUCT in a UDF, you must add the @Resolve annotation to the UDF class. This annotation affects only the overloads whose parameters or return values contain com.aliyun.odps.data.Struct.
A class supports only one @Resolve annotation. Therefore, a UDF can have only one overload whose parameters or return values contain STRUCT.
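The following table lists the mappings between MaxCompute and Java data types.
+-----------------+------------------------------+
| MaxCompute type | Java type                    |
+-----------------+------------------------------+
| SMALLINT        | java.lang.Short              |
| INT             | java.lang.Integer            |
| BIGINT          | java.lang.Long               |
| FLOAT           | java.lang.Float              |
| DOUBLE          | java.lang.Double             |
| DECIMAL         | java.math.BigDecimal         |
| BOOLEAN         | java.lang.Boolean            |
| STRING          | java.lang.String             |
| VARCHAR         | com.aliyun.odps.data.Varchar |
| BINARY          | com.aliyun.odps.data.Binary  |
| ARRAY           | java.util.List               |
| MAP             | java.util.Map                |
| STRUCT          | com.aliyun.odps.data.Struct  |
+-----------------+------------------------------+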
Note:
Java data types used for parameters and return values must be object types, whose names start with an uppercase letter.
A NULL value in SQL is represented by a null reference in Java. Java primitive types are not allowed because they cannot represent SQL NULL values.
The ARRAY type in MaxCompute corresponds to a List, not an array, in Java.
| Supported language | UDF | UDAF | UDTF | Read resource file | Read resource table | DATETIME type |
6.8.3. UDFs
A UDF must inherit the com.aliyun.odps.udf.UDF class and implement the evaluate method. The evaluate method must be a non-static public method. The parameter types and return type of the evaluate method are used as the UDF signature in SQL. Therefore, you can implement multiple evaluate methods in one UDF. When the UDF is called, the framework selects the evaluate method that matches the types of the arguments.
Example:
package org.alidata.odps.udf.examples;
import com.aliyun.odps.udf.UDF;
public final class Lower extends UDF {
    public String evaluate(String s) {
        // SQL NULL arrives as a null reference; return NULL unchanged
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }
}
Note You can implement void setup(ExecutionContext ctx) and void close() to add UDF initialization and termination code, respectively.
UDFs are used in the same way as built-in functions in MaxCompute SQL. For more information, see Built-in functions.
6.8.4. UDAFs
To implement a Java UDAF, you must extend the com.aliyun.odps.udf.Aggregator class and implement its APIs, such as newBuffer, iterate, merge, and terminate.
The most important APIs are iterate, merge, and terminate, because the main logic of a UDAF relies on their implementation. In addition, you must implement a custom Writable buffer. As an example, the following figure briefly illustrates the implementation logic and computational flow of avg (average value) implemented as a MaxCompute UDAF.
In the preceding figure, the input data is split into slices of a certain size (for a description of slicing, see MapReduce). Each slice is sized so that a worker can process it in an appropriate period of time. You need to configure the slice size manually.
The UDAF calculation process is divided into two phases:
Phase 1: Each worker counts the number of data rows and computes the sum of the data in its slice. The count and sum can be regarded as an intermediate result.
Phase 2: A worker merges the intermediate results that phase 1 produced for all slices. In the final output, r.sum / r.count is the average of all input data.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import com.aliyun.odps.io.DoubleWritable;
import com.aliyun.odps.io.Writable;
import com.aliyun.odps.udf.Aggregator;
import com.aliyun.odps.udf.UDFException;
import com.aliyun.odps.udf.annotation.Resolve;
@Resolve({"double->double"})
public class AggrAvg extends Aggregator {
    // Intermediate result of one slice: running sum and row count
    private static class AvgBuffer implements Writable {
        private double sum = 0;
        private long count = 0;
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeDouble(sum);
            out.writeLong(count);
        }
        @Override
        public void readFields(DataInput in) throws IOException {
            sum = in.readDouble();
            count = in.readLong();
        }
    }
    private DoubleWritable ret = new DoubleWritable();
    @Override
    public Writable newBuffer() {
        return new AvgBuffer();
    }
    // Phase 1: accumulate one input row into the buffer; NULL rows are skipped
    @Override
    public void iterate(Writable buffer, Writable[] args) throws UDFException {
        DoubleWritable arg = (DoubleWritable) args[0];
        AvgBuffer buf = (AvgBuffer) buffer;
        if (arg != null) {
            buf.count += 1;
            buf.sum += arg.get();
        }
    }
    // Final step: compute the average from the fully merged buffer
    @Override
    public Writable terminate(Writable buffer) throws UDFException {
        AvgBuffer buf = (AvgBuffer) buffer;
        if (buf.count == 0) {
            ret.set(0);
        } else {
            ret.set(buf.sum / buf.count);
        }
        return ret;
    }
    // Phase 2: merge a partial (per-slice) result into the buffer
    @Override
    public void merge(Writable buffer, Writable partial) throws UDFException {
        AvgBuffer buf = (AvgBuffer) buffer;
        AvgBuffer p = (AvgBuffer) partial;
        buf.sum += p.sum;
        buf.count += p.count;
    }
}
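To make the two phases concrete outside MaxCompute, the following is a minimal plain-Python simulation of the same flow as AggrAvg. It assumes that each inner list stands for one slice handled by one worker and uses None for SQL NULL; the function names mirror the Aggregator methods but are local helpers, not the MaxCompute API.

```python
def new_buffer():
    # Intermediate result of one slice: [sum, count]
    return [0.0, 0]

def iterate(buf, value):
    # Phase 1: add one input row to the slice's buffer; NULL (None) rows are skipped
    if value is not None:
        buf[0] += value
        buf[1] += 1

def merge(buf, partial):
    # Phase 2: fold another slice's intermediate result into buf
    buf[0] += partial[0]
    buf[1] += partial[1]

def terminate(buf):
    # Same convention as AggrAvg: return 0 when there are no non-NULL rows
    return 0.0 if buf[1] == 0 else buf[0] / buf[1]

def distributed_avg(slices):
    # Phase 1: each slice (one worker) builds its own partial buffer
    partials = []
    for rows in slices:
        buf = new_buffer()
        for value in rows:
            iterate(buf, value)
        partials.append(buf)
    # Phase 2: merge all partial buffers, then compute the final result
    final = new_buffer()
    for partial in partials:
        merge(final, partial)
    return terminate(final)

print(distributed_avg([[1.0, 2.0], [3.0, None], [6.0]]))  # 3.0
```

Only the small [sum, count] buffers cross slice boundaries, which is why the intermediate result must be serializable (the Writable buffer in the Java version).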
Notice
The SQL syntax for calling a UDAF is the same as that for calling common built-in aggregate functions.
6.8.5. UDTFs
6.8.5.1. Overview
Java UDTFs must extend the com.aliyun.odps.udf.UDTF class. This class requires the implementation of four APIs. The following table lists the definitions of these APIs.
API definitions
API Description
UDTF example:
package org.alidata.odps.udtf.examples;
import com.aliyun.odps.udf.UDTF;
import com.aliyun.odps.udf.UDTFCollector;
import com.aliyun.odps.udf.annotation.Resolve;
import com.aliyun.odps.udf.UDFException;
// TODO define input and output types, e.g., "string,string->string,bigint".
@Resolve({"string,bigint->string,bigint"})
public class MyUDTF extends UDTF {
    @Override
    public void process(Object[] args) throws UDFException {
        String a = (String) args[0];
        Long b = (Long) args[1];
        // Emit one output row (token, b) for each whitespace-separated token of a
        for (String t : a.split("\\s+")) {
            forward(t, b);
        }
    }
}
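The preceding example shows how to create a UDTF in MaxCompute. If this UDTF is registered as user_udtf, you can call it as user_udtf(col0, col1) in a SELECT statement. For example, if the source table contains the data in the first of the following tables, the UDTF returns the rows in the second table: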
+------+------+
| col0 | col1 |
+------+------+
| A B | 1 |
| C D | 2 |
+------+------+
+----+----+
| c0 | c1 |
+----+----+
| A | 1 |
| B | 1 |
| C | 2 |
| D | 2 |
+----+----+
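The row expansion shown in the two tables can be reproduced with a small plain-Python sketch of MyUDTF.process. The generator my_udtf is a hypothetical stand-in for the UDTF, and each yield plays the role of one forward call.

```python
import re

def my_udtf(a, b):
    # Mirrors MyUDTF.process: split a on whitespace and forward (token, b)
    for token in re.split(r"\s+", a):
        yield (token, b)

source_rows = [("A B", 1), ("C D", 2)]  # (col0, col1) from the first table
result = [row for col0, col1 in source_rows for row in my_udtf(col0, col1)]
print(result)
```

Each input row can produce any number of output rows, which is the defining difference between a UDTF and a UDF.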
UDT F examples
You can use a UDTF to read MaxCompute resources. The following steps show how to read MaxCompute resources by using a UDTF:
1. Write the UDTF program. After compilation, export it as a JAR package (udtfexample1.jar).
package com.aliyun.odps.examples.udf;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Iterator;
import com.aliyun.odps.udf.ExecutionContext;
import com.aliyun.odps.udf.UDFException;
import com.aliyun.odps.udf.UDTF;
import com.aliyun.odps.udf.annotation.Resolve;
/**
 * project: example_project
 * table: wc_in2
 * partitions: p2=1,p1=2
 * columns: colc,colb
 */
@Resolve({ "string,string->string,bigint,string" })
public class UDTFResource extends UDTF {
    ExecutionContext ctx;
    long fileResourceLineCount;
    long tableResource1RecordCount;
    long tableResource2RecordCount;
    // Read the file resource and both table resources once, before any row is processed
    @Override
    public void setup(ExecutionContext ctx) throws UDFException {
        this.ctx = ctx;
        try {
            InputStream in = ctx.readResourceFileAsStream("file_resource.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line;
            fileResourceLineCount = 0;
            while ((line = br.readLine()) != null) {
                fileResourceLineCount++;
            }
            br.close();
            Iterator<Object[]> iterator = ctx.readResourceTable("table_resource1").iterator();
            tableResource1RecordCount = 0;
            while (iterator.hasNext()) {
                tableResource1RecordCount++;
                iterator.next();
            }
            iterator = ctx.readResourceTable("table_resource2").iterator();
            tableResource2RecordCount = 0;
            while (iterator.hasNext()) {
                tableResource2RecordCount++;
                iterator.next();
            }
        } catch (IOException e) {
            throw new UDFException(e);
        }
    }
    // Output the first argument, the length of the second argument, and the resource counts
    @Override
    public void process(Object[] args) throws UDFException {
        String a = (String) args[0];
        long b = args[1] == null ? 0 : ((String) args[1]).length();
        forward(a, b, "fileResourceLineCount=" + fileResourceLineCount
            + "|tableResource1RecordCount=" + tableResource1RecordCount
            + "|tableResource2RecordCount=" + tableResource2RecordCount);
    }
}
4. Create the resource tables table_resource1 and table_resource2 in MaxCompute, and insert the corresponding data.
5. Run this UDTF.
Note You can also use the same method to obtain resources. For more information, see MapReduce examples.
The code in the following example defines a UDF with three overloads. The first overload takes an array as a parameter, the second takes a map, and the third takes a struct. Because the third overload uses a struct type as a parameter or return value, the UDF class must be annotated with @Resolve to specify the concrete struct type.
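import java.util.List;
import java.util.Map;
import com.aliyun.odps.data.Struct;
import com.aliyun.odps.udf.UDF;
import com.aliyun.odps.udf.annotation.Resolve;
@Resolve("struct<a:bigint>,string->string")
public class UdfArray extends UDF {
    // Overload 1: ARRAY<STRING> maps to List<String>; return the element at index len
    public String evaluate(List<String> vals, Long len) {
        return vals.get(len.intValue());
    }
    // Overload 2: MAP<STRING,STRING> maps to Map<String,String>
    public String evaluate(Map<String,String> map, String key) {
        return map.get(key);
    }
    // Overload 3: STRUCT; its field types are declared by the @Resolve annotation
    public String evaluate(Struct struct, String key) {
        return struct.getFieldValue("a") + key;
    }
}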
In addition, not all modules in the Python standard library are available. Modules that involve the preceding features are disabled. The following standard library modules are available:
_json
_locale
_lsprof
math
_md5
_multibytecodec
operator
_random
_sha256
_sha512
_sha
_struct
strop
time
unicodedata
_weakref
cPickle
3. Some modules have limited functionality. For example, the sandbox limits the amount of data that user code can write to standard output and standard error: sys.stdout and sys.stderr can write up to 20 KB, and any additional characters are ignored.
Warning The use of third-party libraries is also subject to restrictions. For example, local and remote I/O operations are prohibited. Therefore, the related APIs in third-party libraries are disabled.
@odps.udf.annotate(signature)
Python UDFs support the following MaxCompute SQL data types: bigint, string, double, boolean, and datetime. Before a SQL statement is run, the parameter types and return value types of all functions must be known. Because Python is a dynamically typed language, you must add a decorator to the UDF class to specify the function signature.
Note
The part to the left of the arrow indicates the parameter types. The part to the right of the arrow indicates the return value types.
The return value of a UDTF can contain multiple columns. The return value of a UDF or UDAF can contain only one column.
* represents a variable argument. If a variable argument is specified, the UDF, UDTF, or UDAF can match parameters of any type.
'bigint,double->string'
-- The function takes two parameters, of the bigint and double types, and returns a value of the string type.
'bigint,boolean->string,datetime'
-- The UDTF takes two parameters, of the bigint and boolean types, and returns two columns, of the string and datetime types.
'*->string'
-- Variable argument: the input parameters can be of any type, and the return value is of the string type.
'->double'
-- The function takes no parameters, and the return value is of the double type.
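To illustrate how a signature string is read, the following sketch splits it at the arrow into parameter types and return types. parse_signature is a hypothetical helper, not part of the odps package; it assumes the five types listed above and allows * only on the parameter side.

```python
# Hypothetical helper: splits a Python UDF signature string into
# (parameter types, return types). Not part of the odps package.
SQL_TYPES = {"bigint", "string", "double", "boolean", "datetime"}

def parse_signature(sig):
    if sig.count("->") != 1:
        raise ValueError("invalid signature: %r" % sig)
    left, right = sig.split("->")
    # An empty left side means the function takes no parameters
    params = [t.strip() for t in left.split(",")] if left.strip() else []
    returns = [t.strip() for t in right.split(",")]
    for t in params:
        if t not in SQL_TYPES and t != "*":
            raise ValueError("unsupported parameter type: %r" % t)
    for t in returns:
        if t not in SQL_TYPES:
            raise ValueError("unsupported return type: %r" % t)
    return params, returns

print(parse_signature("bigint,boolean->string,datetime"))
```

Rejecting unknown types up front mirrors the documented behavior of failing at query parsing time rather than during execution.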
If an invalid signature is found during query parsing, an error is returned and the query is not executed. During execution, the UDF receives parameters of the types specified by the function signature, and the value it returns must be of the type specified by the signature. Otherwise, an error is returned. The following table shows the mappings between MaxCompute SQL types and Python types.
Mapping
MaxCompute SQL type   Python type
Bigint                int
String                str
Double                float
Boolean               bool
Datetime              int
Note
A value of the datetime type is passed to user code as an int. The value is the number of milliseconds that have elapsed since the epoch (1970-01-01 00:00:00 UTC). You can use the datetime module in the Python standard library to process datetime values.
NULL corresponds to None in Python.
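A minimal sketch of converting between the millisecond integers that a Python UDF receives and datetime.datetime objects, using only the standard datetime module. It assumes the values should be interpreted as UTC and represented as naive datetime objects; adjust if your data uses another time zone.

```python
import datetime

# Assumption: millisecond values are interpreted as UTC and mapped to naive datetimes.
EPOCH = datetime.datetime(1970, 1, 1)

def ms_to_datetime(ms):
    """Convert a MaxCompute datetime value (ms since the epoch) to datetime.datetime."""
    if ms is None:  # SQL NULL arrives as None
        return None
    return EPOCH + datetime.timedelta(milliseconds=ms)

def datetime_to_ms(dt):
    """Convert a datetime.datetime back to milliseconds since the epoch."""
    if dt is None:
        return None
    delta = dt - EPOCH
    # Integer arithmetic avoids floating-point rounding
    return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000

print(ms_to_datetime(1577836800000))  # 2020-01-01 00:00:00
```

A UDF that returns a datetime value would use the second function, because the return value must again be an int of milliseconds.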
In addition, the odps.udf.int(value[, silent=True]) function has a new parameter, silent. If silent is True and the value cannot be converted to the int type, None is returned instead of an error.
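The behavior of the silent parameter can be pictured with the following local stand-in. to_int is a hypothetical name; it only mirrors the documented semantics of odps.udf.int and is not its real implementation (for example, how odps.udf.int treats float strings is not stated here).

```python
def to_int(value, silent=True):
    """Hypothetical local stand-in mirroring the documented behavior of
    odps.udf.int(value[, silent=True]); not the real implementation."""
    try:
        return int(value)
    except (TypeError, ValueError):
        if silent:
            return None  # silent mode: return None instead of raising
        raise

print(to_int("42"))
```

Returning None lets the UDF emit a SQL NULL for bad input instead of failing the whole job.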
6.8.6.4. UDFs
Implementing a Python UDF is as easy as defining a new-style class and implementing the evaluate method.
Notice A Python UDF must have its signature specified through annotate.
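As an illustration, the following minimal sketch shows the shape of a Python UDF that adds two bigint values. MyPlus is a hypothetical name, and annotate is replaced by a local stand-in so that the code runs outside MaxCompute; inside MaxCompute you would import it with from odps.udf import annotate.

```python
# Local stand-in for odps.udf.annotate so that this sketch runs outside MaxCompute.
def annotate(signature):
    def wrap(cls):
        cls.signature = signature  # hypothetical attribute, used here only for illustration
        return cls
    return wrap

@annotate("bigint,bigint->bigint")
class MyPlus(object):  # new-style class, as required
    def evaluate(self, arg0, arg1):
        # SQL NULL is passed as None; return NULL if either input is NULL
        if arg0 is None or arg1 is None:
            return None
        return arg0 + arg1

print(MyPlus().evaluate(1, 2))
```

Explicit None handling matters because bigint NULL values reach evaluate as None, and None + int raises an error.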
6.8.6.5. UDAFs
Description:
#coding:utf-8
from odps.udf import annotate
from odps.udf import BaseUDAF
@annotate('double->double')
class Average(BaseUDAF):
    def new_buffer(self):
        # buffer[0]: running sum, buffer[1]: row count
        return [0, 0]
    def iterate(self, buffer, number):
        # Skip NULL input values
        if number is not None:
            buffer[0] += number
            buffer[1] += 1
    def merge(self, buffer, pbuffer):
        # Combine a partial buffer produced by another worker
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]
    def terminate(self, buffer):
        if buffer[1] == 0:
            return 0.0
        return float(buffer[0]) / buffer[1]
6.8.6.6. UDTFs
The parameters are described as follows.
Parameters
Parameter                 Description
class odps.udf.BaseUDTF   Base class for a Python UDTF. Users inherit this class and implement methods such as process and close.
Example:
#coding:utf-8
# explode.py
from odps.udf import annotate
Notice A Python UDTF can also be defined without annotate, so that no parameter or return value types are specified. In this case, the function can match any input parameters in SQL. Because the return value types cannot be deduced, all output values are treated as strings. Therefore, when forward is called, all output values must be converted to strings.
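Following the explode.py file name above, a minimal sketch of such a UDTF might split a '|'-separated string and forward one row per element. The class body is an assumption for illustration, and annotate and BaseUDTF are local stand-ins (the real BaseUDTF provides forward) so that the code runs outside MaxCompute.

```python
# Local stand-ins for odps.udf.annotate and odps.udf.BaseUDTF so the sketch runs
# outside MaxCompute. Here forward() just collects rows in a list.
def annotate(signature):
    def wrap(cls):
        return cls
    return wrap

class BaseUDTF(object):
    def __init__(self):
        self.rows = []
    def forward(self, *values):
        # Each call to forward() produces one output row
        self.rows.append(values)

@annotate("string->string")
class Explode(BaseUDTF):
    """Split a '|'-separated string into one output row per element."""
    def process(self, arg):
        if arg is None:  # no output rows for SQL NULL input
            return
        for part in arg.split("|"):
            self.forward(part)

udtf = Explode()
udtf.process("a|b|c")
print(udtf.rows)
```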
odps.distcache.get_cache_file(resource_name)
Note
Description: returns the content of the specified resource. resource_name is a string that corresponds to the name of an existing resource in the current project. If the resource name is invalid or the resource does not exist, an error is returned.
Return value: a file-like object. After using this object, the caller must call its close method to release the opened resource file.
Example:
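The following sketch shows the expected usage pattern: open the resource, read it, and always close it. get_cache_file here is a hypothetical local stand-in backed by io.StringIO, and the resource name and content are sample data; inside MaxCompute you would import the real function with from odps.distcache import get_cache_file.

```python
import io

# Hypothetical local stand-in for odps.distcache.get_cache_file; sample data only.
_SAMPLE_RESOURCES = {"file_resource.txt": u"alpha\nbeta\ngamma\n"}

def get_cache_file(resource_name):
    if resource_name not in _SAMPLE_RESOURCES:
        raise IOError("resource does not exist: %s" % resource_name)
    return io.StringIO(_SAMPLE_RESOURCES[resource_name])

def count_resource_lines(resource_name):
    f = get_cache_file(resource_name)
    try:
        return sum(1 for _ in f)
    finally:
        f.close()  # the caller must close the file-like object

print(count_resource_lines("file_resource.txt"))
```

The try/finally block guarantees that close is called even if reading fails.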
odps.distcache.get_cache_table(resource_name)
Note
Description: returns the content of the specified resource table. resource_name is a string that corresponds to the name of an existing resource table in the current project. If the resource table name is invalid or the table does not exist, an error is returned.
Return value: a generator. The caller iterates over it to read the table content; each iteration returns one record as a tuple.
Example:
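A common pattern is to traverse the resource table once and build an in-memory lookup. In the following sketch, get_cache_table is a hypothetical local stand-in that returns a generator of tuples, and the table name and rows are sample data; inside MaxCompute you would import the real function from odps.distcache.

```python
# Hypothetical local stand-in for odps.distcache.get_cache_table; sample data only.
_SAMPLE_TABLES = {"table_resource1": [("u1", 10), ("u2", 20), ("u3", 30)]}

def get_cache_table(resource_name):
    if resource_name not in _SAMPLE_TABLES:
        raise ValueError("resource table does not exist: %s" % resource_name)
    # Return a generator: each iteration yields one record as a tuple
    return (tuple(rec) for rec in _SAMPLE_TABLES[resource_name])

def load_lookup(resource_name):
    # Traverse the resource table once and index records by their first column
    lookup = {}
    for record in get_cache_table(resource_name):
        lookup[record[0]] = record[1]
    return lookup

print(load_lookup("table_resource1"))
```

Because a generator can be consumed only once, building the lookup in the UDF constructor avoids re-reading the table for every input row.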
6.9. UDTs
> Document Version: 20220928 262
User Guide· MaxComput e SQL MaxComput e
6.9.1. Overview
User-defined types (UDTs) were introduced with the SQL engine of MaxCompute V2.0. UDTs allow you to reference classes or objects of third-party languages in SQL statements to obtain data or call methods.
Scenario 1: MaxCompute does not have built-in functions for tasks that can be easily performed in other languages. For example, some tasks can be performed by calling a single built-in Java class, but implementing them with user-defined functions (UDFs) is complex.
Scenario 2: You want to use a feature provided by a third-party library directly in a SQL statement, instead of wrapping the feature inside a UDF.
Scenario 3: SELECT TRANSFORM allows you to include script code in SQL statements, which makes the statements easier to read and maintain. However, for some languages, such as Java, source code can be executed only after it is compiled. You want to reference objects and classes of these languages in SQL statements as well.
Notice
UDTs support only Java.
All operators use the semantics of MaxCompute SQL.
UDTs cannot be used as shuffle keys in the JOIN, GROUP BY, DISTRIBUTE BY, SORT BY, ORDER BY, and CLUSTER BY clauses.
DDL statements do not support UDTs. You cannot create tables that contain UDT objects.
The final output cannot be of a UDT type.
The UDTs supported in MaxCompute are very different from those in other SQL engines. UDTs in other SQL engines are similar to the STRUCT composite type in MaxCompute. UDTs in MaxCompute are similar to types created by a CREATE TYPE statement: a UDT contains both fields and methods. In addition, MaxCompute does not require you to use Data Definition Language (DDL) statements to define type mappings. You can reference types directly in SQL statements.
Example:
set odps.sql.type.system.odps2=true;
SELECT Integer.MAX_VALUE;
-- A similar output is displayed:
+------------+
| max_value  |
+------------+
| 2147483647 |
+------------+
You can use UDFs to implement all features provided by UDTs, but with more complexity. To implement the same feature with a UDF, perform the following steps:
1. Write the UDF:
package com.aliyun.odps.test;
public class IntegerMaxValue extends com.aliyun.odps.udf.UDF {
public Integer evaluate() {
return Integer.MAX_VALUE;
}
}
2. Compile the UDF into a JAR package, upload the JAR package, and create a function.
3. Call the function:
select integer_max_value();
A UDT simplifies this procedure. By using UDTs, you can use features provided by other languages directly in SQL statements.
-- Sample data
@table1 := select * from values ('100000000000000000000') as t(x);
@table2 := select * from values (100L) as t(y);
-- Code logic
@a := select new java.math.BigInteger(x) x from @table1;          -- Create an object by using the new operator.
@b := select java.math.BigInteger.valueOf(y) y from @table2;      -- Call a static method.
select /*+mapjoin(b)*/ x.add(y).toString() from @a a join @b b;   -- Call an instance method.
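As a quick local cross-check of what the query computes: 10^20 exceeds the BIGINT range (2^63 - 1), which is why the query uses java.math.BigInteger. Python's built-in int is also arbitrary-precision, so the same addition can be reproduced locally. This only mirrors the arithmetic and does not run in MaxCompute.

```python
# Reproduce x.add(y).toString() from the UDT query with Python's arbitrary-precision int
x = int("100000000000000000000")  # value of column x in @table1 (10**20)
y = 100                           # value of column y in @table2
result = str(x + y)
print(result)
```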
Note This example also shows how to use UDT columns in subqueries, which is not possible with user-defined functions (UDFs). Column x in variable @a is of the java.math.BigInteger class, not a built-in type. You can pass UDT data to another operator and then call the required method. UDT data can also be transferred during data shuffling.
UDT execution
Example
The preceding figure shows the three stages of the UDT job: M1, R2, and J3. Only new java.math.BigInteger(x) is called at the M1 stage. The java.math.BigInteger.valueOf(y) and x.add(y).toString() methods are called at the J3 stage.
If a JOIN clause is used, data must be reshuffled, as in MapReduce. As a result, data is processed in multiple stages, possibly by different processes or even on different physical machines. UDTs encapsulate this process so that it behaves as if all stages ran in the same JVM.
Description
UDTs support only Java.
UDTs also allow you to upload JAR packages and reference them directly. The following flags are provided for UDTs.
set odps.sql.type.system.odps2=true;
set odps.sql.session.resources=odps-test.jar; -- Specify the JAR package to reference. Upload the package to your project before you reference it.
select new com.aliyun.odps.test.IntegerMaxValue().evaluate();
Notice This flag is the same as the flag that is used to specify resources in the SELECT TRANSFORM statement. Therefore, it affects both the JAR packages referenced by UDTs and the resource settings of SELECT TRANSFORM statements.
set odps.sql.type.system.odps2=true;
set odps.sql.session.resources=odps-test.jar;
set odps.sql.session.java.imports=com.aliyun.odps.test.*; -- Specify the default Java package to import.
select new IntegerMaxValue().evaluate();
Call methods, including static methods. You can create objects by using factory methods.
Access fields, including static fields.
Notice
Identifiers in UDTs include package names, class names, method names, and field names. All identifiers are case-sensitive.
UDTs support SQL type conversions, such as cast(1 as java.lang.Object). UDTs do not support Java-style casts, such as (Object)1.
Anonymous classes and lambda expressions are not supported.
Functions that do not return values cannot be called in UDTs.
Note UDTs are used in expressions, and functions that do not return values cannot be called in expressions.
All classes in the Java SDK (JDK) can be referenced by UDTs. The runtime environment is JDK 1.8. Later JDK versions may not be supported.
All operators use the semantics of MaxCompute SQL. For example, the result of String.valueOf(1) + String.valueOf(2) is 3: the two strings are implicitly converted to DOUBLE values and summed.
The role of the = operator may also be confusing. In SQL statements, = is a comparison operator that compares one expression with another. To check whether two objects are equal, you must call the Java equals method. The = operator cannot be used to verify the equivalence of two objects.
Java data types are mapped to built-in data types, and this mapping also applies to UDTs.
You can directly call the methods of the Java type to which a built-in type is mapped, for example, '123'.length() and 1L.hashCode().
UDTs can be used in built-in functions and UDFs. For example, in chr(Long.valueOf('100')), Long.valueOf returns a value of the java.lang.Long type, which CHR accepts because it supports the built-in BIGINT type.
Data of a Java primitive type is automatically converted to the corresponding boxed type, and the preceding two rules then apply.
Notice For some new built-in data types, you must run set odps.sql.type.system.odps2=true; to declare these types. Otherwise, an error occurs.
UDTs fully support Java generics. For example, based on the parameter type, the compiler can determine that the value returned by java.util.Arrays.asList(new java.math.BigInteger('1')) is of the java.util.List<java.math.BigInteger> type.
Notice You must specify the type parameter in a constructor call; otherwise, java.lang.Object is used. This is the same as in Java. For example, the result of new java.util.ArrayList(java.util.Arrays.asList('1', '2')) is of the java.util.ArrayList<Object> type, whereas the result of new java.util.ArrayList<String>(java.util.Arrays.asList('1', '2')) is of the java.util.ArrayList<String> type.
UDTs do not have a clear definition of object identity (reference equality). This is caused by data reshuffling. As the JOIN example shows, objects may be transmitted between different processes or physical machines. During transmission, one object may end up being referenced as two different objects. For example, an object may be shuffled to two machines and then reshuffled.
Therefore, when you use UDTs, you must use the equals method instead of the = operator to compare two objects.
Note Objects in the same row or column are correlated in some way. However, a correlation between objects in different rows or columns cannot be ensured.
UDTs cannot be used as shuffle keys in clauses such as JOIN, GROUP BY, DISTRIBUTE BY, SORT BY, ORDER BY, or CLUSTER BY.
UDTs can be used at any intermediate stage of an expression, but cannot be used as outputs. For example, you cannot use group by new java.math.BigInteger('123'). However, you can use group by new java.math.BigInteger('123').hashCode(), because hashCode returns a value of the int.class type, which can be used as the built-in INT type.
UDT objects can be cast to their base classes or subclasses.
Type conversion between two objects without an inheritance relationship follows the native conversion rules.
Notice The conversion may change data. For example, data of the java.lang.Long type can be cast to the java.lang.Integer type. This conversion follows the rules for converting the built-in BIGINT type to the INT type, and may change the data or even cause a loss of precision.
UDT objects cannot be saved or added to tables. DDL statements do not support UDTs, so you cannot create tables that contain UDT objects unless the data is implicitly converted to one of the built-in types. In addition, the final output cannot be a UDT. However, you can call the toString() method to convert the data to the java.lang.String type, because all Java classes support toString(). You can use this method to check UDT data during debugging.
Note This flag is typically used for debugging because it applies only to PRINT statements. It cannot be applied to INSERT statements.
BINARY is a built-in type and supports automatic serialization. You can save byte[] arrays, and the saved byte[] arrays can be deserialized as the BINARY type.
Some classes have their own serialization and deserialization methods, such as Protocol Buffers classes. To save such UDTs, call their serialization and deserialization methods to convert the data to and from the BINARY type.
You can use UDTs to implement scalar functions. You can also use the COLLECT_LIST and EXPLODE built-in functions with UDTs to implement aggregate and table-valued functions.
UDTs support resource access. You can call the com.aliyun.odps.udf.impl.UDTExecutionContext.get() static method to obtain the ExecutionContext object, and then use the object to access the current execution context and resources, such as files and tables.
set odps.sql.type.system.odps2=true;
set odps.sql.udt.display.tostring=true;
select
    new Integer[10],                 -- Create an array that contains 10 elements.
    new Integer[] {c1, c2, c3},      -- Create and initialize an array that contains three elements.
    new Integer[][] { new Integer[] {c1, c2}, new Integer[] {c3, c4} },   -- Create a multidimensional array.
    new Integer[] {c1, c2, c3} [2],  -- Access an element of the array by index.
    java.util.Arrays.asList(c1, c2, c3)   -- Another way to create a built-in array: this creates a List<Integer>, which can be used as array<int>.
from values (1,2,3,4) as t(c1, c2, c3, c4);
Example:
set odps.sql.type.system.odps2=true;
set odps.sql.session.java.imports=java.util.*,java,com.google.gson.*; -- To import multiple packages, separate them with commas (,).
@a := select new Gson() gson; -- Create a Gson object.
select
    gson.toJson(new ArrayList<Integer>(Arrays.asList(1, 2, 3))),       -- Convert an object to a JSON string.
    cast(gson.fromJson('["a","b","c"]', List.class) as List<String>)   -- Deserialize the JSON string, and use CAST to convert the result from List<Object> to List<String>.
from @a;
Compared with the built-in function GET_JSON_OBJECT, this method is simpler and more efficient, because it extracts content from the JSON string and deserializes it directly into a supported data type.
In addition to the Gson dependency, the MaxCompute runtime also includes other dependencies, such as commons-logging (1.1.1), commons-lang (2.5), commons-io (2.4), and protobuf-java (2.4.1).
Java objects of classes that implement the java.util.List or java.util.Map interface can be used in MaxCompute SQL composite type data processing.
Array and map data in MaxCompute can directly call the methods of the java.util.List or java.util.Map interface.
Example:
set odps.sql.type.system.odps2=true;
set odps.sql.session.java.imports=java.util.*;
select
    size(new ArrayList<Integer>()),        -- Call the built-in function SIZE to obtain the size of the ArrayList.
    array(1,2,3).size(),                   -- Call a List method on the built-in array type.
    sort_array(new ArrayList<Integer>()),  -- Sort the data in the ArrayList.
    al[1],                                 -- The Java List interface does not support indexing, but the array type does.
    Objects.toString(a),                   -- Convert array data to string data.
    array(1,2,3).subList(1, 2)             -- Get a sublist.
from (select new ArrayList<Integer>(array(1,2,3)) as al, array(1,2,3) as a) t;
The following example shows how to obtain the median of BigInteger data. You cannot directly call the built-in MEDIAN function because the data is of the java.math.BigInteger type.
set odps.sql.session.java.imports=java.math.*;
@test_data := select * from values (1),(2),(3),(5) as t(value);
@a := select collect_list(new BigInteger(value)) values from @test_data;   -- Aggregate the data into a list.
@b := select sort_array(values) as values, values.size() cnt from @a;     -- To obtain the median, first sort the data.
@c := select if(cnt % 2 == 1, new BigDecimal(values[cnt div 2]), new BigDecimal(values[cnt div 2 - 1].add(values[cnt div 2])).divide(new BigDecimal(2))) med from @b;
-- Final output.
select med.toString() from @c;
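The same median logic can be sketched in Python, where int is arbitrary-precision like BigInteger and decimal.Decimal plays the role of BigDecimal. This is a local illustration of the algorithm, not MaxCompute code; returning None for an empty list is an assumption the SQL above does not cover.

```python
from decimal import Decimal, localcontext

def median(values):
    """Exact median of arbitrary-precision integers, mirroring the SQL steps above."""
    vals = sorted(values)              # sort_array(values)
    cnt = len(vals)                    # values.size()
    if cnt == 0:
        return None
    mid = cnt // 2                     # cnt div 2
    if cnt % 2 == 1:
        return Decimal(vals[mid])
    total = vals[mid - 1] + vals[mid]  # values[cnt div 2 - 1].add(values[cnt div 2])
    with localcontext() as ctx:
        # Enough precision that dividing by 2 is exact, like BigDecimal.divide
        ctx.prec = len(str(abs(total))) + 2
        return Decimal(total) / Decimal(2)

print(median([1, 2, 3, 5]))  # 2.5
```

Raising the decimal precision matters because Python's default context keeps only 28 significant digits, which would round very large medians.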
The COLLECT_LIST function aggregates all data in a group, so it cannot perform partial aggregation. It is therefore less efficient than a built-in aggregate function or a UDAF, and we recommend that you use built-in aggregate functions when possible. Aggregating all data in a group also increases the risk of data skew.
However, if the logic of a UDAF also aggregates all data, similar to the built-in function WM_CONCAT, using COLLECT_LIST is more efficient than using the UDAF.
1. For more information about how to take multiple rows or columns as input, see the example of using aggregate functions.
2. To output multiple rows, you can use a UDT to define a collection (List or Map), and then call the EXPLODE function to split the collection into multiple rows.
3. A UDT can contain multiple fields. You can retrieve the data in the fields by calling different getter methods, and output the data as multiple columns.
The following example shows how to split a JSON string and output the result as multiple columns:
Serialization and deserialization are not required for objects within a single process. They are required only when objects are transmitted between processes. This means that UDTs do not incur any serialization or deserialization overhead when no data reshuffling is performed, for example, when a query contains no JOIN or aggregate operations.
UDTs suffer no performance loss from reflection, because the UDT runtime is based on code generation (Codegen) rather than on reflection.
Multiple UDT calls can be combined into a single function call and executed together. For example, the following expression contains several UDT method calls but is executed as a single function call. Although UDTs encourage fine-grained data processing, combining calls in this way does not incur the overhead of calling multiple functions.
values[x].add(values[y]).divide(java.math.BigInteger.valueOf(2))
6.10. UDJ
6.10.1. Overview
MaxCompute natively provides multiple JOIN methods, including INNER JOIN, RIGHT JOIN, OUTER JOIN, LEFT JOIN, FULL JOIN, SEMIJOIN, and ANTISEMIJOIN. You can use these native JOIN methods in most scenarios. However, they cannot implement user-defined processing logic across multiple tables.
In most cases, you can build your code framework by using UDFs. However, the current UDF, UDTF, and UDAF frameworks can handle only one table at a time. To perform user-defined operations on multiple tables, you must combine native JOIN methods, UDFs, UDTFs, and complex SQL statements. In some cases, you must even use a custom MapReduce framework instead of SQL to complete the task.
In either case, these operations require technical expertise and may cause the following problems:
Combining multiple JOIN methods in SQL statements turns the computation into a complex black box that is difficult to execute with minimal overhead.
If you use MapReduce, optimal execution becomes impossible. Most MapReduce code is written in Java, and its execution is less efficient than the execution of MaxCompute code generated by the LLVM code generator on an optimized native runtime.
With the MaxCompute V2.0 compute engine, the user-defined join (UDJ) API has been added to the user-defined function (UDF) framework. This API allows you to process multiple tables and simplifies the operations performed in the underlying MapReduce distributed system.
The payment (user_id string, time datetime, pay_info string) table stores users' payment information. Each payment record includes the user ID, payment time, and payment details.
The user_client_log (user_id string, time datetime, content string) table stores user client records, including the user ID, operation time, and operation.
Requirement: For each record in the user_client_log table, find the payment record of the same user whose time is closest to the operation time, and join and output the content of both records.
To complete this task with standard join methods, you would need to join the two tables on their common user_id field, and then find the payment record whose time most closely matches the time of each operation. The SQL statement might be written as follows:
SELECT
p.user_id,
p.time,
merge(p.pay_info, u.content)
FROM
payment p RIGHT OUTER JOIN user_client_log u
ON p.user_id = u.user_id and abs(p.time - u.time) = min(abs(p.time - u.time))
However, to join two rows, you must calculate the minimum difference between p.time and u.time for the same user_id, and aggregate functions cannot be called in a join condition. Therefore, this task cannot be completed with a standard JOIN.
Can we use UDJ t o solve t his problem? Yes. T he following t opics describe how t o use UDJ t o sat isfy t he
preceding requirement s.
Prerequisites
UDJ is a new feature, so it requires a new version of the SDK:
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-udf</artifactId>
<version>0.30.0</version>
<scope>provided</scope>
</dependency>
The SDK contains a new abstract class, UDJ. All UDJ features are implemented by extending this class.
Sample code
The following sample code is for reference only.
package com.aliyun.odps.udf.example.udj;
import com.aliyun.odps.Column;
import com.aliyun.odps.OdpsType;
import com.aliyun.odps.Yieldable;
import com.aliyun.odps.data.ArrayRecord;
import com.aliyun.odps.data.Record;
import com.aliyun.odps.udf.DataAttributes;
import com.aliyun.odps.udf.ExecutionContext;
import com.aliyun.odps.udf.UDJ;
import com.aliyun.odps.udf.annotation.Resolve;
import java.util.ArrayList;
import java.util.Iterator;
/** For each record of right table, find the nearest record of left table and
* merge two records.
*/
@Resolve("->string,bigint,string")
public class PayUserLogMergeJoin extends UDJ {
private Record outputRecord;
/** Will be called prior to the data processing phase. User could implement
* this method to do initialization work.
*/
@Override
public void setup(ExecutionContext executionContext, DataAttributes dataAttributes) {
//
outputRecord = new ArrayRecord(new Column[]{
new Column("user_id", OdpsType.STRING),
new Column("time", OdpsType.BIGINT),
new Column("content", OdpsType.STRING)
});
}
/** Override this method to implement join logic.
* @param key Current join key
* @param left Group of records of left table corresponding to the current key
* @param right Group of records of right table corresponding to the current key
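* @param output Used to output the result of UDJ
*/
@Override
public void join(Record key, Iterator<Record> left, Iterator<Record> right, Yieldable<Record> output) {
outputRecord.setString(0, key.getString(0));
// The left iterator cannot be reset, so cache the pay records of this key in memory.
ArrayList<Record> pays = new ArrayList<>();
left.forEachRemaining(pay -> pays.add(pay.clone()));
while (right.hasNext()) {
Record logRecord = right.next();
long logTime = logRecord.getDatetime(0).getTime();
Record nearestPay = null;
long minDelta = Long.MAX_VALUE;
// Scan all cached pay records for the one closest in time to this log record.
for (Record pay : pays) {
long delta = Math.abs(logTime - pay.getDatetime(0).getTime());
if (delta < minDelta) {
minDelta = delta;
nearestPay = pay;
}
}
outputRecord.setBigint(1, logTime);
// If the user has no pay records, output the log content unchanged.
outputRecord.setString(2, nearestPay == null
? logRecord.getString(1) : mergeLog(nearestPay.getString(1), logRecord.getString(1)));
output.yield(outputRecord);
}
}
/** Appends the pay details to the log content, for example "click ..., pay ...". */
private String mergeLog(String payInfo, String logContent) {
return logContent + ", pay " + payInfo;
}
@Override
public void close() {
}
}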
Notice: In this example, NULL values in the records are not handled. To simplify data processing, assume that the tables contain no NULL values.
Each time the UDJ join method is called, the records in the two tables that match the same key are passed in. Therefore, for each record in the user_client_log table, UDJ scans all payment records of that key to find the one whose time is closest.
Assume that each user has only a few payment records. In this case, you can load the payment data into memory. Typically, there is sufficient memory to store the payment data that a user generates in one day. What if this assumption does not hold? This issue is discussed in Pre-sorting.
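Once one user's payment records are in memory, scanning all of them for every log record costs O(n) per lookup. The following Java sketch shows one alternative that is not part of the original sample: it assumes that payment times are epoch milliseconds and keeps them in the standard java.util.TreeMap, so the closest payment is found with floorEntry and ceilingEntry. The class NearestPayLookup is a hypothetical name, not a MaxCompute SDK class.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of the MaxCompute SDK): finds the payment whose
 * time is closest to a log time. Times are epoch milliseconds.
 */
final class NearestPayLookup {
    // Payment time -> payment details, kept sorted by time.
    private final TreeMap<Long, String> paysByTime = new TreeMap<>();

    void addPay(long time, String payInfo) {
        // Simplification: a second payment with the same time replaces the first.
        paysByTime.put(time, payInfo);
    }

    /** Returns the closest payment (the earlier one on a tie), or null if none is cached. */
    String nearest(long logTime) {
        Map.Entry<Long, String> before = paysByTime.floorEntry(logTime);  // latest time <= logTime
        Map.Entry<Long, String> after = paysByTime.ceilingEntry(logTime); // earliest time >= logTime
        if (before == null) {
            return after == null ? null : after.getValue();
        }
        if (after == null) {
            return before.getValue();
        }
        long deltaBefore = logTime - before.getKey();
        long deltaAfter = after.getKey() - logTime;
        return deltaBefore <= deltaAfter ? before.getValue() : after.getValue();
    }
}
```

Each lookup costs O(log n) instead of O(n), but all payments of a user must still fit in memory, which is the limitation that Pre-sorting addresses.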
1. Assume that the code is packaged into the JAR file odps-udj-example.jar. Run the ADD JAR command to upload the JAR package to MaxCompute.
2. Execute the CREATE FUNCTION statement to create the UDJ function pay_user_log_merge_join by using the JAR package odps-udj-example.jar and the Java class com.aliyun.odps.udf.example.udj.PayUserLogMergeJoin.
Notice: The data in this example is for reference only. You may need to prepare different data in actual operations.
3. In MaxCompute SQL, use the UDJ function that you created:
Description:
r is the alias of the result returned by the UDJ function. You can reference this alias in other SQL statements.
(user_id, time, content) are the columns returned by the UDJ function.
+---------+---------------------+---------------------------------------+
| user_id | time                | content                               |
+---------+---------------------+---------------------------------------+
| 1000235 | 2018-02-13 00:25:36 | click FNOXAibRjkIaQPB                 |
| 1000235 | 2018-02-13 22:30:00 | click GczrYaxvkiPultZ                 |
| 1335656 | 2018-02-13 18:30:00 | click MxONdLckpAFUHRS, pay PEqMSHyktn |
| 1335656 | 2018-02-13 19:54:00 | click mKRPGOciFDyzTgM, pay PEqMSHyktn |
| 2656199 | 2018-02-13 08:30:00 | click CZwafHsbJOPNitL, pay pYvotuLDIT |
| 2656199 | 2018-02-13 09:14:00 | click nYHJqIpjevkKToy, pay pYvotuLDIT |
| 2656199 | 2018-02-13 21:05:00 | click gbAfPCwrGXvEjpI, pay PEqMSHyktn |
| 2656199 | 2018-02-13 21:08:00 | click dhpZyWMuGjBOTJP, pay PEqMSHyktn |
| 2656199 | 2018-02-13 22:29:00 | click bAsxnUdDhvfqaBr, pay gZhvdySOQb |
| 2656199 | 2018-02-13 22:30:00 | click XIhZdLaOocQRmrY, pay gZhvdySOQb |
| 4356142 | 2018-02-13 18:30:00 | click DYqShmGbIoWKier                 |
| 4356142 | 2018-02-13 19:54:00 | click DYqShmGbIoWKier                 |
| 8881237 | 2018-02-13 00:30:00 | click MpkvilgWSmhUuPn, pay pYvotuLDIT |
| 8881237 | 2018-02-13 06:14:00 | click OkTYNUHMqZzlDyL, pay pYvotuLDIT |
| 8881237 | 2018-02-13 10:30:00 | click OkTYNUHMqZzlDyL, pay KBuMzRpsko |
| 9890100 | 2018-02-13 16:01:00 | click vOTQfBFjcgXisYU, pay gZhvdySOQb |
| 9890100 | 2018-02-13 16:20:00 | click WxaLgOCcVEvhiFJ, pay MxONdLckwa |
+---------+---------------------+---------------------------------------+
As shown in the preceding result, UDJ completes the task that could not be performed with native JOIN methods.
6.10.2.5. Pre-sorting
In the preceding example, an iterator is used to scan all records in the payment table to find the matching payment record. To do this, you must load all payment records that have the same user_id into an ArrayList. This method works when the number of payment records is small. If a large number of payment records are generated, RAM limits require another way to process the data.
This topic describes how to address this issue by using the SORT BY clause. When the payment data is too large to fit in memory, the problem becomes much easier if all data in both tables is already sorted by time: you then only need to compare the first elements of the two lists. The UDJ code in Java:
@Override
public void join(Record key, Iterator<Record> left, Iterator<Record> right, Yieldable<Record> output) {
outputRecord.setString(0, key.getString(0));
if (! right.hasNext()) {
return;
} else if (! left.hasNext()) {
while (right.hasNext()) {
Record logRecord = right.next();
outputRecord.setBigint(1, logRecord.getDatetime(0).getTime());
outputRecord.setString(2, logRecord.getString(1));
output.yield(outputRecord);
}
return;
}
long prevDelta = Long.MAX_VALUE;
Record logRecord = right.next();
Record payRecord = left.next();
Record lastPayRecord = payRecord.clone();
while (true) {
long delta = logRecord.getDatetime(0).getTime() - payRecord.getDatetime(0).getTime();
if (left.hasNext() && delta > 0) {
// The delta of time between two records is decreasing, we can still
// explore the left group to try to gain a smaller delta.
lastPayRecord = payRecord.clone();
prevDelta = delta;
payRecord = left.next();
} else {
// Hit to the point of minimal delta. Check with the last pay record,
// output the merge result and prepare to process the next record of
// right group.
Record nearestPay = Math.abs(delta) < prevDelta ? payRecord : lastPayRecord;
outputRecord.setBigint(1, logRecord.getDatetime(0).getTime());
String mergedString = mergeLog(nearestPay.getString(1), logRecord.getString(1));
outputRecord.setString(2, mergedString);
output.yield(outputRecord);
if (right.hasNext()) {
logRecord = right.next();
prevDelta = Math.abs(
logRecord.getDatetime(0).getTime() - lastPayRecord.getDatetime(0).getTime()
);
} else {
break;
}
}
}
}
Notice: After you modify the UDJ code, you must update the corresponding JAR package.
When you use the UDJ function in MaxCompute SQL, modify the statement as follows:
Compared with the original SQL statement, only a small change is needed: add a SORT BY clause to the end of the UDJ clause so that the data in both tables is sorted by time.
This method uses the SORT BY clause to pre-sort the data. As a result, at most three records (the current log record, the current payment record, and the previous payment record) need to be cached to achieve the same result.
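The pre-sorted approach can be modeled outside UDJ on plain arrays. The following Java sketch is a simplified illustration, not the UDJ API: it assumes both inputs are already sorted ascending (the role that SORT BY plays in SQL) and returns, for each log time, the index of the nearest payment time. The class SortedNearestMatch is hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the pre-sorted merge. Both arrays must be sorted ascending.
 * Returns, for each log time, the index of the nearest pay time, or -1 if there are no pays.
 * Ties go to the earlier payment.
 */
final class SortedNearestMatch {
    static int[] nearestIndices(long[] payTimes, long[] logTimes) {
        int[] result = new int[logTimes.length];
        if (payTimes.length == 0) {
            Arrays.fill(result, -1);
            return result;
        }
        int p = 0; // only moves forward, so each payment is visited at most once overall
        for (int i = 0; i < logTimes.length; i++) {
            long t = logTimes[i];
            // Advance while the next payment is still at or before this log time.
            while (p + 1 < payTimes.length && payTimes[p + 1] <= t) {
                p++;
            }
            // Only two candidates remain: payTimes[p] and its successor payTimes[p + 1].
            int best = p;
            if (p + 1 < payTimes.length
                    && Math.abs(payTimes[p + 1] - t) < Math.abs(t - payTimes[p])) {
                best = p + 1;
            }
            result[i] = best;
        }
        return result;
    }
}
```

Because the pointer never moves backward, the whole match runs in a single pass over both inputs, and only the current log time plus two payment candidates are examined at a time, which mirrors the small cache described above.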
The following example uses an online MapReduce job to test UDJ performance. The MapReduce job uses a complex algorithm to join two tables. In this example, the job is rewritten as SQL statements that use UDJ, and the execution results are compared.
As shown in the figure, UDJ makes it easier to describe complex logic that involves multiple tables and greatly improves query performance.
Note: The code is executed only inside UDJ, and the entire logic is executed by the high-performance MaxCompute native runtime.
UDJ optimizes the MaxCompute runtime engine and the data exchange between interfaces. The join logic of UDJ is more efficient than that of the reduce stage.
In traditional MaxCompute views, complex SQL scripts are encapsulated at the underlying layer. Callers can query a view in the same way as a standard table, without needing to understand the underlying implementation. Traditional views are widely used because they provide encapsulation and code reuse.
However, you cannot specify parameters for traditional views. When a traditional view reads data from an underlying table, you cannot filter the data in that table or pass other parameters to the view. This lowers code reuse.
The new SQL engine of MaxCompute V2.0 supports parameterized views, which allow you to pass in any tables or other variables to customize a view.
The created pv1 view has two parameters: a table parameter and a string parameter. Parameter values can be tables or values of basic data types.
Parameter values can also be subqueries. Example:
When you define a view, you can set the type of a parameter to ANY. Example:
When you define a view, you can use an asterisk (*) to represent any number of additional columns. Example:
create view view_name(@a bigint, @b TABLE(x bigint, * ANY)) as select * from @b where x = @a;
Note
You can call the pv1 view with different parameters. The value of the table parameter can be a physical table, a view, a table variable, or a table alias in a common table expression (CTE). Other parameters can be variables or constants.
Additional instructions
A parameterized view can contain multiple SQL statements, similar to a script.
Note
The content between BEGIN and END is the script of the view.
The @pv2 := … statement is similar to the RETURN statement in other programming languages. It assigns a value to an implicit table variable that has the same name as the view.
Only DML statements can be used in scripts. INSERT and CREATE TABLE AS statements cannot be included in scripts.
PRINT statements cannot be included in scripts.
The rules for matching actual and formal view parameters are the same as those in common programming languages. If an actual parameter can be implicitly converted to the type of the formal parameter, the two match. For example, a BIGINT value matches a parameter of the DOUBLE type. For table variables, if the schema of table A can be inserted into table B, table A can be used to match a table parameter that has the same schema as table B.
In some situations, you can declare the return type to make the code easier to read. Example:
Note: RETURNS @ret TABLE (x string, y string) defines the following information:
The return type is TABLE (x string, y string), which is the type returned to the caller. You can use it to customize the table schema.
The return parameter is @ret. A value is assigned to it in the view script.
A view that does not contain the BEGIN and END keywords or that does not return variables can be regarded as a simplified view.
All functions are published in the geospatial project of the DataWorks marketplace and are prefixed with ST_. You can click a function to view and use it without applying for permissions.
To use a function, add the project prefix geospatial. to the beginning of the function name, and commit the SQL statements that contain the function together with the following flags:
set odps.sql.hive.compatible=true;
set odps.sql.udf.java.retain.legacy=false;
set odps.isolation.session.enable=true;
6.12.2. Constructors
6.12.2.1. ST_AsBinary
Function declaration:
ST_AsBinary(ST_Geometry)
Description: This function returns the well-known binary (WKB) representation of the input geometry.
Example:
6.12.2.2. ST_AsGeoJson
Function declaration:
ST_AsGeoJson(geometry)
Description: This function returns the GeoJSON representation of the input geometry.
Example:
6.12.2.3. ST_AsJson
Function declaration:
ST_AsJSON(ST_Geometry)
Description: This function returns the JSON representation of the input geometry.
Example:
6.12.2.4. ST_AsShape
Function declaration:
ST_AsShape(ST_Geometry)
Description: This function returns the ESRI shape representation of the input geometry.
Example:
6.12.2.5. ST_AsText
Function declaration:
ST_AsText(ST_Geometry)
Description: This function returns the well-known text (WKT) representation of the input geometry.
Example:
6.12.2.6. ST_GeomCollection
Function declaration:
ST_GeomCollection(wkt)
Description: This function constructs a multi-part geometry from its well-known text (WKT) representation, as defined by the Open Geospatial Consortium (OGC).
Notice: The ST_GeomCollection function in MaxCompute supports only multi-part geometries, not geometry collections.
Example:
6.12.2.7. ST_GeomFromGeoJson
Function declaration:
ST_GeomFromGeoJson(json)
Description: This function constructs a geometry from the input GeoJSON representation.
Example:
6.12.2.8. ST_GeomFromJSON
Function declaration:
ST_GeomFromJSON(json)
Description: This function constructs a geometry from the input ESRI JSON representation.
Example:
6.12.2.9. ST_GeomFromShape
Function declaration:
ST_GeomFromShape(shape)
Description: This function constructs a geometry from the input ESRI shape representation.
Example:
6.12.2.10. ST_GeomFromText
Function declaration:
ST_GeomFromText(wkt)
Description: This function constructs a geometry from the input well-known text (WKT) representation, as defined by the Open Geospatial Consortium (OGC).
Example:
6.12.2.11. ST_GeomFromWKB
Function declaration:
ST_GeomFromWKB(wkb)
Description: This function constructs a geometry from the input well-known binary (WKB) representation, as defined by the Open Geospatial Consortium (OGC).
Example:
6.12.2.12. ST_GeometryType
Function declaration:
ST_GeometryType(geometry)
Description: This function returns the type name of the input geometry.
Example:
6.12.2.13. ST_LineString
Function declaration:
Example:
6.12.2.14. ST_LineFromWKB
Function declaration:
ST_LineFromWKB(wkb)
Description: This function constructs a two-dimensional line from the input well-known binary (WKB) representation, as defined by the Open Geospatial Consortium (OGC).
Example:
6.12.2.15. ST_MultiLineString
Function declaration:
ST_MultiLineString(array(x1, y1, x2, y2, ... ), array(x1, y1, x2, y2, ... ), ... )
ST_MultiLineString('multilinestring( ... )')
Description: This function constructs a two-dimensional multilinestring.
Example:
SELECT ST_MultiLineString(array(1, 1, 2, 2), array(10, 10, 20, 20)) from src LIMIT 1;
SELECT ST_MultiLineString('multilinestring ((1 1, 2 2), (10 10, 20 20))', 0) from src LIMIT 1;
6.12.2.16. ST_MLineFromWKB
Function declaration:
ST_MLineFromWKB(wkb)
Description: This function constructs a two-dimensional multilinestring from the input well-known binary (WKB) representation, as defined by the Open Geospatial Consortium (OGC).
Example:
6.12.2.17. ST_MultiPoint
Function declaration:
Description: This function constructs a two-dimensional multipoint geometry.
Example:
6.12.2.18. ST_MPointFromWKB
Function declaration:
ST_MPointFromWKB(wkb)
Description: This function constructs a two-dimensional multipoint geometry from the input well-known binary (WKB) representation, as defined by the Open Geospatial Consortium (OGC).
Example:
6.12.2.19. ST_MultiPolygon
Function declaration:
ST_MultiPolygon(array(x1, y1, x2, y2, ... ), array(x1, y1, x2, y2, ... ), ... )
ST_MultiPolygon('multipolygon ( ... )')
Description: This function constructs a two-dimensional multipolygon.
Example:
6.12.2.20. ST_MPolyFromWKB
Function declaration:
ST_MPolyFromWKB(wkb)
Description: This function constructs a two-dimensional multipolygon from the input well-known binary (WKB) representation, as defined by the Open Geospatial Consortium (OGC).
Example:
6.12.2.21. ST_Point
Function declaration:
ST_Point(x, y)
ST_Point('point (x y)')
Example:
6.12.2.22. ST_PointFromWKB
Function declaration:
ST_PointFromWKB(wkb)
Description: This function constructs a two-dimensional point from the input well-known binary (WKB) representation, as defined by the Open Geospatial Consortium (OGC).
Example:
6.12.2.23. ST_PointZ
Function declaration:
ST_PointZ(x, y, z)
Example:
6.12.2.24. ST_Polygon
Example:
6.12.2.25. ST_PolyFromWKB
Function declaration:
ST_PolyFromWKB(wkb)
Description: This function constructs a two-dimensional polygon from the input well-known binary (WKB) representation, as defined by the Open Geospatial Consortium (OGC).
Example:
6.12.2.26. ST_SetSRID
Function declaration:
ST_SetSRID(<ST_Geometry>, SRID)
Description: This function sets the spatial reference system identifier (SRID) of the input geometry.
Example:
6.12.3. Accessors
6.12.3.1. ST_Area
Function declaration:
ST_Area(ST_Polygon)
Description: This function returns the area of one or more polygons.
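For a single polygon without holes, planar area can be computed with the shoelace formula. The following Java sketch only illustrates that formula and is not the ST_Area implementation: it assumes the flat x1, y1, x2, y2, ... coordinate order used by ST_Polygon in the examples in this topic, and it ignores interior rings and multipolygons. PlanarArea is a hypothetical name.

```java
/** Hypothetical helper: shoelace area of one simple polygon ring (no holes). */
final class PlanarArea {
    /** Coordinates are flat x1, y1, x2, y2, ...; the ring is closed implicitly. */
    static double shoelaceArea(double... xy) {
        if (xy.length < 6 || xy.length % 2 != 0) {
            throw new IllegalArgumentException("expected at least three x, y pairs");
        }
        int n = xy.length / 2;
        double twiceArea = 0.0;
        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n; // wraps to the first vertex to close the ring
            twiceArea += xy[2 * i] * xy[2 * j + 1] - xy[2 * j] * xy[2 * i + 1];
        }
        // The sign depends on vertex order (clockwise or counterclockwise), so take the magnitude.
        return Math.abs(twiceArea) / 2.0;
    }
}
```

For example, the square used elsewhere in this topic, st_polygon(1,1, 1,4, 4,4, 4,1), has a shoelace area of 9.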
Example:
6.12.3.2. ST_Centroid
Function declaration:
ST_Centroid(polygon)
Description: This function returns the center point of the minimum bounding rectangle of the input polygon.
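Read literally, the description above means the result is the midpoint of the polygon's bounding rectangle. The Java sketch below computes that midpoint from flat x, y coordinates; BoundingBoxCenter is a hypothetical helper for illustration, not the MaxCompute implementation.

```java
/** Hypothetical helper: center of the axis-aligned bounding rectangle of a coordinate list. */
final class BoundingBoxCenter {
    /** Returns {centerX, centerY} for flat x1, y1, x2, y2, ... coordinates. */
    static double[] center(double... xy) {
        if (xy.length < 2 || xy.length % 2 != 0) {
            throw new IllegalArgumentException("expected at least one x, y pair");
        }
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < xy.length; i += 2) {
            minX = Math.min(minX, xy[i]);
            maxX = Math.max(maxX, xy[i]);
            minY = Math.min(minY, xy[i + 1]);
            maxY = Math.max(maxY, xy[i + 1]);
        }
        // Midpoint of the rectangle, not the area-weighted centroid.
        return new double[] {(minX + maxX) / 2.0, (minY + maxY) / 2.0};
    }
}
```

For the triangle (2 0, 2 3, 3 0), this sketch returns (2.5, 1.5), whereas the area-weighted centroid of the same triangle is (7/3, 1), so the two definitions can give different points.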
Example:
6.12.3.3. ST_CoordDim
Function declaration:
ST_CoordDim(geometry)
Description: This function returns the coordinate dimension of the input geometry.
Example:
6.12.3.4. ST_Dimension
Function declaration:
ST_Dimension(geometry)
Description: This function returns the spatial dimension of the input geometry.
Example:
6.12.3.5. ST_Distance
Function declaration:
ST_Distance(ST_Geometry1, ST_Geometry2)
Description: This function returns the shortest distance between geometry1 and geometry2, that is, the minimum distance between any point in geometry1 and any point in geometry2.
Example:
6.12.3.6. ST_GeodesicLengthWGS84
Function declaration:
ST_GeodesicLengthWGS84(line)
Description: This function returns the length of the input line in meters, measured on a spheroid based on the World Geodetic System 1984 (WGS84). The geometry must use WGS84 coordinates. Otherwise, this function returns NULL.
Example:
6.12.3.7. ST_GeometryN
Function declaration:
ST_GeometryN(ST_GeometryCollection, n)
Description: This function returns the nth geometry in the input geometry collection. n starts from 1.
Example:
SELECT ST_GeometryN(ST_GeomFromText('multipoint ((10 40), (40 30), (20 20), (30 10))'), 3)
FROM src LIMIT 1;
-- ST_Point(20 20)
SELECT ST_GeometryN(ST_GeomFromText('multilinestring ((2 4, 10 10), (20 20, 7 8))'), 2) FROM src LIMIT 1;
-- ST_Linestring(20 20, 7 8)
6.12.3.8. ST_Is3D
Function declaration:
ST_Is3D(geometry)
Description: If the input geometry has Z coordinates, this function returns true. Otherwise, this function returns false.
Example:
6.12.3.9. ST_IsClosed
Function declaration:
ST_IsClosed(ST_[Multi]LineString)
Description: If the input linestring or linestrings are closed, this function returns true.
Example:
6.12.3.10. ST_IsEmpty
Function declaration:
ST_IsEmpty(geometry)
Description: If the input geometry is empty, this function returns true.
Example:
6.12.3.11. ST_IsMeasured
Function declaration:
ST_IsMeasured(geometry)
Description: If the input geometry has M coordinates (measures), this function returns true.
Example:
6.12.3.12. ST_IsSimple
Function declaration:
ST_IsSimple(geometry)
Description: If the input geometry is simple, this function returns true.
Example:
6.12.3.13. ST_IsRing
Function declaration:
ST_IsRing(ST_LineString)
Description: If the input linestring is both closed and simple, this function returns true.
Example:
6.12.3.14. ST_Length
Function declaration:
ST_Length(line)
Description: This function returns the length of the input line.
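In planar coordinates, the length of a linestring is the sum of the Euclidean lengths of its segments. The following Java sketch illustrates that calculation under the assumption of flat x1, y1, x2, y2, ... coordinates; PolylineLength is a hypothetical helper, not the ST_Length implementation, and it does not handle multilinestrings.

```java
/** Hypothetical helper: planar length of a single linestring. */
final class PolylineLength {
    /** Coordinates are flat x1, y1, x2, y2, ...; at least two points are required. */
    static double length(double... xy) {
        if (xy.length < 4 || xy.length % 2 != 0) {
            throw new IllegalArgumentException("expected at least two x, y pairs");
        }
        double total = 0.0;
        for (int i = 2; i < xy.length; i += 2) {
            // Euclidean length of the segment from the previous point to this point.
            total += Math.hypot(xy[i] - xy[i - 2], xy[i + 1] - xy[i - 1]);
        }
        return total;
    }
}
```

For example, the linestring (0 0, 3 4, 3 10) consists of segments of length 5 and 6, for a total of 11.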
Example:
6.12.3.15. ST_M
Function declaration:
ST_M(geometry)
Description: This function returns the M coordinate of the input geometry.
Example:
6.12.3.16. ST_MaxM
Function declaration:
ST_MaxM(geometry)
Description: This function returns the maximum M coordinate of the input geometry.
Example:
6.12.3.17. ST_MinM
Function declaration:
ST_MinM(geometry)
Description: This function returns the minimum M coordinate of the input geometry.
Example:
6.12.3.18. ST_X
Function declaration:
ST_X(point)
Description: This function returns the X coordinate of the input point.
Example:
6.12.3.19. ST_Y
Function declaration:
ST_Y(point)
Description: This function returns the Y coordinate of the input point.
Example:
6.12.3.20. ST_Z
Function declaration:
ST_Z(point)
Description: This function returns the Z coordinate of the input point.
Example:
6.12.3.21. ST_MaxX
Function declaration:
ST_MaxX(geometry)
Description: This function returns the maximum X coordinate of the input geometry.
Example:
6.12.3.22. ST_MaxY
Function declaration:
ST_MaxY(geometry)
Description: This function returns the maximum Y coordinate of the input geometry.
Example:
6.12.3.23. ST_MaxZ
Function declaration:
ST_MaxZ(geometry)
Description: This function returns the maximum Z coordinate of the input geometry.
Example:
6.12.3.24. ST_MinX
Function declaration:
ST_MinX(geometry)
Description: This function returns the minimum X coordinate of the input geometry.
Example:
6.12.3.25. ST_MinY
Function declaration:
ST_MinY(geometry)
Description: This function returns the minimum Y coordinate of the input geometry.
Example:
6.12.3.26. ST_MinZ
Function declaration:
ST_MinZ(geometry)
Description: This function returns the minimum Z coordinate of the input geometry.
Example:
6.12.3.27. ST_NumGeometries
Function declaration:
ST_NumGeometries(ST_GeometryCollection)
Description: This function returns the number of geometries in the input geometry collection.
Example:
SELECT ST_NumGeometries(ST_GeomFromText('multipoint ((10 40), (40 30), (20 20), (30 10))'))
FROM src LIMIT 1;
-- 4
SELECT ST_NumGeometries(ST_GeomFromText('multilinestring ((2 4, 10 10), (20 20, 7 8))')) FROM src LIMIT 1;
-- 2
6.12.3.28. ST_NumInteriorRing
Function declaration:
ST_NumInteriorRing(ST_Polygon)
Description: This function returns the number of interior rings of the input polygon.
Example:
6.12.3.29. ST_NumPoints
Function declaration:
ST_NumPoints(geometry)
Description: This function returns the number of points in the input geometry.
Example:
6.12.3.30. ST_PointN
Function declaration:
ST_PointN(ST_Geometry, n)
Description: This function returns the nth point of one or more linestrings.
Example:
6.12.3.31. ST_StartPoint
Function declaration:
ST_StartPoint(geometry)
Description: This function returns the first point of the input linestring.
Example:
6.12.3.32. ST_EndPoint
Function declaration:
ST_EndPoint(geometry)
Description: This function returns the last point of the input linestring.
Example:
6.12.3.33. ST_SRID
Function declaration:
ST_SRID(ST_Geometry)
Description: This function returns the spatial reference system identifier (SRID) of the input geometry.
Example:
6.12.4. Operations
6.12.4.1. ST_Aggr_ConvexHull
Function declaration:
ST_Aggr_ConvexHull(ST_Geometry)
Description: This aggregate function returns the convex hull of all input geometries.
Example:
6.12.4.2. ST_Aggr_Intersection
Function declaration:
ST_Aggr_Intersection(ST_Geometry)
Description: This aggregate function returns the intersection of all input geometries.
Example:
6.12.4.3. ST_Aggr_Union
Function declaration:
ST_Aggr_Union(ST_Geometry)
Description: This aggregate function returns the union of all input geometries.
Example:
6.12.4.4. ST_Bin
Function declaration:
ST_Bin(placeholder)
Description: This function returns the bin ID of the input point.
6.12.4.5. ST_BinEnvelope
Function declaration:
ST_BinEnvelope(binsize, point)
Description: This function returns the bin envelope of the input point.
ST_BinEnvelope(binsize, binid)
Description: This function returns the bin envelope of the input bin ID.
6.12.4.6. ST_Boundary
Function declaration:
ST_Boundary(ST_Geometry)
Description: This function returns the boundary of the input geometry.
Example:
6.12.4.7. ST_Buffer
Function declaration:
ST_Buffer(geometry, distance)
Description: This function returns a geometry that represents all points whose distance from the input geometry is less than or equal to the value of the distance parameter.
6.12.4.8. ST_ConvexHull
Function declaration:
Description: This function returns the convex hull of the input geometry.
Example:
6.12.4.9. ST_Difference
Function declaration:
ST_Difference(ST_Geometry1, ST_Geometry2)
Description: This function returns a geometry that represents the difference between the input geometries.
Example:
6.12.4.10. ST_Envelope
Function declaration:
ST_Envelope(ST_Geometry)
Description: This function returns the envelope of the input geometry. If the specified geometry is a point, a horizontal line, or a vertical line, this function returns the common difference or an empty envelope.
Example:
6.12.4.11. ST_ExteriorRing
Function declaration:
ST_ExteriorRing(polygon)
Description: This function returns the exterior ring of a polygon as a linestring.
Example:
6.12.4.12. ST_InteriorRingN
Function declaration:
ST_InteriorRingN(ST_Polygon, n)
Description: This function returns the nth interior ring of a polygon as a linestring.
Example:
6.12.4.13. ST_Intersection
Function declaration:
ST_Intersection(ST_Geometry1, ST_Geometry2)
Description: This function returns a geometry that represents the intersection of the input geometries. If the input geometries intersect in a lower dimension, ST_Intersection may drop the lower-dimension intersections or return a closed linestring.
Example:
6.12.4.14. ST_SymmetricDiff
Function declaration:
ST_SymmetricDiff(ST_Geometry1, ST_Geometry2)
Description: This function returns a geometry that represents the symmetric difference of the input geometries.
Example:
6.12.4.15. ST_Union
Function declaration:
Description: This function returns a geometry that is the union of the input geometries.
Example:
Description: If geometry1 contains geometry2, this function returns true. Otherwise, this function returns false.
Example:
6.12.5.2. ST_Crosses
Function declaration:
Description: If geometry1 crosses geometry2, this function returns true. Otherwise, this function returns false.
Note: Crossing indicates that the two geometries have some, but not all, interior points in common.
Example:
6.12.5.3. ST_Disjoint
Function declaration:
Description: If geometry1 and geometry2 do not intersect, this function returns true. Otherwise, this function returns false.
Example:
6.12.5.4. ST_EnvIntersects
Function declaration:
Description: If the envelopes of geometry1 and geometry2 intersect, this function returns true. Otherwise, this function returns false.
Example:
SELECT ST_EnvIntersects(ST_LineString(0,0, 2,2), ST_LineString(1,0, 3,2)) from src LIMIT 1;
-- true is returned.
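An envelope test compares only bounding rectangles, which is why it is cheap. The following Java sketch illustrates the idea; it is a hypothetical helper (EnvelopeCheck), assumes flat x, y coordinates, and treats rectangles as closed, so envelopes that only touch at an edge count as intersecting. That boundary behavior is an assumption of the sketch, not a statement about ST_EnvIntersects.

```java
/** Hypothetical helper: bounding-rectangle (envelope) intersection test. */
final class EnvelopeCheck {
    /** Returns {minX, minY, maxX, maxY} for flat x1, y1, x2, y2, ... coordinates. */
    static double[] envelope(double... xy) {
        if (xy.length < 2 || xy.length % 2 != 0) {
            throw new IllegalArgumentException("expected at least one x, y pair");
        }
        double[] env = {xy[0], xy[1], xy[0], xy[1]};
        for (int i = 2; i < xy.length; i += 2) {
            env[0] = Math.min(env[0], xy[i]);
            env[1] = Math.min(env[1], xy[i + 1]);
            env[2] = Math.max(env[2], xy[i]);
            env[3] = Math.max(env[3], xy[i + 1]);
        }
        return env;
    }

    /** Two closed rectangles intersect when their ranges overlap on both axes. */
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }
}
```

In the example above, the two lines lie on y = x and y = x - 1, so they are parallel and never meet, yet their envelopes [0, 2] x [0, 2] and [1, 3] x [0, 2] overlap. This is why an envelope test is typically used as a fast pre-filter before an exact test such as ST_Intersects.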
6.12.5.5. ST_Equals
Function declaration:
Description: If geometry1 equals geometry2, this function returns true. Otherwise, this function returns false.
Example:
6.12.5.6. ST_Intersects
Function declaration:
Description: If geometry1 and geometry2 intersect, this function returns true. Otherwise, this function returns false.
Example:
6.12.5.7. ST_Overlaps
Function declaration:
Description: If geometry1 and geometry2 overlap, this function returns true. Otherwise, this function returns false. Geometries that only touch each other do not overlap.
Example:
SELECT ST_Overlaps(st_polygon(2,0, 2,3, 3,0), st_polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- true is returned.
SELECT ST_Overlaps(st_polygon(2,0, 2,1, 3,1), ST_Polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- false is returned.
6.12.5.8. ST_Relate
Function declaration:
Description: If geometry1 has the specified Dimensionally Extended nine-Intersection Model (DE-9IM) relationship with geometry2, this function returns true. Otherwise, this function returns false.
Example:
6.12.5.9. ST_Touches
Function declaration:
Description: If geometry1 and geometry2 touch spatially and have no interior points in common, this function returns true. Otherwise, this function returns false.
Example:
6.12.5.10. ST_Within
Function declaration:
Description: If geometry1 is within geometry2, this function returns true. Otherwise, this function returns false.
Example:
SELECT ST_Within(st_point(2, 3), st_polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- true is returned.
SELECT ST_Within(st_point(8, 8), st_polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- false is returned.
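For the common case of a point and a single polygon ring, the examples above can be reproduced with even-odd ray casting. The following Java sketch is a swapped-in textbook technique, not the ST_Within implementation: PointInPolygon is a hypothetical name, coordinates are assumed to be flat x, y pairs, holes are not handled, and points lying exactly on the boundary may be classified either way.

```java
/** Hypothetical helper: even-odd ray casting for a point against one polygon ring. */
final class PointInPolygon {
    /** ring is flat x1, y1, x2, y2, ...; the ring is closed implicitly. */
    static boolean contains(double px, double py, double... ring) {
        if (ring.length < 6 || ring.length % 2 != 0) {
            throw new IllegalArgumentException("expected at least three x, y pairs");
        }
        int n = ring.length / 2;
        boolean inside = false;
        // Cast a horizontal ray to the right of the point and count edge crossings.
        for (int i = 0, j = n - 1; i < n; j = i++) {
            double xi = ring[2 * i], yi = ring[2 * i + 1];
            double xj = ring[2 * j], yj = ring[2 * j + 1];
            // The edge straddles the ray's y value (this also guarantees yi != yj below).
            boolean straddles = (yi > py) != (yj > py);
            if (straddles && px < (xj - xi) * (py - yi) / (yj - yi) + xi) {
                inside = !inside; // each crossing flips inside/outside
            }
        }
        return inside;
    }
}
```

With the square from the examples, (2, 3) is reported inside and (8, 8) outside, matching the two ST_Within results.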
Description: This function returns the unique Geohash string of the specified point. The point can be specified either by a function with the ST_ prefix or by longitude and latitude values. If the precision parameter is not specified, the maximum precision is used.
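Geohash encoding repeatedly halves the longitude and latitude ranges, interleaves the resulting bits starting with longitude, and maps every 5 bits to one base-32 character. The following Java sketch shows that standard algorithm; it is not the MaxCompute implementation. GeoHashSketch is a hypothetical name, the (lon, lat) argument order is an assumption, and capping precision at 12 characters is also an assumption chosen to match the length of the sample hash '9wqz7eep0eyq' in this topic.

```java
/** Hypothetical sketch of standard Geohash encoding. */
final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lon, double lat, int precision) {
        if (precision < 1 || precision > 12) {
            throw new IllegalArgumentException("precision must be between 1 and 12");
        }
        double lonMin = -180, lonMax = 180, latMin = -90, latMax = 90;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonBit = true; // even bits refine longitude, odd bits refine latitude
        int bits = 0;
        int index = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) {
                    index = (index << 1) | 1;
                    lonMin = mid;
                } else {
                    index <<= 1;
                    lonMax = mid;
                }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) {
                    index = (index << 1) | 1;
                    latMin = mid;
                } else {
                    index <<= 1;
                    latMax = mid;
                }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(index));
                bits = 0;
                index = 0;
            }
        }
        return hash.toString();
    }
}
```

Because each extra character only refines the previous cell, a shorter Geohash is always a prefix of a longer one for the same point; for example, longitude -5.6 and latitude 42.6 encode to "ezs42" at precision 5.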
Example :
6.12.6.2. ST_PointFromGeoHash
Funct ion declarat ion:
Descript ion: T his funct ion ret urns a point based on t he input Geohash value. If t he precision paramet er
is not specified, t he maximum precision is used.
Example :
SELECT ST_AsText(ST_PointFromGeoHash('9wqz7eep0eyq'));
SELECT ST_AsText(ST_PointFromGeoHash('9wqz7eep0eyq', 4));
6.12.6.3. ST_EnvelopeFromGeoHash
Funct ion declarat ion:
Descript ion: T his funct ion ret urns t he envelope of t he specified precision based on t he input Geohash
value. If t he precision paramet er is not specified, t he maximum precision is used.
Example :
6.12.6.4. ST_GeoHashNeighbours
Funct ion declarat ion:
Descript ion: T his funct ion is a user-defined t able-valued funct ion (UDT F) t hat generat es nine dat a
records. T his funct ion ret urns nine Geohash st rings of t he current point and it s eight neighboring point s
based on t he input longit ude, lat it ude, and precision. T hese paramet ers must be specified.
Example :
Descript ion: T his funct ion overwrit es t he input geomet ry by using S2 cells at t he specified level. T hen,
it ret urns t he IDs of all S2 cells.
Example :
6.12.7.2. ST_S2CellIdsFromText
Funct ion declarat ion:
Descript ion: T his funct ion overwrit es t he well-known t ext (WKT ) represent at ion of t he input geomet ry
by using S2 cells at t he specified level. T hen, it ret urns t he IDs of all S2 cells.
Example :
6.12.7.3. ST_S2CellCenterPoint
Funct ion declarat ion:
Descript ion: T his funct ion calculat es t he cent er point of t he cell specified by t he cellId paramet er in
t he input S2 cell.
Example :
SELECT ST_S2CellCenterPoint('549015');
SELECT ST_AsText(ST_S2CellCenterPoint('89e37f091'));
6.12.7.4. ST_S2CellNeighbours
Function declaration:
Description: This function calculates the neighboring S2 cells, at the specified level, of the cell specified by the cellId parameter and returns the IDs of all neighboring S2 cells.
Example :
Description: This function returns the approximate geodesic area of the input geometry based on World Geodetic System 1984 (WGS84). It converts the coordinates of the input geometry from EPSG:4326 to EPSG:3857 and then calculates the planar area in square meters.
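The projected-area approach described above can be sketched in Python, assuming the standard spherical Web Mercator formulas for EPSG:3857 (radius 6,378,137 m) and the shoelace formula for the area of the projected ring. to_web_mercator and planar_area are hypothetical helpers; the sketch handles a single ring without holes and is not the MaxCompute implementation.

```python
import math

R = 6378137.0  # sphere radius used by EPSG:3857 (WGS84 semi-major axis), meters


def to_web_mercator(lon, lat):
    """Project EPSG:4326 degrees to EPSG:3857 meters (spherical Mercator)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def planar_area(ring_lonlat):
    """Project a ring of (lon, lat) vertices and return its shoelace area in m^2.

    Works whether or not the first vertex is repeated at the end, because
    the repeated closing edge contributes zero to the sum.
    """
    pts = [to_web_mercator(lon, lat) for lon, lat in ring_lonlat]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


# A 1-degree square at the equator.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(planar_area(square))
```

Web Mercator stretches lengths by roughly 1/cos(latitude) and areas by roughly its square, so planar results drift further from true surface values away from the equator; whether the function compensates for this is not stated here.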
Example :
6.12.8.2. ST_DistanceWGS84
Function declaration:
Description: This function returns the approximate geodesic distance of the input geometries based on World Geodetic System 1984 (WGS84). It converts the coordinates of the input geometries from EPSG:4326 to EPSG:3857 and then calculates the planar distance in meters.
Example :
6.12.8.3. ST_BufferWGS84
Function declaration:
Description: This function returns the approximate geodesic buffer of the input geometry based on World Geodetic System 1984 (WGS84). It converts the coordinates of the input geometry from EPSG:4326 to EPSG:3857, calculates the planar buffer, and then converts the coordinates back to EPSG:4326.
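The round trip described above (project to EPSG:3857, work in the plane, convert back to EPSG:4326) can be sketched as follows. The Python code assumes the spherical Web Mercator formulas; square_buffer_bounds is a hypothetical helper that computes only the bounding box of a buffered point, not the rounded polygon that a real buffer operation returns.

```python
import math

R = 6378137.0  # sphere radius used by EPSG:3857, in meters


def to_web_mercator(lon, lat):
    """EPSG:4326 degrees -> EPSG:3857 meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def from_web_mercator(x, y):
    """EPSG:3857 meters -> EPSG:4326 degrees (inverse of to_web_mercator)."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def square_buffer_bounds(lon, lat, meters):
    """Hypothetical helper: degree bounds of a point expanded by `meters` in the plane."""
    x, y = to_web_mercator(lon, lat)
    min_lon, min_lat = from_web_mercator(x - meters, y - meters)
    max_lon, max_lat = from_web_mercator(x + meters, y + meters)
    return min_lon, min_lat, max_lon, max_lat


print(square_buffer_bounds(116.292078, 39.919622, 100.0))
```

Because the distance is applied in projected meters, which Web Mercator stretches by roughly 1/cos(latitude), a 100 m planar buffer near 40° N spans only about 77 m on the ground.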
Example :
6.12.8.4. ST_GeodesicDistance
Function declaration:
double ST_GeodesicDistance(double lon1, double lat1, double lon2, double lat2, string method = VINCENTY)
double ST_GeodesicDistance(st_geometry geo1, st_geometry geo2, string method = VINCENTY)
Description: This function calculates the geodesic distance between two points by using the specified method. The supported methods are Vincenty, LawOfCosines, and Haversine. The default value of the method parameter is VINCENTY. The return value is in radians.
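Of the supported methods, LawOfCosines is the simplest to show. The following Python sketch computes the spherical central angle, in radians, between two points given in degrees, using the (lon, lat) argument order of the first declaration. It illustrates the formula only; the helper name is hypothetical and this is not MaxCompute's implementation. Multiplying the angle by an Earth radius converts it to a length.

```python
import math


def central_angle_law_of_cosines(lon1, lat1, lon2, lat2):
    """Angular distance in radians between two points on a sphere (inputs in degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    cos_c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dl)
    # Clamp values that rounding pushes just outside [-1, 1]; acos would fail on them.
    return math.acos(max(-1.0, min(1.0, cos_c)))


print(central_angle_law_of_cosines(116.292078, 39.919622, 116.286676, 39.919593))
```

The law of cosines loses precision for very short distances, where the haversine formula is usually preferred; Vincenty's method works on an ellipsoid rather than a sphere.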
Example :
6.12.8.5. ST_Distance_Sphere
Function declaration:
Description: This function uses the algorithm provided by AMAP to calculate the approximate geodesic distance between the two input points. It accepts ST_Point geometries or the longitudes and latitudes of the points as input parameters.
Example :
SELECT ST_Distance_Sphere(
ST_GeomFromText('POINT(116.292078 39.919622)'),
ST_GeomFromText('POINT(116.286676 39.919593)'));
+-------------------+
| _c0               |
+-------------------+
| 460.6965312526471 |
+-------------------+
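The result above can be cross-checked with the haversine formula. AMAP's algorithm is not documented here, so the following Python sketch is only an approximation: it assumes a spherical Earth with a mean radius of 6,371,000 m, and haversine_m is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius; AMAP's constant may differ


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Same points as the ST_Distance_Sphere example.
print(haversine_m(116.292078, 39.919622, 116.286676, 39.919593))
```

Under these assumptions the sketch yields approximately 460.7 m, in close agreement with the value returned by ST_Distance_Sphere.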
6.12.8.6. ST_Area_Sphere
Function declaration:
Description: This function uses the algorithm provided by AMAP to calculate the geodesic area of the input geometry. It accepts only ST_Polygon and ST_MultiPolygon geometries as input parameters.
Example :
Description: This function is a user-defined aggregate function (UDAF). It uses the unique ID and well-known text (WKT) string of each geometry as input parameters to create an R-tree index. This function must be used together with the other R-tree functions.
Example :
6.12.9.2. ST_ContainsFromRTree
Function declaration:
Description: This function is a user-defined table-valued function (UDTF). It takes as input parameters the unique ID and well-known text (WKT) string of each geometry and the R-tree index created by calling ST_BuildRTreeIndex. It returns the IDs of the objects in the R-tree index that are contained by the geometry. This function is used to accelerate ST_Contains queries.
6.12.9.3. ST_CrossesFromRTree
Function declaration:
Description: This function is a user-defined table-valued function (UDTF). It takes as input parameters the unique ID and well-known text (WKT) string of each geometry and the R-tree index created by calling ST_BuildRTreeIndex. It returns the IDs of the objects in the R-tree index that cross the geometry. This function is used to accelerate ST_Crosses queries.
6.12.9.4. ST_EqualsFromRTree
Function declaration:
Description: This function is a user-defined table-valued function (UDTF). It takes as input parameters the unique ID and well-known text (WKT) string of each geometry and the R-tree index created by calling ST_BuildRTreeIndex. It returns the IDs of the objects in the R-tree index that are equal to the geometry. This function is used to accelerate ST_Equals queries.
6.12.9.5. ST_IntersectsFromRTree
Function declaration:
Description: This function is a user-defined table-valued function (UDTF). It takes as input parameters the unique ID and well-known text (WKT) string of each geometry and the R-tree index created by calling ST_BuildRTreeIndex. It returns the IDs of the objects in the R-tree index that intersect with the geometry. This function is used to accelerate ST_Intersects queries.
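R-tree queries typically follow a filter-and-refine pattern: the index narrows the candidates by bounding box, and an exact predicate then confirms them. MaxCompute's index layout is not documented here, so the following Python sketch assumes a simple two-level R-tree bulk-loaded with Sort-Tile-Recursive (STR) packing and shows only the bounding-box filter step. build_index and intersects_candidates are hypothetical helpers, and boxes are (min_x, min_y, max_x, max_y) tuples.

```python
import math


def _union(boxes):
    """Smallest bounding box that covers all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def _overlaps(a, b):
    """True if two boxes intersect; touching edges count as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_index(entries, capacity=4):
    """Bulk-load a two-level R-tree with STR packing.

    entries: list of (id, bbox). Returns a list of (node_bbox, leaf_entries).
    """
    if not entries:
        return []
    n_leaves = math.ceil(len(entries) / capacity)
    slice_size = math.ceil(math.sqrt(n_leaves)) * capacity
    # Sort by x-center, cut into vertical slices, then sort each slice by y-center.
    by_x = sorted(entries, key=lambda e: e[1][0] + e[1][2])
    nodes = []
    for start in range(0, len(by_x), slice_size):
        vertical_slice = sorted(by_x[start:start + slice_size],
                                key=lambda e: e[1][1] + e[1][3])
        for i in range(0, len(vertical_slice), capacity):
            leaf = vertical_slice[i:i + capacity]
            nodes.append((_union([box for _, box in leaf]), leaf))
    return nodes


def intersects_candidates(index, query_box):
    """Filter step only: IDs whose bounding boxes overlap query_box."""
    hits = []
    for node_box, leaf in index:
        if _overlaps(node_box, query_box):  # prune whole nodes cheaply
            hits.extend(i for i, box in leaf if _overlaps(box, query_box))
    return hits
```

The returned IDs are candidates only: a line that passes near a polygon's corner can share bounding-box space with it without touching it, so an exact test such as ST_Intersects is still needed on each candidate.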
6.12.9.6. ST_OverlapsFromRTree
Function declaration:
Description: This function is a user-defined table-valued function (UDTF). It takes as input parameters the unique ID and well-known text (WKT) string of each geometry and the R-tree index created by calling ST_BuildRTreeIndex. It returns the IDs of the objects in the R-tree index that overlap with the geometry. This function is used to accelerate ST_Overlaps queries.
6.12.9.7. ST_TouchesFromRTree
Function declaration:
Description: This function is a user-defined table-valued function (UDTF). It takes as input parameters the unique ID and well-known text (WKT) string of each geometry and the R-tree index created by calling ST_BuildRTreeIndex. It returns the IDs of the objects in the R-tree index that spatially touch the geometry. This function is used to accelerate ST_Touches queries.
6.12.9.8. ST_WithinFromRTree
Function declaration:
Description: This function is a user-defined table-valued function (UDTF). It takes as input parameters the unique ID and well-known text (WKT) string of each geometry and the R-tree index created by calling ST_BuildRTreeIndex. It returns the IDs of the objects in the R-tree index that contain the geometry. This function is used to accelerate ST_Within queries.
6.12.9.9. ST_KNNFromRTree
Function declaration:
Description: This function is a user-defined table-valued function (UDTF). It takes as input parameters the unique ID and well-known text (WKT) string of each geometry and the R-tree index created by calling ST_BuildRTreeIndex. It returns the IDs of the k objects in the R-tree index that are nearest to the geometry.
6.12.9.10. Example
This topic provides examples of how to use R-tree index functions.
Example 1
-- Query the intersections of line segments in the A table and polygons in the B table.
set odps.sql.allow.cartesian=true;
SELECT a.id as link_id, b.id as shape_id
FROM link_sample_wkt a, poi_sample_wkt b
WHERE geospatial.ST_IsValid(b.shape)
AND geospatial.ST_Intersects(
geospatial.ST_LineString(a.line),
geospatial.ST_Multipolygon(b.shape));
Summary:
resource cost: cpu 3.28 Core * Min, memory 5.76 GB * Min
inputs:
meta_dev.poi_sample_wkt: 1000 (237592 bytes)
meta_dev.link_sample_wkt: 1000 (105940 bytes)
outputs:
Job run time: 111.000
+---------------------+------------+
| link_id             | shape_id   |
+---------------------+------------+
| 5121371185457659960 | B000A844XK |
| 5121377123249946651 | B000A85TV4 |
| 5121377166199619654 | B000A844KT |
+---------------------+------------+
-- After optimization by using the new function:
SELECT /*+mapjoin(i)*/
geospatial.ST_IntersectsFromRTree(id, line, i.index)
AS (link_id, shape_id)
FROM link_sample_wkt
JOIN
(
SELECT geospatial.ST_BuildRTreeIndex(id, shape) AS index
FROM poi_sample_wkt
WHERE geospatial.ST_IsValid(shape)
) i;
Summary:
resource cost: cpu 1.03 Core * Min, memory 1.99 GB * Min
inputs:
meta_dev.poi_sample_wkt: 1000 (237592 bytes)
meta_dev.link_sample_wkt: 1000 (105940 bytes)
outputs:
Job run time: 41.000
+---------------------+------------+
| link_id             | shape_id   |
+---------------------+------------+
| 5121371185457659960 | B000A844XK |
| 5121377123249946651 | B000A85TV4 |
| 5121377166199619654 | B000A844KT |
+---------------------+------------+
Example 2
-- Create an R-tree for all points in a table and use the KNN function to locate the nearest point of each point.
SELECT /*+mapjoin(i)*/
geospatial.ST_KNNFromRTree(id, point, 1, i.index) AS (id1, id2)
FROM poi_sample_wkt
JOIN
(
SELECT geospatial.ST_BuildRTreeIndex(id, point) AS index
FROM poi_sample_wkt
) i;
Summary:
resource cost: cpu 1.17 Core * Min, memory 2.24 GB * Min
inputs:
meta_dev.poi_sample_wkt: 1000 (237592 bytes)
outputs:
Job run time: 46.000
+------------+------------+
| id1        | id2        |
+------------+------------+
| B000A01B4E | B000A01B4E |
| B000A01C19 | B000A01C19 |
| B000A023A5 | B000A023A5 |
| B000A02F81 | B000A02F81 |
| B000A07BEE | B000A07BEE |
| B000A07E06 | B000A07E06 |
| B000A08863 | B000A08863 |
...
-- The table has 1,000 rows of data. This function returns 1,000 rows of data, which meets your expectations.
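In Example 2, id1 equals id2 in every row because each point is also stored in the index that it queries, so its nearest neighbor is itself at distance 0. The following brute-force Python sketch reproduces that effect on hypothetical data using planar distance; it is not how ST_KNNFromRTree is implemented. It also shows one way to obtain the nearest other point, assuming that is the goal: request k=2 and drop the self-match.

```python
import heapq
import math


def knn(points, query_xy, k):
    """Brute-force k nearest neighbors by planar distance; returns point IDs."""
    qx, qy = query_xy
    distances = ((math.hypot(x - qx, y - qy), pid) for pid, (x, y) in points.items())
    return [pid for _, pid in heapq.nsmallest(k, distances)]


# Hypothetical indexed points; the query point "A" is itself in the index.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 5.0)}
print(knn(points, points["A"], 1))  # ['A']: the point matches itself at distance 0
print([p for p in knn(points, points["A"], 2) if p != "A"])  # ['B']
```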
Description: This function checks whether the input geometry or well-known text (WKT) string meets the requirements.
Example :
6.12.10.2. ST_Transform
Function declaration:
Description: This function converts the coordinates of the input geometry from one spatial reference system to another. The ST_TransformWGS84 function converts the coordinates of the geometry from EPSG:4326 to EPSG:3857. The ST_Transform function converts the geometry from fromSRID to toSRID. If you use the overload that takes only toSRID, you must first call the ST_SetSRID function.
Example :
Use SQL functions
Example:
Example:
Example:
MaxCompute supports the CLONE TABLE statement. You can execute this statement to clone data from one table to another.
Synt ax
Note
If the destination table does not exist before data is cloned, it is created by using the CREATE TABLE LIKE statement when you execute the CLONE TABLE statement.
If the destination table exists before data is cloned and IF EXISTS OVERWRITE is specified, data in the specified partitions of the destination table is overwritten.
If the destination table exists before data is cloned and IF EXISTS IGNORE is specified, existing partitions in the destination table are skipped and the data in these partitions is not overwritten.
The schema of the destination table must be compatible with that of the source table.
The CLONE TABLE statement supports both partitioned and non-partitioned tables. Tables that have special data organization structures are not supported. These tables include clustered tables, shard tables, Xlib or Algo tables, and tables with extreme storage.
Make sure that the cluster configuration of the source table intersects with that of the destination table and that the data you want to process is in the same cluster. If either condition is not met, an error is returned.
If the destination table already exists before data is cloned, you can clone data from a maximum of 10,000 partitions at a time.
If the destination table does not exist before data is cloned, the number of partitions that you can clone at a time is not limited, which ensures atomicity.
If a hard link in the Apsara Distributed File System is faulty, purge the recycle bin and try again.
The user who submits the command must have the Create Table and Update Table permissions on the target project.
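The partition rules in the preceding note can be modeled in a few lines. The following Python sketch is a toy model, not MaxCompute code: clone_partitions is a hypothetical helper that represents a table as a dictionary from partition spec to data, and it ignores schema compatibility, the 10,000-partition limit, and atomicity.

```python
def clone_partitions(source, destination, if_exists):
    """Toy model of how CLONE TABLE treats partitions that already exist.

    source/destination map a partition spec to its data; destination is None
    when the destination table does not exist yet. Returns the resulting table.
    """
    if destination is None:
        # The table is created (CREATE TABLE LIKE) and all partitions are cloned.
        return dict(source)
    result = dict(destination)
    for spec, data in source.items():
        if spec in result and if_exists == "IGNORE":
            continue  # existing partition is skipped; its data is not overwritten
        result[spec] = data  # new partition, or existing one under OVERWRITE
    return result


src = {"ds=20240101": "new-1", "ds=20240102": "new-2"}
dst = {"ds=20240101": "old-1"}
print(clone_partitions(src, dst, "IGNORE"))
```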
Example
Clone data from the partitioned table and skip existing partitions in the destination table.
1. Broadcast hash join: This method is used when a JOIN operation involves a small table. The small table is broadcast to all JoinTask instances. Then, a hash join is performed to join the small table with the large table.
2. Shuffle hash join: This method is used when a JOIN operation involves large tables that cannot be broadcast directly. In this case, both tables are hash-shuffled based on the join keys. Equal key values always produce the same hash value, which ensures that rows with the same key are collected on the same JoinTask instance. On each instance, a hash table is built from the smaller table and probed with the rows of the larger table, and then the tables are joined.
3. Sort merge join: This method is used when a JOIN operation involves even larger tables and the preceding methods cannot be used because memory is insufficient to build a hash table. In this case, both tables are hash-shuffled based on the join keys, the shuffled data is sorted by the join keys, and then the sorted data is merged.
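The three join strategies above can be compared on in-memory rows. The following Python sketch is illustrative only: hash_join plays the role of a broadcast hash join when the whole small table is available to every task, shuffle_hash_join hash-partitions both inputs first so that each task joins only its own partition, and sort_merge_join sorts and merges without building a hash table. Rows are (key, value) pairs, and Python's hash() stands in for MaxCompute's shuffle hash.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows):
    """Build a hash table on the smaller input, then probe it with the larger one."""
    table = defaultdict(list)
    for key, value in build_rows:
        table[key].append(value)
    return [(key, b, p) for key, p in probe_rows for b in table.get(key, [])]


def shuffle(rows, n_tasks):
    """Hash-partition rows by key so that equal keys land in the same task."""
    parts = [[] for _ in range(n_tasks)]
    for key, value in rows:
        parts[hash(key) % n_tasks].append((key, value))
    return parts


def shuffle_hash_join(small, large, n_tasks=4):
    joined = []
    for s_part, l_part in zip(shuffle(small, n_tasks), shuffle(large, n_tasks)):
        joined += hash_join(s_part, l_part)  # each task joins its own partition
    return joined


def sort_merge_join(left, right):
    """Sort both inputs by key, then merge them; no hash table is needed."""
    left, right = sorted(left), sorted(right)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit the cross product of the two runs that share this key.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            out += [(lk, l[1], r[1]) for l in left[i:i_end] for r in right[j:j_end]]
            i, j = i_end, j_end
    return out
```

All three functions return the same (key, small_value, large_value) rows; they differ only in how much data must be moved and held in memory, which is the trade-off the three methods make.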
The sort merge join is commonly used in MaxCompute because MaxCompute processes huge volumes of data in most cases. This method involves repeated shuffle and sort operations. In addition, the physical execution plan generated by Job Scheduler for the JOIN operation requires multiple stages, which consumes a large amount of resources.
Therefore, MaxCompute allows you to configure the hash shuffle and sort attributes when data is initially written to a table. This prevents data from being shuffled and sorted repeatedly in subsequent queries. As a result, the number of stages in the Job Scheduler physical execution plan for a JOIN operation is reduced. The preceding figure shows that only one stage is required.
MaxCompute Hash Clustering allows you to configure the shuffle and sort attributes of a table when you create the table. MaxCompute then uses these existing storage characteristics to optimize the execution plan, improve efficiency, and save resources.
6.15.2. Descriptions
6.15.2.1. Enable or disable Hash Clustering
The Hash Clustering feature is available and enabled by default. If you want to use clustered indexes, add the following flag:
set odps.sql.cfile2.enable.read.write.index.flag=true;
After the flag is set to true, the system automatically creates indexes for the sorted hash buckets to improve query efficiency. To use clustered indexes, you must add this flag both when you create the table and in subsequent queries. If you want clustered indexes to be used in your project at all times, contact the MaxCompute team.
Note Clustered indexes improve the efficiency of queries on sort keys (equality or range queries). However, Hash Clustering still delivers its performance benefits even if you do not add this flag.
-- Non-partitioned table
CREATE TABLE T1 (a string, b string, c bigint) CLUSTERED BY (c) SORTED BY (c) INTO 1024 BUCKETS;
-- Partitioned table
CREATE TABLE T1 (a string, b string, c bigint) PARTITIONED BY (dt string) CLUSTERED BY (c) SORTED BY (c) INTO 1024 BUCKETS;
The following sections describe the CLUSTERED BY, SORTED BY, and INTO number_of_buckets BUCKETS clauses in detail.
CLUSTERED BY
The CLUSTERED BY clause specifies the hash keys. MaxCompute performs a hash operation on the specified column and distributes data to buckets based on the hash values. To prevent data skew and hot spots, and to allow statements to be executed concurrently, we recommend that you specify in CLUSTERED BY a column that has a large value range and few duplicate values. In addition, to optimize JOIN operations, we recommend that you select commonly used join or aggregation keys. These keys are similar to primary keys in conventional databases.
SORTED BY
The SORTED BY clause specifies how fields are sorted in a bucket. We recommend that you specify the same column in SORTED BY as in CLUSTERED BY to improve execution efficiency. After you specify the column in SORTED BY, MaxCompute automatically generates indexes and uses them to speed up SQL statements that query data based on this column.
INTO number_of_buckets BUCKETS
In MaxCompute, the shuffle operation can be removed only for tables that have the same number of buckets. In later versions, MaxCompute will support bucket alignment, which will allow the shuffle operation to be removed for tables whose numbers of buckets are multiples or factors of each other. To achieve bucket alignment, we recommend that you set the number of buckets to a power of 2, such as 512, 1,024, or 2,048. The maximum number of buckets is 4,096. If the number of buckets exceeds this value, performance and resource usage may be affected.
If you want to remove the shuffle and sort operations during a JOIN operation on two tables, the numbers of hash buckets in the tables must be the same. If the bucket numbers calculated by using the preceding method differ, we recommend that you use the larger number for both tables in the JOIN operation. This ensures that SQL statements can be executed concurrently and efficiently.
If the sizes of two tables differ greatly, you can set the number of buckets for the large table to a multiple of that for the small table, for example, 256 buckets for the small table and 1,024 for the large table. If automatic hash bucket splitting and merging are supported, the settings can be optimized based on data characteristics.
The ALTER TABLE statement can modify the Hash Clustering attribute only for partitioned tables. After the Hash Clustering attribute is added to a non-partitioned table, it cannot be modified.
The ALTER TABLE statement takes effect only for new partitions of a table, which include partitions generated by the INSERT OVERWRITE statement. New partitions are stored based on the Hash Clustering attribute, and the storage formats of existing partitions remain unchanged.
Because the ALTER TABLE statement takes effect only for new partitions of a table, you cannot specify a partition in this statement.
The ALTER TABLE statement is suitable for existing tables. After the Hash Clustering attribute is added, new partitions are stored based on the Hash Clustering attribute.
You can also execute the following statement to view the partition attributes of a partitioned table:
6.15.3. Benefits
6.15.3.1. Bucket pruning and index optimization
The following code provides a syntax sample:
CREATE TABLE t1 (id bigint, a string, b string) CLUSTERED BY (id) SORTED BY (id) INTO 1000 BUCKETS;
...
SELECT t1.a, t1.b FROM t1 WHERE t1.id=12345;
For a standard table, this query requires a full table scan, and a full scan of a large table consumes a large amount of resources. However, if the table data is hash-shuffled and sorted by the id field, the query is greatly simplified. The procedure is as follows:
1. Find the hash bucket that corresponds to 12345. The query is performed on only this one bucket instead of all 1,000 buckets. This process is called bucket pruning.
2. Data in a bucket is stored sorted by id. MaxCompute automatically creates indexes and uses the INDEX LOOKUP function to locate the relevant records.
This simplified procedure not only greatly reduces the number of mappers, but also allows mappers to use the index to locate the pages where the data is stored. Therefore, the volume of data that is loaded is greatly reduced.
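The two steps can be sketched in Python as a simplified model under stated assumptions: Python's hash() stands in for MaxCompute's bucketing hash, each bucket keeps its sorted key list as an index, and a binary search stands in for INDEX LOOKUP. write_clustered and point_lookup are hypothetical helpers.

```python
import bisect

N_BUCKETS = 1000  # matches INTO 1000 BUCKETS in the sample


def bucket_of(key):
    # Assumption: Python's hash() stands in for MaxCompute's bucketing hash.
    return hash(key) % N_BUCKETS


def write_clustered(rows):
    """Model CLUSTERED BY (id) SORTED BY (id): hash rows to buckets, sort each one."""
    buckets = [[] for _ in range(N_BUCKETS)]
    for row in rows:
        buckets[bucket_of(row["id"])].append(row)
    stored = []
    for bucket in buckets:
        bucket.sort(key=lambda r: r["id"])
        stored.append(([r["id"] for r in bucket], bucket))  # (sorted keys, rows)
    return stored


def point_lookup(stored, key):
    """WHERE id = key: read a single bucket, then binary-search its sorted keys."""
    keys, rows = stored[bucket_of(key)]  # bucket pruning: 1 of N_BUCKETS is read
    lo = bisect.bisect_left(keys, key)  # index lookup instead of a scan
    hi = bisect.bisect_right(keys, key)
    return rows[lo:hi]


table = write_clustered([{"id": i, "a": f"a{i}", "b": f"b{i}"} for i in range(20_000)])
print(point_lookup(table, 12345))  # [{'id': 12345, 'a': 'a12345', 'b': 'b12345'}]
```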
In most cases, the department column is shuffled and sorted, and then a stream aggregate operation is performed to collect statistics for each department group. However, if CLUSTERED BY (department) SORTED BY (department) is specified for the table, the shuffle and sort operations are no longer required.
For example, consider a table that contains 100 GB of TPC-H lineitem data with multiple data types, such as INT, DOUBLE, and STRING. When Hash Clustering is used, about 10% of storage space is saved while the data volume and compression format remain unchanged.
6.15.4. ShuffleRemove
Range-clustered tables support join and aggregate operations. If a join or group key is the range clustering key or its prefix, data redistribution is not required. This mechanism is called ShuffleRemove, and it improves execution efficiency.
If you join two hash-clustered tables whose numbers of buckets are different but are multiples or factors of each other, data redistribution is not required. This improves execution efficiency.
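Why multiples or factors are sufficient can be checked with modular arithmetic: if the smaller bucket count divides the larger one, every row in bucket b of the larger table belongs to bucket b mod n_small of the smaller table, so matching buckets can be joined without redistributing data. The following Python sketch assumes that bucket = hash mod number_of_buckets, which is an assumption about MaxCompute's bucketing, and aligned_bucket is a hypothetical helper. It also shows why powers of 2 are a convenient choice: any two powers of 2 are multiples or factors of each other.

```python
def aligned_bucket(bucket_in_large, n_small):
    """Bucket of the smaller table that matches a bucket of the larger table.

    Valid only when n_small divides n_large, because then
    (h % n_large) % n_small == h % n_small for every hash value h.
    """
    return bucket_in_large % n_small


# Stand-in hash values; bucket = h % number_of_buckets is an assumption.
hashes = range(100_000)

# 256 divides 1,024: every row of large-table bucket b sits in small-table bucket b % 256.
assert all(aligned_bucket(h % 1024, 256) == h % 256 for h in hashes)

# 300 does not divide 1,024: the mapping breaks, so data would need reshuffling.
mismatches = sum(1 for h in hashes if aligned_bucket(h % 1024, 300) != h % 300)
print(mismatches)
```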
Correlated Shuffle Remove is supported. If data meets the distribution requirements but does not meet the sorting requirements, a sort operator can be added to avoid data redistribution.
6.15.5. Limits
The limits of Hash Clustering are as follows:
The INSERT INTO statement is not supported. You can import data only by executing the INSERT OVERWRITE statement.
Small files cannot be merged. Data is evenly distributed across buckets when it is split, so no small files are generated, and merging files would affect the data distribution. However, you can still use the merge and archive commands to change the storage format of a table file and the format of a RAID file.
You cannot use Tunnel to upload data to a range-clustered table because data uploaded by using Tunnel is unsorted.
These limits will be resolved in the future. Stay tuned for updates on the official website.
Limits
Item | Maximum value/Limit | Category | Description
Comment length | 1,024 bytes | Length | A comment can be up to 1,024 bytes in length.
Column definitions in a table | 1,200 | Quantity | A table can contain a maximum of 1,200 column definitions.
Statistical definitions of a table | 100 | Quantity | A table can contain a maximum of 100 statistical definitions.
Statistical definition length of a table | 64,000 | Length | The length of statistical definitions in a table cannot exceed 64,000.
Java UDFs | Cannot be abstract or static | Operation | Java UDFs cannot be abstract or static.
Partitions to query | 10,000 | Quantity | A maximum of 10,000 partitions can be queried.
set odps.sql.mapper.cpu=100
Purpose: Sets the number of CPUs for each instance in a Map task. Default value: 100. Value range: 50 to 800.
set odps.sql.mapper.memory=1024
Purpose: Sets the memory size for each instance in a Map task. Default value: 1,024 MB. Value range: 256 MB to 12,288 MB.
set odps.sql.mapper.merge.limit.size=64
Purpose: Sets the maximum size of files to be merged. Default value: 64 MB. You can set this variable to control the inputs of mappers. Value range: 0 to Integer.MAX_VALUE.
set odps.sql.mapper.split.size=256
Purpose: Sets the maximum volume of input data for each mapper. Default value: 256 MB. You can set this variable to control the inputs of mappers. Value range: 1 to Integer.MAX_VALUE.
Purpose: Sets the number of instances in a JOIN task. Default value: 1. Value range: 0 to 2,000.
set odps.sql.joiner.cpu=100
Purpose: Sets the number of CPUs for each instance in a JOIN task. Default value: 100. Value range: 50 to 800.
set odps.sql.joiner.memory=1024
Purpose: Sets the memory size for each instance in a JOIN task. Default value: 1,024 MB. Value range: 256 MB to 12,288 MB.
Purpose: Sets the number of instances in a Reduce task. Default value: 1. Value range: 0 to 2,000.
set odps.sql.reducer.cpu=100
Purpose: Sets the number of CPUs for each instance in a Reduce task. Default value: 100. Value range: 50 to 800.
set odps.sql.reducer.memory=1024
Purpose: Sets the memory size for each instance in a Reduce task. Default value: 1,024 MB. Value range: 256 MB to 12,288 MB.
Purpose: Sets the maximum memory size of the UDF JVM heap. Default value: 1,024 MB. Value range: 256 MB to 12,288 MB.
set odps.sql.udf.timeout=600
Purpose: Sets the timeout period of a UDF. Default value: 600 seconds. Value range: 0 to 3,600 seconds.
set odps.sql.udf.python.memory=256
Purpose: Sets the maximum memory size for Python UDFs. Default value: 256 MB. Value range: 64 MB to 3,072 MB.
set odps.sql.udf.optimize.reuse=true/false
Purpose: If this parameter is enabled, each identical UDF expression is evaluated only once, which improves performance. Default value: true.
set odps.sql.udf.strict.mode=false/true
Purpose: Controls whether functions return NULL or an error when dirty data is encountered. If the value is true, an error is returned. If the value is false, NULL is returned.
Purpose: Sets the maximum memory of the small table in a MAPJOIN. Default value: 512 MB. Value range: 128 MB to 2,048 MB.
set odps.sql.reshuffle.dynamicpt=true/false
Purpose:
Some dynamic partitioning scenarios are time-consuming, and disabling this parameter can speed up SQL statements.
If there are only a few dynamic partition values, disabling this parameter can avoid data skew.
set odps.sql.skewjoin=true/false
Purpose: Enables skew join optimization. It takes effect only when odps.sql.skewinfo is configured.
set odps.sql.skewinfo
Purpose: Sets detailed information for skew join optimization. Syntax:
set odps.sql.skewinfo=skewed_src:(skewed_key)[("skewed_value")]
Example:
The following command sets a single skewed value in a single field:
set odps.sql.skewinfo=src_skewjoin1:(key)[("0")]
-- Command output: explain select a.key c1, a.value c2, b.key c3, b.value c4 from src a join src_skewjoin1 b on a.key = b.key;
The following command sets multiple skewed values in a single field:
set odps.sql.skewinfo=src_skewjoin1:(key)[("0")("1")]
-- Command output: explain select a.key c1, a.value c2, b.key c3, b.value c4 from src a join src_skewjoin1 b on a.key = b.key;
In the current version, MapReduce programs are automatically converted to SQL for execution. After the conversion, the compiler, cost-based optimizer, and vectorized execution engine released with MaxCompute V2.0 can be used to process the MapReduce programs. The new features of the SQL engine can also be used, and the features, performance, and stability of the SQL engine are optimized.
Notice
You do not need to change the original APIs or job logic.
Only MapReduce jobs of the OpenMR type, which are written with MapReduce APIs, can be converted to SQL.
This feature can be used for projects and jobs.
This feature supports views as the input.
This feature supports external tables as the input.
This feature supports TemporaryFile reads and writes.
This feature allows you to read data from and write data to hash clustering tables.
This feature supports the near-real-time execution of small jobs.
You can configure the execution mode based on your business requirements. The default execution mode is lot. In lot mode, jobs are executed by MapReduce, and the new compiler, optimizer, and execution engine are not used.
You can enable an execution mode by setting the odps.mr.run.mode parameter. Valid values: lot, sql, and hybrid.
Method 1: Enable the execution mode at the project level. This setting affects all jobs in the project, so the project administrator must apply for and enable it. Set the odps.mr.run.mode parameter to hybrid or sql. In hybrid mode, if SQL execution fails, the job is executed by MapReduce. In sql mode, if SQL execution fails, an error is returned.
Method 2: Enable the execution mode at the session level. This method is valid only for the current job. To enable the execution mode, use one of the following methods:
Add a set flag, such as set odps.mr.run.mode=hybrid, before JAR statements.
The execution mode can be enabled at the project level later by MaxCompute O&M personnel.
You can add the SET statement before a MapReduce job or configure the job parameter for it. These methods take effect at the session level and apply only to the current job.
1. LogView XML.
Open Logview and click the LOT node in the center of the page. The SQL jobs that are converted from MapReduce jobs are included in the XML information of the node. Example:
You can see that the new execution engine is used to execute jobs.
The JSON summary information in MapReduce contains only the input and output information of Map and Reduce. However, the JSON summary information in SQL allows you to view details about each stage of SQL execution, such as all execution parameters, logical execution plans, physical execution plans, and execution details. Example:
"midlots" :
[
"LogicalTableSink(table=[[odps_flighting.flt_20180621104445_step1_ad_quality_tech_qp_algo_antifake_wordbag_filter_bag_change_result_lv2_20, auctionid,word,match_word(3) {0, 1, 2}]])
OdpsLogicalProject(auctionid=[$0], word=[$1], match_word=[$2])
OdpsLogicalProject(auctionid=[$0], word=[$1], match_word=[$2])
OdpsLogicalProject(auctionid=[$0], word=[$1], match_word=[$2])
OdpsLogicalProject(auctionid=[$2], word=[$3], match_word=[$4])
OdpsLogicalTableFunctionScan(invocation=[[MR2SQL_MAPPER_152955294118813063732($0, $1)]()], rowType=[RecordType(VARCHAR(2147483647) item_id, VARCHAR(2147483647) text, VARCHAR(2147483647) __tf_0_0, VARCHAR(2147483647) __tf_0_1, VARCHAR(2147483647) __tf_0_2)])
OdpsLogicalTableScan(table=[[ad_quality_tech.qp_algo_antifake_wordbag_filter_bag_change_lv2_20, item_id,text(2) {0, 1}]])"
]
You can use either of the following methods to specify volume files:
In the preceding commands, project and label are optional. By default, the current project and the default label are used. If multiple input and output files are used, labels distinguish the files from each other. Authorization is required before you access the volume files of other projects.
Configure parameters to specify the volume and partition of the input and output files. If multiple input or output files are used, separate the parameters with commas (,).
2. Call the following method on a context object in the map and reduce steps to write data to the distributed file system or to read and write input and output files as data streams:
context.getOutputVolumeFileSystem();
6.19.1. Features
MaxCompute SQL can analyze the mapping between SQL input and output fields. This feature determines how the fields in the input tables map to the fields in the output tables. Example:
Two columns are returned: key and total. The key column corresponds to the src.key column of the input table. The total column corresponds to the src.value column of the input table.
Output format
Field mapping analysis supports the human-readable and JSON formats. You can use the set odps.sql.select.output.format=HumanReadable/json flag to specify the output format.
./bin/odpscmd.bat -X D:\lineage.q
Interactive mode: After you enter the interactive mode of odpscmd, you can use the preceding flag to specify the output format. The usage is similar to that used to commit SQL jobs.
For a running job inst ance where t he min, max, and avg values for t he paramet ers t ime, input records,
and out put records are imbalanced (for example, max is much great er t han avg), a dat a skew problem
may have occurred. You can check t he log view t o locat e t he dat a skew problem, as shown in t he
following figure.
The Long-Tails tab of each task shows the instances in which data skew occurred. Data skew occurs
when some instances process much more data than the others, so their running time far exceeds the
average running time of the other instances. As a result, the entire job slows down.
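The max-versus-average comparison described above can be scripted once you have exported per-instance statistics from Logview. The following Java sketch is a hypothetical helper: the class SkewCheck and the threshold of 3.0 are illustrative assumptions, not MaxCompute settings. It flags possible skew when the largest per-instance value is far above the average.

```java
import java.util.Arrays;

// Hypothetical helper: flags possible data skew when the largest per-instance
// value (time, input records, or output records) is much greater than the average.
// The threshold is an illustrative assumption, not a MaxCompute setting.
final class SkewCheck {
    static double maxToAvgRatio(long[] perInstance) {
        if (perInstance.length == 0) {
            throw new IllegalArgumentException("no instances");
        }
        long max = Arrays.stream(perInstance).max().getAsLong();
        double avg = Arrays.stream(perInstance).average().getAsDouble();
        return avg == 0 ? 0.0 : max / avg;
    }

    static boolean looksSkewed(long[] perInstance, double threshold) {
        return maxToAvgRatio(perInstance) > threshold;
    }

    public static void main(String[] args) {
        // Input records per instance: one instance handles far more data than the rest.
        long[] inputRecords = {1_000, 1_200, 900, 1_100, 50_000};
        System.out.printf("max/avg = %.2f, skewed = %b%n",
                maxToAvgRatio(inputRecords), looksSkewed(inputRecords, 3.0));
    }
}
```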
Data skew in different types of SQL operations can be reduced by using different methods.
Solution: Enable the GROUP BY skew prevention parameter before you run SQL statements:
set odps.sql.groupby.skewindata=true;
Note If this parameter is set to true, the system adds random factors to the shuffle hash
algorithm and adds a new task to prevent data skew.
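The following Java sketch illustrates the general two-stage salting idea behind this parameter: a random salt spreads a hot key over several groups, and a second aggregation step removes the salt and combines the partial results. It is an in-memory illustration with hypothetical names (SaltedGroupBy and the salt count), not MaxCompute's internal implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative two-stage aggregation with salting (not MaxCompute internals).
final class SaltedGroupBy {
    record Row(String key, long value) {}

    // Stage 1: aggregate on (key, random salt) so that one hot key is split into
    // up to `salts` groups, which a shuffle would spread over several reducers.
    static Map<String, Long> stageOne(List<Row> rows, int salts, Random rnd) {
        Map<String, Long> partial = new HashMap<>();
        for (Row r : rows) {
            String saltedKey = r.key() + "#" + rnd.nextInt(salts);
            partial.merge(saltedKey, r.value(), Long::sum);
        }
        return partial;
    }

    // Stage 2: strip the salt and combine the partial sums per original key.
    static Map<String, Long> stageTwo(Map<String, Long> partial) {
        Map<String, Long> result = new HashMap<>();
        partial.forEach((saltedKey, partialSum) -> {
            String key = saltedKey.substring(0, saltedKey.lastIndexOf('#'));
            result.merge(key, partialSum, Long::sum);
        });
        return result;
    }

    static Map<String, Long> groupBySum(List<Row> rows, int salts, long seed) {
        return stageTwo(stageOne(rows, salts, new Random(seed)));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("hot", 1), new Row("hot", 2),
                new Row("hot", 3), new Row("cold", 10));
        System.out.println(groupBySum(rows, 4, 42L)); // same sums as a plain GROUP BY
    }
}
```

The second stage is what keeps the result identical to an unsalted GROUP BY; the cost is the extra aggregation step, which matches the note's remark that an additional task is added.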
If one side of a JOIN is a small table, use MAPJOIN instead of a regular JOIN.
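A map join avoids shuffling the large table on the join key, so a hot key cannot overload a single reducer. The Java sketch below shows the broadcast hash join idea in memory, assuming the small table fits in a lookup map; the class name and the row layout (key in column 0) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative map-side (broadcast hash) join: the small table is held in memory
// and every row of the large table is probed locally, so no shuffle on the join
// key is needed.
final class MapJoinSketch {
    static List<String> join(Map<String, String> smallTable, List<String[]> largeRows) {
        List<String> out = new ArrayList<>();
        for (String[] row : largeRows) {        // row[0] = join key, row[1] = payload
            String match = smallTable.get(row[0]);
            if (match != null) {                // inner-join semantics
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> dim = Map.of("1", "Beijing", "2", "Hangzhou");
        List<String[]> facts = List.of(new String[]{"1", "a"}, new String[]{"1", "b"},
                new String[]{"3", "c"});
        System.out.println(join(dim, facts)); // [1,a,Beijing, 1,b,Beijing]
    }
}
```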
A skewed key can be handled with separate logic. For example, if the join keys on both sides contain a
large amount of NULL data, the join is skewed. In this case, filter out the NULL data before you
perform the JOIN operation, or use a CASE WHEN clause to replace the NULL values with random values
and then perform the JOIN operation.
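The effect of replacing NULL join keys with random values can be seen by counting how many rows each reducer would receive. In the following Java sketch, rows are assigned to reducers by hash(key) modulo the reducer count, which is a simplifying assumption about the shuffle. The "__null_" prefix is a hypothetical marker assumed never to occur in real keys, so the randomized rows still match nothing, just as NULL keys would not.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: counts the rows each reducer would receive when rows are
// shuffled by hash(joinKey) mod reducers. All NULL keys hash to the same reducer;
// replacing each NULL with a random non-matching value spreads them out.
final class NullKeySpread {
    static int[] rowsPerReducer(String[] keys, int reducers) {
        int[] counts = new int[reducers];
        for (String k : keys) {
            counts[Math.floorMod(Objects.hashCode(k), reducers)]++;
        }
        return counts;
    }

    // Hypothetical prefix: assumed never to occur in real join keys.
    static String[] randomizeNulls(String[] keys) {
        String[] out = keys.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] == null) {
                out[i] = "__null_" + ThreadLocalRandom.current().nextLong();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = new String[1_000];
        keys[0] = "a";
        keys[1] = "b";                          // the other 998 keys are NULL
        System.out.println(Arrays.toString(rowsPerReducer(keys, 4)));
        System.out.println(Arrays.toString(rowsPerReducer(randomizeNulls(keys), 4)));
    }
}
```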
If you do not want to change SQL statements, set the following parameters to enable automatic
optimization in MaxCompute:
set odps.sql.reshuffle.dynamicpt=true;
It introduces an additional level of ReduceTask so that one or more reduce instances write data to
the same target partition. This prevents too many small files from being generated. However, dynamic
partition shuffle may cause data skew.
Solution: If there are only a few target partitions, the system does not generate many small files. In this
case, you can run the following command to disable the preceding feature, or disable dynamic
partitioning:
set odps.sql.reshuffle.dynamicpt=false;
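The trade-off between small files and skew can be estimated with simple arithmetic. The Java sketch below uses a simplified worst-case model, which is an assumption and not MaxCompute's exact behavior: without reshuffling, every writer instance opens one file per target partition; with reshuffling, each partition is written by a single reducer, so the largest partition determines the busiest reducer.

```java
// Simplified model (an assumption, not MaxCompute's exact behavior).
final class DynamicPartitionFiles {
    // Worst case without reshuffle: every writer instance sees every partition.
    static long worstCaseFilesWithoutReshuffle(int writerInstances, int partitions) {
        return (long) writerInstances * partitions;
    }

    // With reshuffle: one reducer, and therefore one file, per target partition.
    static long filesWithReshuffle(int partitions) {
        return partitions;
    }

    // Fraction of all rows that the busiest reducer must write after reshuffling.
    static double busiestReducerShare(long[] rowsPerPartition) {
        long total = 0, max = 0;
        for (long r : rowsPerPartition) {
            total += r;
            max = Math.max(max, r);
        }
        return total == 0 ? 0.0 : (double) max / total;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseFilesWithoutReshuffle(500, 100));      // 50000
        System.out.println(filesWithReshuffle(100));                       // 100
        System.out.println(busiestReducerShare(new long[]{900, 50, 50}));  // 0.9
    }
}
```

With only a few target partitions, both file counts stay small, which is why disabling the reshuffle is reasonable in that case.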
In general, tasks that lack computing resources show two characteristics. First, the task gets
stuck and its progress stays at a certain stage. For example, in the following figure, the
progress of the M1_Stg1 task stays at 0%. Because R2_1_Stg1 depends on M1_Stg1, R2_1_Stg1 also stays at
0% until M1_Stg1 ends.
Second, the task remains in the Ready state in Logview, as shown in the following figure. A Ready
task is waiting for resources to be allocated, whereas a Waiting task is waiting for the tasks it
depends on to complete. The Ready state indicates that the resources for running these standby
instances are insufficient. Once the instances obtain the necessary resources, they resume
and change to the Running state.
Each task is split into subtasks based on the execution plan and shown in a DAG, and each subtask
invokes multiple instances to run the computation concurrently. In general, an instance requires
1 CPU core and 2 GB of memory. A quota group is assigned to each
project to allocate resources reasonably. The quota group determines the maximum amount of
resources (CPU and memory) that all jobs in the project can use concurrently. Once the resource
usage of concurrently running tasks reaches the limit of the quota group, the tasks get stuck because of
insufficient resources.
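Using the typical per-instance figures above (1 CPU core and 2 GB of memory), you can estimate how many instances a quota group can run at once and how many will wait in the Ready state. The quota values in this Java sketch are hypothetical, and the estimate ignores other jobs that share the same quota group.

```java
// Back-of-the-envelope estimate based on the typical per-instance figures above.
// Quota values passed in are hypothetical; other jobs sharing the quota are ignored.
final class QuotaEstimate {
    static final int CORES_PER_INSTANCE = 1;
    static final int MEMORY_GB_PER_INSTANCE = 2;

    // Instances that can run at the same time under the project's quota group.
    static int maxConcurrentInstances(int quotaCores, int quotaMemoryGb) {
        return Math.min(quotaCores / CORES_PER_INSTANCE,
                quotaMemoryGb / MEMORY_GB_PER_INSTANCE);
    }

    // Instances left in the Ready state, waiting for resources.
    static int readyInstances(int requestedInstances, int quotaCores, int quotaMemoryGb) {
        return Math.max(0, requestedInstances - maxConcurrentInstances(quotaCores, quotaMemoryGb));
    }

    public static void main(String[] args) {
        // 100 cores and 160 GB: memory is the bottleneck (160 / 2 = 80 instances).
        System.out.println(maxConcurrentInstances(100, 160)); // 80
        System.out.println(readyInstances(200, 100, 160));    // 120
    }
}
```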
Example with a partition filter: after you create a table by using create table src (key string, value bigint) partitioned by
(pt string);, the query select * from src where pt='20160901'; filters on the partition column.
When it generates the query plan, MaxCompute uses only the data in the 20160901 partition as input.
Example without a partition filter: select * from src where key = 'MaxCompute'; scans
the entire table.
Partitioning is usually based on date or geographical region. You can also define partitions based on your
business requirements. Example:
Example: Run the create table test3 (key boolean) partitioned by (pt string, ds string)
lifecycle 100; command to create a table with a lifecycle of 100 days. If the last modification time of
the table or a partition is more than 100 days ago, the table or partition is deleted.
Notice The lifecycle uses a partition as the smallest unit. For a partitioned table, partitions
that reach the lifecycle threshold are deleted directly, and partitions that have not
reached the lifecycle threshold are not affected.
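The partition-level behavior described in this notice can be modeled as follows. This Java sketch is an illustration under stated assumptions: the caller supplies each partition's last modification time, and a partition counts as expired when it was last modified more than the lifecycle number of days ago. The class name and partition labels are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of partition-level lifecycle: each partition is judged on its
// own last modification time, so only expired partitions are reclaimed.
final class LifecycleSketch {
    static List<String> expiredPartitions(Map<String, Instant> lastModified,
                                          int lifecycleDays, Instant now) {
        Instant cutoff = now.minus(Duration.ofDays(lifecycleDays));
        List<String> expired = new ArrayList<>();
        // TreeMap gives a stable, sorted order for the result.
        for (Map.Entry<String, Instant> e : new TreeMap<>(lastModified).entrySet()) {
            if (e.getValue().isBefore(cutoff)) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-12-31T00:00:00Z");
        Map<String, Instant> parts = Map.of(
                "pt='20160901',ds='a'", now.minus(Duration.ofDays(120)),
                "pt='20161201',ds='a'", now.minus(Duration.ofDays(30)));
        System.out.println(expiredPartitions(parts, 100, now)); // [pt='20160901',ds='a']
    }
}
```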
Run the alter table table_name set lifecycle days; command to modify the lifecycle of an
existing table.
Example:
Many instances are occupied because a single instance can process only a small number of files. This
wastes resources and degrades overall execution performance.
The file system grows larger, while disk space utilization decreases.
Currently, there are two ways to merge small files: ALTER merge mode and SQL merge mode.
The ALTER merge mode merges files by running a command on the console. The command format is as follows:
alter table <table_name> [partition (<partition_spec>)] merge smallfiles;
The SQL merge mode is controlled by parameters and takes effect after SQL execution is complete. With odps.task.merge.enabled=true; set,
the system determines whether small files need to be merged. If so, it starts a Fuxi job to merge these files.
This problem can be solved by setting the UDF runtime parameters:
set odps.sql.mapper.memory=3072;
set odps.sql.udf.jvm.memory=2048;
set odps.sql.udf.python.memory=1536;
6.21. Appendix
6.21.1. Escape character
String constants in MaxCompute SQL can be enclosed in single or double quotation marks. A string
enclosed in single quotation marks can contain double quotation marks, and a string enclosed in
double quotation marks can contain single quotation marks. Otherwise, the quotation marks must be
escaped. Examples of correct expressions: "I'm a happy coder!" and 'I\'m a happy coder!'.
In MaxCompute SQL, the backslash (\) is the escape character. It expresses special characters in a
string, or makes the character that follows it be interpreted as the character itself. When a string constant is read, if
the backslash is followed by three valid octal digits in the range from 001 to 177, the system converts
the ASCII value into the corresponding character. The following table lists the mappings between
escape sequences and the characters they represent.
Escape sequences
Escape sequence    Represented character
\b                 Backspace
\t                 Tab
\n                 Newline
\r                 Carriage return
\\                 Backslash
\;                 Semicolon
\Z                 Control-Z
\0 or \00          Terminator
Example:
Note For the character set of strings, MaxCompute SQL currently supports only the UTF-8
character set. Data encoded in a different character set may produce incorrect calculation results.
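To make the escape table concrete, the following Java sketch decodes a string constant according to those rules. It is a hypothetical model written for illustration (SqlEscapes is not part of any MaxCompute SDK), and edge cases such as \000 may be handled differently by the real parser.

```java
// Hypothetical decoder that follows the escape table above; a model for reasoning
// about string constants, not MaxCompute's actual parser.
final class SqlEscapes {
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c);                 // ordinary character or a trailing backslash
                i++;
                continue;
            }
            int octal = octalAt(s, i + 1);
            if (octal >= 1 && octal <= 0177) {
                out.append((char) octal);      // \001 to \177: ASCII character by code
                i += 4;
                continue;
            }
            char next = s.charAt(i + 1);
            switch (next) {
                case 'b' -> out.append('\b');
                case 't' -> out.append('\t');
                case 'n' -> out.append('\n');
                case 'r' -> out.append('\r');
                case 'Z' -> out.append((char) 0x1A);   // Control-Z
                case '0' -> {
                    out.append('\0');                  // \0 or \00: terminator
                    if (i + 2 < s.length() && s.charAt(i + 2) == '0') {
                        i++;                           // also consume the second zero of \00
                    }
                }
                default -> out.append(next);           // \\, \;, \' and any other character
            }
            i += 2;
        }
        return out.toString();
    }

    // Returns the value of three octal digits starting at `from`, or -1 if absent.
    private static int octalAt(String s, int from) {
        if (from + 3 > s.length()) {
            return -1;
        }
        int value = 0;
        for (int k = from; k < from + 3; k++) {
            char d = s.charAt(k);
            if (d < '0' || d > '7') {
                return -1;
            }
            value = value * 8 + (d - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unescape("I\\'m a happy coder!")); // I'm a happy coder!
        System.out.println(unescape("\\101\\102\\103"));      // ABC
    }
}
```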
^: the beginning of a line
$: the end of a line
.: any character
*: matches zero or more times.
+: matches one or more times.
?: matches zero times or once. If this character follows any other quantifier (*, +, ?, {n}, {n,}, or {n,m}),
the match is lazy: as few characters as possible are matched. In the default greedy
mode, as many characters as possible are matched.
A|B: A or B
(abc)*: matches the abc sequence zero or more times.
{n} or {m,n}: the number of times to match
[ab]: matches any character in the brackets.
[^ab]: ^ means NOT. This matches any character that is neither a nor b.
\: the escape character
\n: n is a digit from 1 to 9. This specifies a back-reference.
\d: digit
\D: non-digit
[::]: POSIX character class
[[:alnum:]]: letter or digit in the range of [a-zA-Z0-9]
[[:alpha:]]: letter in the range of [a-zA-Z]
[[:ascii:]]: ASCII character in the range of [\x00-\x7F]
[[:blank:]]: space or tab in the range of [ \t]
[[:cntrl:]]: control character in the range of [\x00-\x1F\x7F]
[[:digit:]]: digit in the range of [0-9]
[[:graph:]]: any character except space in the range of [\x21-\x7E]
[[:space:]]: whitespace character in the range of [ \t\r\n\v\f]
[[:print:]]: printable character, that is, [:graph:] plus the space character, in the range of [\x20-\x7E]
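The greedy, lazy, and back-reference rules in the list above behave the same way in most regular expression engines. The Java sketch below demonstrates them with java.util.regex. Java writes POSIX classes as \p{Alnum}, \p{Digit}, and so on rather than [[:alnum:]], so treat this as an analogy, not a test of MaxCompute's own engine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Uses java.util.regex to illustrate greedy versus lazy matching, back-references,
// and a POSIX-style class. This is an analogy for MaxCompute's regex rules.
final class RegexDemo {
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        System.out.println(firstMatch("<.*>", html));               // greedy: <b>bold</b>
        System.out.println(firstMatch("<.*?>", html));              // lazy:   <b>
        System.out.println(firstMatch("(ab)\\1", "xxabab"));        // back-reference: abab
        System.out.println(firstMatch("\\p{Alnum}+", "--a1b2--"));  // a1b2
    }
}
```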
The system uses the backslash (\) as its escape character, so a backslash in a regular expression
must be escaped a second time. For example, suppose the string to be matched by the regular expression is "a+b". The
plus sign (+) is a special character in regular expressions and must be escaped as \+. However,
the system first processes the expression as a string constant, so the backslash itself must also be escaped before the regular expression engine receives it.
Therefore, the expression that matches "a+b" is "a\\+b".
In the extreme case of matching the backslash character (\), which is itself a special character in the regular expression engine, the
regular expression must be \\. Because the system also escapes the expression as a string constant, it must be written as
"\\\\".
Note
If a MaxCompute SQL statement contains "a\b", 'a\b' is displayed in the output because
MaxCompute escapes the expression.
If a string contains a tab character, the system reads '\t' as one character and stores it.
Therefore, the tab is treated as an ordinary character in regular expressions.
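Java source code has the same two levels of escaping as MaxCompute SQL string constants: the compiler processes the string literal first, and the regex engine then processes the resulting pattern. The following Java sketch uses that parallel to show why "a\\+b" is needed to match a+b and why four backslashes are needed to match one. The class and field names are illustrative.

```java
import java.util.regex.Pattern;

// The Java literal "a\\+b" becomes the pattern a\+b, and the literal "\\\\"
// becomes the pattern \\ (one literal backslash), mirroring the two escaping levels.
final class DoubleEscape {
    // Pattern a\+b: matches exactly the text "a+b".
    static final Pattern A_PLUS_B = Pattern.compile("a\\+b");
    // Pattern \\: matches exactly one backslash character.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    public static void main(String[] args) {
        System.out.println(Pattern.matches("a+b", "a+b"));     // false: + is a quantifier here
        System.out.println(A_PLUS_B.matcher("a+b").matches()); // true
        System.out.println(BACKSLASH.matcher("\\").matches()); // true
        System.out.println(Pattern.quote("a+b"));              // \Qa+b\E
    }
}
```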
% & && ( ) * +. / ; < <= <> = > >= ? ADD AFTER ALL ALTER ANALYZE AND ARCHIVE ARRAY AS ASC
BEFORE BETWEEN BIGINT BINARY BLOB BOOLEAN BOTH BUCKET BUCKETS BY CASCADE CASE CAST CFILE
CHANGE CLUSTER CLUSTERED CLUSTERSTATUS COLLECTION COLUMN COLUMNS COMMENT COMPUTE CONCATENATE
CONTINUE CREATE CROSS CURRENT CURSOR DATA DATABASE DATABASES DATE DATETIME DBPROPERTIES
DEFERRED DELETE DELIMITED DESC DESCRIBE DIRECTORY DISABLE DISTINCT DISTRIBUTE DOUBLE DROP ELSE
ENABLE END ESCAPED EXCLUSIVE EXISTS EXPLAIN EXPORT EXTENDED EXTERNAL FALSE FETCH FIELDS
FILEFORMAT FIRST FLOAT FOLLOWING FORMAT FORMATTED FROM FULL FUNCTION FUNCTIONS GRANT GROUP
HAVING HOLD_DDLTIME IDXPROPERTIES IF IMPORT IN INDEX INDEXES INPATH INPUTDRIVER INPUTFORMAT
INSERT INT INTERSECT INTO IS ITEMS JOIN KEYS LATERAL LEFT LIFECYCLE LIKE LIMIT LINES LOAD
LOCAL LOCATION LOCK LOCKS LONG MAP MAPJOIN MATERIALIZED MINUS MSCK NOT NO_DROP NULL OF OFFLINE
ON OPTION OR ORDER OUT OUTER OUTPUTDRIVER OUTPUTFORMAT OVER OVERWRITE PARTITION PARTITIONED
PARTITIONPROPERTIES PARTITIONS PERCENT PLUS PRECEDING PRESERVE PROCEDURE PURGE RANGE RCFILE
READ READONLY READS REBUILD RECORDREADER RECORDWRITER REDUCE REGEXP RENAME REPAIR REPLACE
RESTRICT REVOKE RIGHT RLIKE ROW ROWS SCHEMA SCHEMAS SELECT SEMI SEQUENCEFILE SERDE
SERDEPROPERTIES SET SHARED SHOW SHOW_DATABASE SMALLINT SORT SORTED SSL STATISTICS STORED STREAMTABLE
STRING STRUCT TABLE TABLES TABLESAMPLE TBLPROPERTIES TEMPORARY TERMINATED TEXTFILE THEN
TIMESTAMP TINYINT TO TOUCH TRANSFORM TRIGGER TRUE UNARCHIVE UNBOUNDED UNDO UNION UNIONTYPE
UNIQUEJOIN UNLOCK UNSIGNED UPDATE USE USING UTC UTC_TMESTAMP VIEW WHEN WHERE WHILE
If the flag is not added, the data that is read is implicitly converted into the original data types for all
computations.
If the flag is not added for integer constants, the BIGINT type is used, and an error message is
returned.
If you write data to a table in passthrough mode, you do not need to add the new
data type flag. However, if you want to perform calculations on the data, an error is returned because the implicit
data type conversion is invalid.
7.MaxCompute Tunnel
7.1. Overview
MaxCompute provides two types of channels for data uploads and downloads:
DataHub: This channel is used to upload or download data in real time. It includes the OGG, Flume,
Logstash, and Fluentd plug-ins.
Tunnel: This channel is used to upload or download large amounts of data in batches. It includes the
MaxCompute client, DataWorks, DTS, Sqoop, Kettle plug-in, and MaxCompute Migration Assist (MMA).
DataHub and Tunnel provide their own SDKs. The data upload and download tools built on these
SDKs meet the requirements of the most common scenarios for migrating data to the cloud. The
tools also enable you to upload or download data in a variety of other scenarios.
Limits
Limits on Tunnel-based data uploads:
You cannot run Tunnel commands to upload or download data of the ARRAY, MAP, or STRUCT
types.
No limit is set on the upload speed. The upload speed depends on the network
bandwidth and server performance.
The number of retries is limited. If the number of retries exceeds the limit, the next block is
uploaded. After the data is uploaded, you can execute the select count(*) from table_name
statement to check whether any data is lost.
By default, a project supports a maximum of 2,000 concurrent Tunnel connections.
On the server, the lifecycle of a session is 24 hours. A session can be shared among processes and
threads on the server, but you must make sure that each block ID is unique.
MaxCompute ensures the validity of concurrent writes based on atomicity, consistency, isolation,
and durability (ACID).
DataHub and Tunnel use different endpoints in different network environments. You must select the
appropriate endpoint when you connect to the service.
Use DataWorks for offline batch synchronization. DataWorks supports a wide range of database
types, including MySQL, SQL Server, and PostgreSQL.
Use the OGG plug-in for real-time synchronization of data in an Oracle database.
Use DTS for real-time synchronization of data in an ApsaraDB for RDS database.
Log collection
You can use tools such as Flume, Fluentd, and Logstash to collect logs.
Data Integration of DataWorks is a stable, efficient, and scalable data synchronization platform
provided by Alibaba Cloud. It provides full offline and incremental real-time data
synchronization, integration, and exchange services for heterogeneous data storage systems on
Alibaba Cloud.
Data synchronization tasks support the following data source types: MaxCompute, ApsaraDB for RDS
(MySQL, SQL Server, and PostgreSQL), Oracle, FTP, AnalyticDB (ADS), OSS, ApsaraDB for Memcache,
and DRDS.
Based on the batch data Tunnel SDK, the client provides built-in Tunnel commands for data upload
and download.
DTS (Tunnel)
Data Transmission Service (DTS) is an Alibaba Cloud data service that supports data exchange between
multiple data sources, such as relational database management systems (RDBMS), NoSQL, and online
analytical processing (OLAP) databases. It provides data transmission features, such as data
migration, real-time data subscription, and real-time data synchronization.
DTS supports data synchronization from ApsaraDB for RDS and MySQL instances to MaxCompute
tables. Other data source types are not supported.
Open-source products
The projects corresponding to each product are open source. You can visit aliyun-maxcompute-data-collectors
to view details.
Sqoop (Tunnel)
Sqoop 1.4.6 from the community is further developed to provide enhanced MaxCompute support. It
can import data from relational databases such as MySQL, and data from HDFS or Hive, to
MaxCompute tables. It can also export data from MaxCompute tables to relational databases such
as MySQL.
Kettle (Tunnel)
Kettle is an open-source ETL tool developed in Java. It runs on Windows, Unix, and Linux. It
provides a graphical interface in which you define data transmission topologies by dragging and dropping
components.
The DataHub plug-in of OGG allows you to incrementally synchronize data in an Oracle database to
DataHub in real time and archive the data in MaxCompute tables.
7.5.1. Overview
The data upload and download tools provided by MaxCompute are built on the Tunnel SDK. This
topic describes the major APIs of the Tunnel SDK.
The usage of the SDK varies by version. For more information, see the SDK Java Doc.
Major APIs
API Description
Note The Tunnel endpoint supports automatic routing based on the MaxCompute endpoint
settings.
7.5.2. TableTunnel
This topic describes the TableTunnel API.
Definition
Definition:
Description:
Lifecycle: the duration from the creation of the TableTunnel instance to the end of the program.
TableTunnel provides methods to create UploadSession and DownloadSession objects.
TableTunnel.UploadSession is used to upload data, and TableTunnel.DownloadSession is used to
download data.
A session refers to the process of uploading or downloading a table or partition. A session consists
of one or more HTTP requests to Tunnel RESTful APIs.
Upload sessions of TableTunnel use INSERT INTO semantics. Multiple upload sessions of the same
table or partition do not affect each other, and the data uploaded in each session is stored in a
separate directory.
In an upload session, each RecordWriter corresponds to one HTTP request and is identified by a unique
block ID. The block ID is the name of the file that corresponds to the RecordWriter.
If you open a RecordWriter multiple times with the same block ID in the same session, the data
uploaded by the RecordWriter whose close() function is called last overwrites all previous data. You can use this
feature to retransmit the data of a block when a data upload fails.
API limits
The value of a block ID must be greater than or equal to 0 and less than 20000. The size of the data
uploaded in a block cannot exceed 100 GB.
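The block ID rules above can be modeled without the real SDK. The Java sketch below is a hypothetical in-memory stand-in (UploadSessionModel is not a Tunnel SDK class) that enforces the block ID range and the 100 GB limit and shows how closing a writer again with the same block ID replaces the earlier data. The 24-hour session lifecycle is not modeled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of the upload-session rules above (not the Tunnel SDK).
final class UploadSessionModel {
    static final long MAX_BLOCK_ID = 20_000;                        // exclusive upper bound
    static final long MAX_BLOCK_BYTES = 100L * 1024 * 1024 * 1024;  // 100 GB per block

    private final Map<Long, String> blocks = new TreeMap<>();

    // Rejects block IDs outside [0, 20000) and blocks larger than 100 GB.
    static void checkBlock(long blockId, long sizeBytes) {
        if (blockId < 0 || blockId >= MAX_BLOCK_ID) {
            throw new IllegalArgumentException("block ID out of range: " + blockId);
        }
        if (sizeBytes > MAX_BLOCK_BYTES) {
            throw new IllegalArgumentException("block exceeds 100 GB: " + sizeBytes);
        }
    }

    // Simulates closing a RecordWriter opened with `blockId`: the last close wins,
    // which is what makes re-sending a failed block safe.
    void closeWriter(long blockId, String data) {
        checkBlock(blockId, data.getBytes(StandardCharsets.UTF_8).length);
        blocks.put(blockId, data);
    }

    Map<Long, String> snapshot() {
        return Map.copyOf(blocks);
    }

    public static void main(String[] args) {
        UploadSessionModel session = new UploadSessionModel();
        session.closeWriter(0, "rows 1-1000 (partial, upload failed)");
        session.closeWriter(1, "rows 1001-2000");
        session.closeWriter(0, "rows 1-1000 (retried)");  // overwrites block 0
        System.out.println(session.snapshot().get(0L));   // rows 1-1000 (retried)
    }
}
```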
A session is uniquely identified by its session ID. The lifecycle of a session is 24 hours. If your session
times out because a large volume of data is being transferred, transfer your data in multiple
sessions.
The lifecycle of an HTTP request corresponding to a RecordWriter is 120 seconds. If no data flows
over an HTTP connection within 120 seconds, the server closes the connection.
Note HTTP has an 8 KB buffer. When you call the RecordWriter.write() function, your data
may be saved to the buffer, so no inbound traffic flows over the corresponding HTTP
connection. In this case, you can call the TunnelRecordWriter.flush() function to forcibly flush
the data from the buffer.
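The buffering effect described in this note can be reproduced with the JDK's own BufferedOutputStream, created here with an 8 KB (8192-byte) buffer. This Java sketch is an analogy only: the ByteArrayOutputStream stands in for the HTTP connection, and the code does not use TunnelRecordWriter.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Analogy using the JDK, not the Tunnel SDK: small writes stay in the 8 KB buffer
// and do not reach the underlying stream (the "connection") until flush() is called.
final class BufferFlushDemo {
    // Returns how many bytes reached the "connection" before and after flush().
    static int[] bytesSent(int payloadBytes) throws IOException {
        ByteArrayOutputStream connection = new ByteArrayOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(connection, 8 * 1024)) {
            out.write(new byte[payloadBytes]);
            int beforeFlush = connection.size();
            out.flush();
            return new int[]{beforeFlush, connection.size()};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] small = bytesSent(100);
        System.out.println(small[0] + " -> " + small[1]); // 0 -> 100
    }
}
```

Writes at least as large as the buffer bypass it, so the idle-connection risk mainly concerns a trickle of small records.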
When you use a RecordWriter to write logs to MaxCompute, the RecordWriter may time out because of
unexpected traffic fluctuations. Therefore, we recommend that you:
Do not use a RecordWriter for each data record. Each RecordWriter corresponds to a file, so this
generates a large number of small files, which affects the performance of
MaxCompute.
Cache data on the client, and do not use a RecordWriter to write it until the cached data reaches 64 MB.
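The batching recommendation can be implemented with a small client-side cache. In the Java sketch below, RecordBatcher and its flushToWriter callback are hypothetical; the callback stands in for opening a RecordWriter, writing the batch, and closing it, which real code would do through the Tunnel SDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side batcher: records are cached in memory and handed to a
// writer only once the cache reaches a threshold such as 64 MB, instead of
// opening one RecordWriter per record.
final class RecordBatcher {
    static final long DEFAULT_THRESHOLD = 64L * 1024 * 1024; // 64 MB

    private final long thresholdBytes;
    private final Consumer<List<String>> flushToWriter;
    private final List<String> cache = new ArrayList<>();
    private long cachedBytes = 0;

    RecordBatcher(long thresholdBytes, Consumer<List<String>> flushToWriter) {
        this.thresholdBytes = thresholdBytes;
        this.flushToWriter = flushToWriter;
    }

    void add(String record) {
        cache.add(record);
        cachedBytes += record.getBytes(StandardCharsets.UTF_8).length;
        if (cachedBytes >= thresholdBytes) {
            flush();
        }
    }

    // Sends whatever is cached; also call this once at the end of the input.
    void flush() {
        if (cache.isEmpty()) {
            return;
        }
        flushToWriter.accept(List.copyOf(cache));
        cache.clear();
        cachedBytes = 0;
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        RecordBatcher batcher = new RecordBatcher(10, batches::add); // tiny threshold for the demo
        for (String r : new String[]{"aaaa", "bbbb", "cccc", "dddd"}) {
            batcher.add(r);
        }
        batcher.flush();
        System.out.println(batches); // [[aaaa, bbbb, cccc], [dddd]]
    }
}
```

In production you would pass DEFAULT_THRESHOLD and still call flush() often enough that no connection stays idle past the 120-second limit.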
The lifecycle of a RecordReader is 300 seconds.