MaxCompute SQL Only Modules

MaxCompute SQL can be used for offline and online computing scenarios. It uses SQL syntax but is not a database and lacks some database features. Data types in MaxCompute SQL support explicit and implicit conversions between different types according to predefined rules.

Uploaded by

Hanifah Busainah

MaxCompute User Guide · MaxCompute SQL

6. MaxCompute SQL
6.1. Overview
6.1.1. Scenarios
This topic describes the scenarios of MaxCompute SQL.

MaxCompute SQL offline computing is suitable for scenarios in which large volumes of data (terabytes)
must be processed but real-time requirements are low. In such scenarios, preparing and submitting each
job takes a relatively long time. MaxCompute SQL is not well suited to businesses that need to process
thousands of transactions per second. MaxCompute SQL online computing provides near real-time (NRT)
processing capabilities.

MaxCompute SQL uses a syntax similar to that of SQL and can be considered a subset of standard SQL.
However, MaxCompute SQL is not equivalent to a database. It lacks common database characteristics such
as transactions, primary key constraints, and indexes. The maximum length of an SQL statement
currently supported by MaxCompute is 2 MB.
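A client could check statement size before submitting a long generated statement. The following is a minimal Python sketch of such a pre-check. It assumes that the 2 MB limit is measured on the UTF-8 encoded statement text and that 2 MB means 2 × 1024 × 1024 bytes. Neither assumption is stated in this guide, and the helper name is hypothetical.

```python
# Hypothetical client-side pre-check for the 2 MB statement-length limit.
# Assumptions: the limit applies to the UTF-8 encoded text, and
# 2 MB = 2 * 1024 * 1024 bytes. MaxCompute itself enforces the real limit.
MAX_SQL_BYTES = 2 * 1024 * 1024


def fits_statement_limit(sql: str, limit: int = MAX_SQL_BYTES) -> bool:
    """Return True if the UTF-8 encoded statement is within the limit."""
    return len(sql.encode("utf-8")) <= limit


if __name__ == "__main__":
    print(fits_statement_limit("select * from sale_detail;"))  # True
    print(fits_statement_limit("x" * (MAX_SQL_BYTES + 1)))     # False
```

Because the check counts bytes rather than characters, statements with non-ASCII text (for example, Chinese string constants) reach the limit sooner than their character count suggests.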

6.1.2. Reserved words


Keywords of SQL statements are reserved words in MaxCompute. Do not use reserved words to name
tables, columns, or partitions. Otherwise, an error is returned. Reserved words are case-insensitive.

Common reserved words are listed as follows. For a complete list of reserved words, see Reserved
words.

%          &          &&         (          )          *          +
-          .          /          ;          <          <=         <>
=          >          >=         ?          ADD        ALL        ALTER
AND        AS         ASC        BETWEEN    BIGINT     BOOLEAN    BY
CASE       CAST       COLUMN     COMMENT    CREATE     DESC       DISTINCT
DISTRIBUTE DOUBLE     DROP       ELSE       FALSE      FROM       FULL
GROUP      IF         IN         INSERT     INTO       IS         JOIN
LEFT       LIFECYCLE  LIKE       LIMIT      MAPJOIN    NOT        NULL
ON         OR         ORDER      OUTER      OVERWRITE  PARTITION  RENAME
REPLACE    RIGHT      RLIKE      SELECT     SORT       STRING     TABLE
THEN       TOUCH      TRUE       UNION      VIEW       WHEN       WHERE
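Because reserved words are case-insensitive, a name check should compare names without regard to case. The following minimal Python sketch checks a proposed table, column, or partition name against the common keywords listed above. It covers only this partial list (operator symbols are omitted because they cannot appear in valid names anyway), so a name that passes is not guaranteed to be free of reserved words. The function name is hypothetical.

```python
# Sketch: check a proposed name against the *common* reserved words above.
# This is the partial list from this guide, not the complete list.
COMMON_RESERVED_WORDS = frozenset("""
ADD ALL ALTER AND AS ASC BETWEEN BIGINT BOOLEAN BY CASE CAST COLUMN COMMENT
CREATE DESC DISTINCT DISTRIBUTE DOUBLE DROP ELSE FALSE FROM FULL GROUP IF IN
INSERT INTO IS JOIN LEFT LIFECYCLE LIKE LIMIT MAPJOIN NOT NULL ON OR ORDER
OUTER OVERWRITE PARTITION RENAME REPLACE RIGHT RLIKE SELECT SORT STRING TABLE
THEN TOUCH TRUE UNION VIEW WHEN WHERE
""".split())


def is_common_reserved_word(name: str) -> bool:
    """Reserved words are case-insensitive, so compare in upper case."""
    return name.upper() in COMMON_RESERVED_WORDS


if __name__ == "__main__":
    print(is_common_reserved_word("Partition"))    # True
    print(is_common_reserved_word("sale_detail"))  # False
```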

6.1.3. Partitioned table


Partition columns provide many benefits, such as more efficient SQL operations and lower costs.
However, too many partitions can cause problems. Using partition columns as filter conditions in the
WHERE clauses of SELECT statements brings greater benefits. Some partition-related SQL statements run
inefficiently or fail. For example, a statement fails when a large volume of data (more than 2,048 MB)
is generated in dynamic partitions in a single MaxCompute instance.

It is easy to underestimate the number of partitions generated when multi-level partitions are used.
When a large number of partitions would be generated, evaluate the original data to determine whether
there are too many partitions.
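Multi-level partitions are easy to underestimate because the worst-case partition count is the product of the number of distinct values at each level. The following Python sketch does this arithmetic. The level names and counts are hypothetical examples, not values from this guide.

```python
# Sketch: upper bound on partitions for a multi-level partition scheme.
# Worst case = product of the number of distinct values at each level.
from math import prod


def max_partitions(distinct_values_per_level: dict[str, int]) -> int:
    """Upper bound on partitions = product of distinct values per level."""
    return prod(distinct_values_per_level.values())


if __name__ == "__main__":
    # Hypothetical scheme: one year of daily dates x 30 regions x 24 hours.
    levels = {"sale_date": 365, "region": 30, "hour": 24}
    print(max_partitions(levels))  # 262800
```

Two levels alone (365 × 30 = 10,950) are manageable, but adding an hourly third level pushes the bound to 262,800. That is well above the default project maximum of 60,000 partitions described in Create a table.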

> Document Version: 20220928 98



You can create up to six levels of partitions. For some MaxCompute commands, the syntax differs
between partitioned and non-partitioned tables. For more information, see DDL statements and DML
statements.
For more information about the table creation statement, see Create a table.

6.1.4. Type conversion


6.1.4.1. Explicit type conversion
Explicit conversion uses CAST to convert a value from one type to another. This topic describes
explicit type conversion.
The following table lists the explicit type conversions supported by MaxCompute SQL.

Explicit type conversion

From/To   Bigint  Double  String  Datetime  Boolean  Decimal
Bigint    –       Y       Y       N         N        Y
Double    Y       –       Y       N         N        Y
String    Y       Y       –       Y         N        Y
Datetime  N       N       Y       –         N        N
Boolean   N       N       N       N         –        N
Decimal   Y       Y       Y       N         N        –

Y indicates that the type can be converted. N indicates that the type cannot be converted.
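The matrix above can be expressed as a lookup table. This can be useful, for example, when generating SQL and deciding whether a CAST is legal. The following Python sketch encodes only the six types in the table. Treating a cast to the same type as allowed (the "–" cells) is an assumption, and the names are hypothetical.

```python
# Sketch: the explicit CAST support matrix above as a lookup table.
# Each source type maps to the target types marked "Y".
EXPLICIT_CASTS = {
    "BIGINT": {"DOUBLE", "STRING", "DECIMAL"},
    "DOUBLE": {"BIGINT", "STRING", "DECIMAL"},
    "STRING": {"BIGINT", "DOUBLE", "DATETIME", "DECIMAL"},
    "DATETIME": {"STRING"},
    "BOOLEAN": set(),
    "DECIMAL": {"BIGINT", "DOUBLE", "STRING"},
}


def can_cast(source: str, target: str) -> bool:
    source, target = source.upper(), target.upper()
    if source not in EXPLICIT_CASTS or target not in EXPLICIT_CASTS:
        raise ValueError(f"type not covered by this table: {source} -> {target}")
    # Same-type "casts" (the "–" cells) are treated as no-ops.
    return source == target or target in EXPLICIT_CASTS[source]
```

For instance, `can_cast("BOOLEAN", "STRING")` returns False. As the note below explains, that conversion is instead done with the TO_CHAR function.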


Note
When double type values are converted to bigint, the fractional part is truncated. For example,
cast(1.6 as bigint) = 1.
When a string that meets double type requirements is converted to bigint, the string is first
converted to the double type and then to the bigint type. Hence, the fractional part is truncated.
For example, cast("1.6" as bigint) = 1.
When a string that meets bigint type requirements is converted to the double type, one decimal
place is retained. For example, cast("1" as double) = 1.0.
To convert a constant string to the decimal type, enclose the constant string in quotation
marks. If the value is not enclosed in quotation marks, it is treated as a double type value. For
example, cast("1.234567890123456789" as decimal).
Unsupported explicit type conversions cause an exception.
If a conversion fails during execution, the system returns an error and exits.
The datetime data conversion uses the default format yyyy-mm-dd hh:mi:ss. For more
information, see Convert data between string and datetime types.
Some types cannot be explicitly converted but can be converted by using built-in SQL
functions. For example, the to_char function can be used to convert boolean type values to the
string type. For more information, see TO_CHAR. The to_date function can be used to convert
string type values to the datetime type. For more information, see TO_DATE.
For more information about CAST, see CAST.
When a decimal value is out of the value range, the cast string to decimal operation may return
an error, such as most significant bit overflow or least significant bit overflow truncation.
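The documented conversion examples can be reproduced in Python to make the rules concrete. The following is an illustrative model, not MaxCompute's implementation. Overflow and malformed input are not modelled, and truncation toward zero for negative values is an assumption. The Decimal lines are an analogy for the quotation-mark rule: a literal that passes through a double first loses the digits a double cannot hold.

```python
# Sketch of the documented conversion examples, modelled in Python.
# Not MaxCompute's implementation; truncation toward zero is assumed.
import math
from decimal import Decimal


def double_to_bigint(value: float) -> int:
    """cast(1.6 as bigint) = 1: the fractional part is dropped."""
    return math.trunc(value)


def string_to_bigint(text: str) -> int:
    """cast("1.6" as bigint) = 1: the string goes through DOUBLE first."""
    return double_to_bigint(float(text))


if __name__ == "__main__":
    print(double_to_bigint(1.6))    # 1
    print(string_to_bigint("1.6"))  # 1
    # Why quotation marks matter for DECIMAL: an unquoted literal is parsed
    # as a double first, so digits beyond double precision are lost.
    quoted = Decimal("1.234567890123456789")
    via_double = Decimal(1.234567890123456789)
    print(quoted == via_double)     # False
```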

6.1.4.2. Implicit type conversion and its scope


Implicit type conversion is an automatic type conversion performed by MaxCompute based on the
context and a predefined set of rules. This topic describes the rules of implicit type conversion.

The following tables list the implicit type conversion rules supported by MaxCompute.

Implicit type conversion 1

From/To    BOOLEAN  TINYINT  SMALLINT  INT  BIGINT  FLOAT
BOOLEAN    T        F        F         F    F       F
TINYINT    F        T        T         T    T       T
SMALLINT   F        F        T         T    T       T
INT        F        F        F         T    T       T
BIGINT     F        F        F         F    T       T
FLOAT      F        F        F         F    F       T
DOUBLE     F        F        F         F    F       F
DECIMAL    F        F        F         F    F       F
STRING     F        F        F         F    F       F
VARCHAR    F        F        F         F    F       F
TIMESTAMP  F        F        F         F    F       F
BINARY     F        F        F         F    F       F

Implicit type conversion 2

From/To    DOUBLE  DECIMAL  STRING  VARCHAR  TIMESTAMP  BINARY
BOOLEAN    F       F        F       F        F          F
TINYINT    T       T        T       T        F          F
SMALLINT   T       T        T       T        F          F
INT        T       T        T       T        F          F
BIGINT     T       T        T       T        F          F
FLOAT      T       T        T       T        F          F
DOUBLE     T       T        T       T        F          F
DECIMAL    F       T        T       T        F          F
STRING     T       T        T       T        F          F
VARCHAR    T       T        T       T        F          F
TIMESTAMP  F       F        T       T        T          F
BINARY     F       F        F       F        F          T

T indicates that the type conversion can be performed. F indicates that the type conversion
cannot be performed.
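The two implicit-conversion tables above can be merged into one lookup, where each source type maps to the set of target types marked T. The following Python sketch does this. The function name is hypothetical, and the sketch reflects only what the tables state.

```python
# Sketch: the two implicit-conversion tables above merged into one lookup.
IMPLICIT_TARGETS = {
    "BOOLEAN": {"BOOLEAN"},
    "TINYINT": {"TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT",
                "DOUBLE", "DECIMAL", "STRING", "VARCHAR"},
    "SMALLINT": {"SMALLINT", "INT", "BIGINT", "FLOAT",
                 "DOUBLE", "DECIMAL", "STRING", "VARCHAR"},
    "INT": {"INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL", "STRING", "VARCHAR"},
    "BIGINT": {"BIGINT", "FLOAT", "DOUBLE", "DECIMAL", "STRING", "VARCHAR"},
    "FLOAT": {"FLOAT", "DOUBLE", "DECIMAL", "STRING", "VARCHAR"},
    "DOUBLE": {"DOUBLE", "DECIMAL", "STRING", "VARCHAR"},
    "DECIMAL": {"DECIMAL", "STRING", "VARCHAR"},
    "STRING": {"DOUBLE", "DECIMAL", "STRING", "VARCHAR"},
    "VARCHAR": {"DOUBLE", "DECIMAL", "STRING", "VARCHAR"},
    "TIMESTAMP": {"STRING", "VARCHAR", "TIMESTAMP"},
    "BINARY": {"BINARY"},
}


def implicit_ok(source: str, target: str) -> bool:
    """True if the tables allow an implicit conversion from source to target."""
    return target.upper() in IMPLICIT_TARGETS[source.upper()]
```

The relation is not symmetric. Widening conversions such as INT to BIGINT are allowed, but BIGINT to INT is not, and DOUBLE does not convert implicitly to FLOAT.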


Note
An unsupported implicit type conversion causes an exception.
If a conversion fails, an error is returned.
Implicit type conversion is automatically performed by MaxCompute based on the context. If the
types do not match, we recommend that you perform explicit type conversion by using CAST.
The rules of implicit type conversion apply to specific scopes. In certain scenarios, only some
of the rules take effect. For more information, see the scope of implicit type conversions.

Implicit type conversion with relational operators

Relational operators include equal to (=), not equal to (<>), less than (<), less than or equal to (<=),
greater than (>), greater than or equal to (>=), IS NULL, IS NOT NULL, LIKE, RLIKE, and IN. The implicit
conversion rules of LIKE, RLIKE, and IN differ from those of the other relational operators. These
three operators are described in a separate section, and the rules in this section do not apply to
them. The following table lists the implicit conversion rules when different types of data are
involved in relational calculations.

Implicit type conversion with relational operators

From/To   BIGINT   DOUBLE   STRING    DATETIME  BOOLEAN  DECIMAL
BIGINT    –        DOUBLE   DOUBLE    N         N        DECIMAL
DOUBLE    DOUBLE   –        DOUBLE    N         N        DECIMAL
STRING    DOUBLE   DOUBLE   –         DATETIME  N        DECIMAL
DATETIME  N        N        DATETIME  –         N        N
BOOLEAN   N        N        N         N         –        N
DECIMAL   DECIMAL  DECIMAL  DECIMAL   N         N        –

Each cell shows the type to which both operands are converted before they are compared. N indicates
that the two types cannot be compared.

Note
If implicit type conversion is not supported between two values to be compared, the
relational operation cannot be completed and an error is returned.
For more information about relational operators, see Relational operators.
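The relational-operator table above can be read as a function that returns the common type of two operands. The following Python sketch does this. It treats the table as symmetric, which it is as printed. It assumes that two operands of the same type (the "–" cells) are compared without conversion. An "N" cell raises an error, matching the note above. The names are hypothetical.

```python
# Sketch: the relational-operator table as a "common comparison type" function.
_COMPARE = {
    frozenset({"BIGINT", "DOUBLE"}): "DOUBLE",
    frozenset({"BIGINT", "STRING"}): "DOUBLE",
    frozenset({"DOUBLE", "STRING"}): "DOUBLE",
    frozenset({"BIGINT", "DECIMAL"}): "DECIMAL",
    frozenset({"DOUBLE", "DECIMAL"}): "DECIMAL",
    frozenset({"STRING", "DECIMAL"}): "DECIMAL",
    frozenset({"STRING", "DATETIME"}): "DATETIME",
}


def comparison_type(left: str, right: str) -> str:
    left, right = left.upper(), right.upper()
    if left == right:
        return left  # assumption: same type is compared without conversion
    try:
        return _COMPARE[frozenset({left, right})]
    except KeyError:
        raise TypeError(f"cannot compare {left} with {right}") from None
```

For example, comparing a BIGINT column with a STRING literal compares both as DOUBLE. That is one reason to prefer an explicit CAST when exact integer comparison matters.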

Implicit conversion with special relational operators

Special relat ional operat ors are LIKE, RLIKE, and IN.

LIKE and RLIKE are used as follows:

source like pattern;


source rlike pattern;

Note the following points about implicit type conversion for these two relational operators:

The source and pattern parameters of LIKE and RLIKE must be of the STRING type.
Other types are not supported by these operators and cannot be implicitly converted to the STRING
type.
If the value of source or pattern is NULL, NULL is returned.
IN is used as follows:

key in (value1, value2,...)

The implicit conversion rules of IN are as follows:

The data types in the value list specified by IN must be consistent.

When keys and values are compared, BIGINT, DOUBLE, and STRING values are converted to DOUBLE,
and DATETIME and STRING values are converted to DATETIME. Conversion between other types is not
allowed.

Note the following points about the IN operator:

The memory used by the compiler increases with the number of parameters in the IN operation. An IN
operation with 5,000 parameters consumes 17 GB of memory with the GCC compiler. We recommend that you
limit the number of parameters to about 1,024. In this case, memory consumption peaks at 1 GB and
compilation takes only 39 seconds.

Implicit type conversion with arithmetic operators

Arithmetic operators include addition (+), subtraction (-), multiplication (*), division (/), and
modulo (%). The implicit conversion rules are as follows:

Only the STRING, BIGINT, DECIMAL, and DOUBLE types can be used in arithmetic operations.
Before an arithmetic operation, STRING values are implicitly converted to DOUBLE values.
When an arithmetic operation involves values of both the BIGINT and DOUBLE types, BIGINT values are
implicitly converted to DOUBLE values.
The DATETIME and BOOLEAN types cannot be used in arithmetic operations.

Note For more information about arithmetic operators, see Arithmetic operators.

Implicit conversion with logical operators

Logical operators include AND, OR, and NOT. The implicit conversion rules are as follows:

Only the BOOLEAN type can be used in logical operations.
Other types are not supported by logical operations or by implicit type conversion.

Note For more information about logical operators, see Logical operators.

6.1.4.3. SQL built-in functions


MaxCompute SQL provides a variety of system functions, which can be used to perform calculations on
one or more columns of any row and output data of any type.
The implicit conversion rules are as follows:

In a function call, if the data type of an input parameter is inconsistent with the data type
defined by the function, the input parameter is converted to the data type defined by the function.
The parameters of each built-in SQL function in MaxCompute can have different requirements for
implicit type conversion. For more information, see Built-in functions.

6.1.4.4. CASE WHEN


This topic describes the implicit conversion rules of CASE WHEN.

The implicit conversion rules of CASE WHEN are as follows:

If the returned data types are only BIGINT and DOUBLE, all values are converted to the DOUBLE type.
If data of the STRING type is also returned, all values are converted to STRING. If a data type
cannot be converted to STRING (for example, BOOLEAN), an error is returned.
Conversion between other types is not allowed.
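The CASE WHEN rules can be summarized as a function from the set of branch result types to the type of the whole expression. The following Python sketch does this. It assumes that "can be converted to STRING" follows the String column of the explicit conversion table: BIGINT, DOUBLE, DATETIME, and DECIMAL can, and BOOLEAN cannot. That mapping, and the function name, are illustrative assumptions.

```python
# Sketch: result type of a CASE WHEN expression under the rules above.
# Assumption: "convertible to STRING" follows the explicit CAST table.
TO_STRING_OK = {"BIGINT", "DOUBLE", "DATETIME", "DECIMAL", "STRING"}


def case_when_type(branch_types: list[str]) -> str:
    types = {t.upper() for t in branch_types}
    if len(types) == 1:
        return types.pop()                 # all branches already agree
    if types == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    if "STRING" in types:
        bad = types - TO_STRING_OK
        if bad:
            raise TypeError(f"cannot convert {sorted(bad)} to STRING")
        return "STRING"
    raise TypeError(f"no implicit conversion between {sorted(types)}")
```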

6.1.4.5. Partition column


MaxCompute SQL supports partitioned tables. For the definition of partitioned tables, see DDL
statements and DML statements. MaxCompute supports partition key columns of the following types:
TINYINT, SMALLINT, INT, BIGINT, VARCHAR, and STRING.

6.1.4.6. UNION ALL


The data types, number of columns, and column names involved in a UNION ALL operation must all be
consistent. Otherwise, an error is returned.
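A client that assembles UNION ALL queries from several sources could check this rule before submitting them. The following is a minimal Python sketch. Each schema is a hypothetical list of (column_name, data_type) pairs in SELECT order. Comparing names case-insensitively is an assumption, based on table and column names not being case-sensitive in MaxCompute.

```python
# Hypothetical client-side check for the UNION ALL rule above.
def union_all_compatible(left: list[tuple[str, str]],
                         right: list[tuple[str, str]]) -> bool:
    if len(left) != len(right):
        return False                       # different number of columns
    return all(
        l_name.lower() == r_name.lower() and l_type.upper() == r_type.upper()
        for (l_name, l_type), (r_name, r_type) in zip(left, right)
    )
```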

6.1.4.7. Conversion between string and datetime types


MaxCompute supports conversion between string and datetime types.

T he format used in conversion is yyyy-mm-dd hh:mi:ss.ff3.

Value ranges of units

Unit         String (case-insensitive)  Value range
Year         yyyy                       0001–9999
Month        mm                         01–12
Day          dd                         01–28, 29, 30, or 31, depending on the month
Hour         hh                         00–23
Minute       mi                         00–59
Second       ss                         00–59
Millisecond  ff3                        00–999

Note
Leading zeros cannot be omitted. For example, 2017-1-9 12:12:12 is an invalid string and
cannot be converted to datetime. It must be written as 2017-01-09 12:12:12.
Only strings that meet the preceding format requirements can be converted to datetime.
For example, cast("2017-12-31 02:34:34" as datetime) converts the string "2017-12-31 02:34:34"
to datetime. Similarly, when datetime values are converted to strings, the default
conversion format is yyyy-mm-dd hh:mi:ss. Attempts to convert the following strings (or
similar strings) fail and cause an exception:

cast("2017/12/31 02/34/34" as datetime)
cast("20171231023434" as datetime)
cast("2017-12-31 2:34:34" as datetime)

MaxCompute provides the to_date function, which converts strings that do not meet the datetime format
requirements to the datetime type. For more information, see TO_DATE.
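Before casting, a client can check whether a string matches the strict default format yyyy-mm-dd hh:mi:ss. The following Python sketch does this. A regular expression enforces exact digit counts and separators. This matters because Python's `strptime` on its own would accept the invalid "2017-1-9" form. `strptime` then rejects out-of-range values such as February 30. The optional .ff3 millisecond part is not handled, and the function name is hypothetical.

```python
# Sketch: strict check for the default datetime format yyyy-mm-dd hh:mi:ss.
# The regex enforces leading zeros; strptime validates the calendar values.
import re
from datetime import datetime

_STRICT = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def is_castable_to_datetime(text: str) -> bool:
    if not _STRICT.fullmatch(text):
        return False                       # wrong separators or missing zeros
    try:
        datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False                       # e.g. month 13 or February 30
    return True
```

Applied to the examples in the note above, only "2017-12-31 02:34:34" passes. The three failing cast examples are all rejected by the regular expression.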

6.2. Operators
6.2.1. Relational operators
This topic describes the relational operators in MaxCompute SQL.

Relational operators

Operator       Description
A = B          If A or B is NULL, NULL is returned. If A is equal to B, TRUE is returned. Otherwise, FALSE is returned.
A <> B         If A or B is NULL, NULL is returned. If A is not equal to B, TRUE is returned. Otherwise, FALSE is returned.
A < B          If A or B is NULL, NULL is returned. If A is less than B, TRUE is returned. Otherwise, FALSE is returned.
A <= B         If A or B is NULL, NULL is returned. If A is less than or equal to B, TRUE is returned. Otherwise, FALSE is returned.
A > B          If A or B is NULL, NULL is returned. If A is greater than B, TRUE is returned. Otherwise, FALSE is returned.
A >= B         If A or B is NULL, NULL is returned. If A is greater than or equal to B, TRUE is returned. Otherwise, FALSE is returned.
A IS NULL      If A is NULL, TRUE is returned. Otherwise, FALSE is returned.
A IS NOT NULL  If A is not NULL, TRUE is returned. Otherwise, FALSE is returned.
A LIKE B       If A or B is NULL, NULL is returned. A is a string and B is the pattern to be matched. If A matches B, TRUE is returned. Otherwise, FALSE is returned. The percent sign (%) is a wildcard character that matches an arbitrary number of characters. The underscore (_) is a wildcard character that matches a single character. To use these two characters as ordinary characters, escape them with a backslash: \% and \_. Examples:
               'aaa' like 'a__' = TRUE
               'aaa' like 'a%' = TRUE
               'aaa' like 'aab' = FALSE
               'a%b' like 'a\%b' = TRUE
               'axb' like 'a\%b' = FALSE
A RLIKE B      If A or B is NULL, NULL is returned. A is a string and B is a string constant regular expression. If A matches B, TRUE is returned. Otherwise, FALSE is returned. If B is NULL, the system returns an error and exits.
A IN B         B is a set. If A is NULL, NULL is returned. If A is in B, TRUE is returned. Otherwise, FALSE is returned. If B contains only one element and that element is NULL, that is, A IN (NULL), NULL is returned. If B contains NULL, the NULL is treated as having the same type as the other elements in B. B must be a constant and have at least one element. All elements must be of the same type.
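The LIKE semantics in the table can be modelled by translating a LIKE pattern into a regular expression. In this translation, % becomes `.*`, _ becomes `.`, and the escapes \% and \_ become literal characters. The following Python sketch uses `None` for NULL. It is an illustrative model rather than MaxCompute's matcher, and only the two documented escapes are handled.

```python
# Sketch: LIKE semantics modelled with Python's re module; None means NULL.
import re
from typing import Optional


def like(source: Optional[str], pattern: Optional[str]) -> Optional[bool]:
    if source is None or pattern is None:
        return None                        # NULL operand -> NULL result
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "%_":
            parts.append(re.escape(pattern[i + 1]))  # escaped literal % or _
            i += 2
            continue
        if ch == "%":
            parts.append(".*")             # any number of characters
        elif ch == "_":
            parts.append(".")              # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), source, flags=re.DOTALL) is not None
```

The five examples in the table produce the documented results, for example `like("a%b", "a\\%b")` is True and `like("axb", "a\\%b")` is False. The doubled backslash is only Python string-literal syntax.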

Double type values are stored with limited precision. We recommend that you do not use the equal sign
(=) to compare two double type values. Instead, subtract one value from the other and take the absolute
value of the result. If the absolute value is negligible, the two double type values are considered
equal. Example:

abs(0.9999999999 - 1.0000000000) < 0.000000001
-- 0.9999999999 and 1.0000000000 have 10 decimal digits, while 0.000000001 has 9 decimal digits.
-- 0.9999999999 is considered equal to 1.0000000000.

Note
ABS is a built-in function provided by MaxCompute that returns the absolute value of its input.
For more information, see ABS.
A value of the double type in MaxCompute can retain 16 significant digits.
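The same tolerance-based comparison can be tried in Python. Python's `float` is an IEEE 754 double, which this sketch assumes behaves like the MaxCompute double type for this purpose. The 1e-9 tolerance is the example value from the text. An appropriate tolerance depends on the magnitude of your data.

```python
# Sketch of the tolerance-based double comparison described above.
def doubles_equal(a: float, b: float, tolerance: float = 0.000000001) -> bool:
    return abs(a - b) < tolerance


if __name__ == "__main__":
    print(0.1 + 0.2 == 0.3)                           # False: exact match fails
    print(doubles_equal(0.1 + 0.2, 0.3))              # True
    print(doubles_equal(0.9999999999, 1.0000000000))  # True
```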

6.2.2. Arithmetic operators


This topic describes the arithmetic operators in MaxCompute SQL.

Arithmetic operators


Operator  Description
A + B     If A or B is NULL, NULL is returned. Otherwise, the result of A + B is returned.
A - B     If A or B is NULL, NULL is returned. Otherwise, the result of A - B is returned.
A * B     If A or B is NULL, NULL is returned. Otherwise, the result of A * B is returned.
A / B     If A or B is NULL, NULL is returned. Otherwise, the result of A / B is returned. If both A and B are of the bigint type, the result is of the double type.
A % B     If A or B is NULL, NULL is returned. Otherwise, the result of A % B is returned.
+A        A is returned.
-A        If A is NULL, NULL is returned. Otherwise, -A is returned.

Note
Only values of the string, bigint, double, and decimal types can be used in arithmetic
operations. Values of the datetime and boolean types are not allowed in these operations.
Before the operation, values of the string type are converted to the double type by implicit
type conversion.
When values of the bigint and double types are involved in an operation, values of the
bigint type are first converted to the double type by implicit type conversion. The returned
result is a value of the double type.
When both A and B are of the bigint type, the result of A / B is a value of the double type.
The results of the other arithmetic operations are values of the bigint type.
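The type rules in this note can be collected into a result-type function for binary arithmetic. The following Python sketch covers only the stated rules. Two additional behaviours are assumptions: two operands of the same non-BIGINT type keep that type, and combinations the text does not describe (for example DECIMAL with DOUBLE) raise `NotImplementedError` instead of guessing. The names are hypothetical.

```python
# Sketch: result type of a binary arithmetic operation per the rules above.
ALLOWED = {"STRING", "BIGINT", "DOUBLE", "DECIMAL"}


def arithmetic_type(op: str, left: str, right: str) -> str:
    left, right = left.upper(), right.upper()
    for t in (left, right):
        if t not in ALLOWED:
            raise TypeError(f"{t} cannot be used in arithmetic operations")
    # STRING operands are implicitly converted to DOUBLE first.
    left = "DOUBLE" if left == "STRING" else left
    right = "DOUBLE" if right == "STRING" else right
    if left == right == "BIGINT":
        return "DOUBLE" if op == "/" else "BIGINT"
    if left == right:
        return left  # assumption: same-type operands keep their type
    if {left, right} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    raise NotImplementedError(f"{left} {op} {right} is not covered by this section")
```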

6.2.3. Bitwise operators


This topic describes the bitwise operators in MaxCompute SQL.

Bitwise operators

Operator  Description
A & B     Returns the bitwise AND result of A and B. For example, 1 & 2 returns 0 and 1 & 3 returns 1. The bitwise AND of NULL and any other value is always NULL. A and B must be of the bigint type.
A | B     Returns the bitwise OR result of A and B. For example, 1 | 2 returns 3 and 1 | 3 returns 3. The bitwise OR of NULL and any other value is always NULL. A and B must be of the bigint type.

Notice Bitwise operators support only bigint type data and do not support implicit type
conversion.
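The NULL propagation and the BIGINT-only restriction of these operators can be modelled in a few lines. The following Python sketch uses `None` for NULL. It models BIGINT as a signed 64-bit integer and rejects any other input, mirroring the absence of implicit type conversion. The function names are hypothetical.

```python
# Sketch: NULL-propagating bitwise AND/OR over BIGINT values (None = NULL).
from typing import Optional

BIGINT_MIN, BIGINT_MAX = -(2 ** 63), 2 ** 63 - 1


def _require_bigint(value: object) -> None:
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{value!r} is not a BIGINT value")
    if not BIGINT_MIN <= value <= BIGINT_MAX:
        raise TypeError(f"{value!r} is outside the BIGINT range")


def bit_and(a: Optional[int], b: Optional[int]) -> Optional[int]:
    if a is None or b is None:
        return None
    _require_bigint(a)
    _require_bigint(b)
    return a & b


def bit_or(a: Optional[int], b: Optional[int]) -> Optional[int]:
    if a is None or b is None:
        return None
    _require_bigint(a)
    _require_bigint(b)
    return a | b
```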


6.2.4. Logical operators


This topic describes the logical operators in MaxCompute SQL.

Logical operators

Operator  Description
A and B   TRUE and TRUE = TRUE
          TRUE and FALSE = FALSE
          FALSE and TRUE = FALSE
          FALSE and NULL = FALSE
          FALSE and FALSE = FALSE
          NULL and FALSE = FALSE
          TRUE and NULL = NULL
          NULL and TRUE = NULL
          NULL and NULL = NULL
A or B    TRUE or TRUE = TRUE
          TRUE or FALSE = TRUE
          FALSE or TRUE = TRUE
          FALSE or NULL = NULL
          NULL or FALSE = NULL
          TRUE or NULL = TRUE
          NULL or TRUE = TRUE
          NULL or NULL = NULL
NOT A     If expression A is NULL, NULL is returned.
          If expression A is TRUE, FALSE is returned.
          If expression A is FALSE, TRUE is returned.

Note Only data of the boolean type can be involved in logical operations. These operations
do not support implicit type conversion.
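The truth tables above follow three-valued logic. FALSE decides an AND and TRUE decides an OR, even when the other operand is NULL. Otherwise, NULL propagates. The following Python sketch encodes this with `None` for NULL. The function names are hypothetical.

```python
# Sketch: the AND/OR/NOT truth tables above, with None standing in for NULL.
from typing import Optional

Bool3 = Optional[bool]


def and3(a: Bool3, b: Bool3) -> Bool3:
    if a is False or b is False:
        return False          # FALSE decides AND, even with NULL
    if a is None or b is None:
        return None
    return True


def or3(a: Bool3, b: Bool3) -> Bool3:
    if a is True or b is True:
        return True           # TRUE decides OR, even with NULL
    if a is None or b is None:
        return None
    return False


def not3(a: Bool3) -> Bool3:
    return None if a is None else not a
```

In a WHERE clause, only rows whose condition evaluates to TRUE are normally returned. As a result, a condition such as `a = 1 or b = 2` can silently drop rows where `a` is NULL and `b` is not 2.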


6.3. DDL statements


6.3.1. Table operations
6.3.1.1. Create a table (CREATE TABLE)
This topic describes how to execute a DDL statement to create a table.
Syntax

create table [if not exists] table_name
[(col_name data_type [DEFAULT value] [comment col_comment], ...)]
[comment table_comment]
[partitioned by (col_name data_type [comment col_comment], ...)]
[STORED AS AliOrc] -- Specify the storage format of the table. You can specify AliORC only for newly created internal tables.
[lifecycle days]
[as select_statement]

create table [if not exists] table_name like existing_table_name


Note
Table and column names are not case-sensitive.
If you do not specify the IF NOT EXISTS option and a table with the same name exists, an
error is returned. If you specify this option, a success message is returned regardless of
whether a table with the same name exists, and even if the schema of the existing table
differs from that of the table you want to create. In this case, the metadata of the existing
table does not change.
A table can contain a maximum of 1,200 column definitions.
Supported data types are BIGINT, DOUBLE, BOOLEAN, DATETIME, DECIMAL, STRING, ARRAY<T>,
and MAP<T1, T2>.

Note If you need to use the following newly supported data types: TINYINT,
SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, you must add the
set odps.sql.type.system.odps2=true; flag before the CREATE TABLE statement and
commit both statements together for execution.

MaxCompute allows you to specify the default value of a column by using DEFAULT value. If
the value of a column is not specified in an INSERT operation, the default value is used for
this column.
A table or column name cannot contain special characters. It can contain only lowercase
letters, uppercase letters, digits, and underscores (_). A name must start with a letter and can
be up to 128 bytes in length.
The partitioned by option specifies the partition fields. The value can only be a string. The
name of a partition key column cannot contain double-byte characters. It must start with a
lowercase or uppercase letter, followed by letters or digits. The name can be up to 128 bytes
in length. The name can contain the following special characters: ! _ : $ . # @ and spaces.
Other characters, such as \t, \n, and /, are considered undefined characters. After you use
partition fields to define partitions for a table, a full table scan is no longer triggered
when you add partitions, update partition data, or read partition data. This improves
processing efficiency.
A comment is a valid string that can be up to 1,024 bytes in length.
The lifecycle option specifies the lifecycle of the table in days. The CREATE TABLE LIKE
statement does not replicate the lifecycle attribute from the source table.
Theoretically, a source table can have up to six levels of partitions. Use as few partitions as
possible to avoid excessive expansion of table storage.
You can configure the maximum number of partitions per table for a project. The default
maximum number is 60,000.
STORED AS specifies the storage format of the table. The default value is CFile2. AliORC, which
is implemented in C++ and developed by the MaxCompute storage team, is now available. AliORC is
fully compatible with the open source Optimized Row Columnar (ORC) format. Compared with CFile2,
AliORC saves more than 10% of storage space and improves read performance by more than 20%.
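The naming and comment limits above can be checked on the client before a CREATE TABLE statement is submitted. The following Python sketch assumes that byte lengths are measured in UTF-8. It does not cover the separate, looser rules for partition key column names. The function names are hypothetical.

```python
# Sketch: client-side checks for the table/column name and comment rules.
# Assumption: byte lengths are measured on the UTF-8 encoding.
import re

_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")  # ASCII letters, digits, _


def is_valid_table_or_column_name(name: str) -> bool:
    return bool(_NAME.fullmatch(name)) and len(name.encode("utf-8")) <= 128


def is_valid_comment(comment: str) -> bool:
    return len(comment.encode("utf-8")) <= 1024
```

Names such as `_c3` fail the first check because they start with an underscore. That is consistent with the CREATE TABLE ... AS SELECT example below, where such generated names must be quoted.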

Examples


The following example shows how to create a table named sale_detail to store sales records. The
sale_date and region columns of the table are used as partition key columns.

create table if not exists sale_detail(
    shop_name string,
    customer_id string,
    total_price double)
partitioned by (sale_date string, region string);
-- Create a partitioned table named sale_detail.

Use the following create table...as select... statement to create a table and replicate data to it:

create table sale_detail_ctas1 as select * from sale_detail;

Note If the sale_detail table contains data, all the data is replicated to sale_detail_ctas1.
The sale_detail table is a partitioned table. However, a table created by the create table...as
select... statement does not inherit the partition attributes of sale_detail. The partition key
columns in sale_detail become standard columns in sale_detail_ctas1. Therefore, sale_detail_ctas1
is a non-partitioned table that has five columns.

In the create table...as select... statement, if you use constants as column values in the SELECT
clause, we recommend that you specify column aliases:

create table sale_detail_ctas2 as
select shop_name,
       customer_id,
       total_price,
       '2017' as sale_date,
       'China' as region
from sale_detail;

Note

If you do not specify column aliases, the fourth and fifth columns of sale_detail_ctas3 created in
the following example are automatically named _c3 and _c4.

create table sale_detail_ctas3 as
select shop_name,
       customer_id,
       total_price,
       '2017',
       'China'
from sale_detail;

In this case, to reference these columns of sale_detail_ctas3, you must enclose _c3 and _c4 in
backticks (`). If you execute the select _c3, _c4 from sale_detail_ctas3 statement without backticks,
an error is returned, because a column name in a MaxCompute SQL statement cannot start with an
underscore (_). We recommend that you use aliases to avoid this issue.

select `_c3`, `_c4` from sale_detail_ctas3;

To ensure that the destination table has the same schema as the source table, use the following
create table...like statement:

create table sale_detail_like like sale_detail;


Note The schema of sale_detail_like is exactly the same as that of sale_detail. Both tables
have the same attributes, such as column names, column comments, and table comments, except
for the lifecycle. However, data in sale_detail is not replicated to sale_detail_like.

MaxCompute allows you to execute the DESC statement to view table information.

desc <table_name>;
desc extended <table_name>; -- View table information and extended information.

MaxCompute allows you to use the SHOW CREATE TABLE statement to generate the DDL statement that
creates a table. This makes it easy to rebuild the table schema by using SQL.

SHOW CREATE TABLE <table_name>;

6.3.1.2. Delete a table


This topic describes how to run a DDL statement to delete a table.

Command syntax:

drop table [if exists] table_name;

Note If the command is run without the IF EXISTS option and the table does not exist, an
exception is returned. With this option, a success is returned regardless of whether the table exists.

Example:

create table sale_detail_drop like sale_detail;
drop table sale_detail_drop;
-- If the table exists, a success is returned. If not, an exception is returned.
drop table if exists sale_detail_drop2;
-- A success is returned regardless of whether sale_detail_drop2 exists.

6.3.1.3. Rename a table


This topic describes how to run a DDL statement to rename a table.

Command syntax:

alter table table_name rename to new_table_name;

Note
The rename operation changes only the table name, not the table data.
If the table specified by new_table_name already exists, an error is returned.
If the table specified by table_name does not exist, an error is returned.

Example:

create table sale_detail_rename1 like sale_detail;
alter table sale_detail_rename1 rename to sale_detail_rename2;

6.3.1.4. Modify the comment of a table


This topic describes how to run a DDL statement to modify the comment of a table.

Command syntax:

alter table table_name set comment 'tbl comment';

Note
table_name must be an existing table.
A comment can contain a maximum of 1,024 bytes.

Example:

alter table sale_detail set comment 'new comments for table sale_detail';

You can run the desc command to view the modified comment of the table. For more information, see
Obtain table information.

6.3.1.5. Modify the lifecycle of a table


MaxCompute provides the lifecycle management feature to release storage space and simplify the
data clearance process. This topic describes how to run a DDL statement to modify the lifecycle of a
table.

Command syntax:

alter table table_name set lifecycle days;


Note
The days parameter specifies the lifecycle of a table, in days. It must be a positive
integer.
If the table specified by table_name is a non-partitioned table and is not modified within
the number of days specified by the days parameter since its last modification, MaxCompute
automatically clears the table (similar to the DROP TABLE operation). In MaxCompute, the
LastDataModifiedTime value of a table is updated each time data in the table is modified.
MaxCompute determines whether to clear a table based on its LastDataModifiedTime value and
lifecycle settings.
If the table specified by table_name is a partitioned table, MaxCompute determines whether
to clear each partition based on the LastDataModifiedTime value of that partition. Unlike a
non-partitioned table, a partitioned table is not deleted after its last partition is reclaimed.
You can configure a lifecycle for tables, but not for partitions.
You can specify a lifecycle when you create a table.
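The reclamation rule above can be expressed as a comparison between the age of the data and the lifecycle. The following Python sketch assumes that "not modified within days" means strictly more than `days` have elapsed since LastDataModifiedTime. That boundary, and the function name, are assumptions. The real reclamation runs server-side on MaxCompute's own schedule.

```python
# Sketch: is a table (or one partition) past its lifecycle?
# Assumption: expiry means strictly more than `days` since the last change.
from datetime import datetime, timedelta


def lifecycle_expired(last_data_modified: datetime, days: int, now: datetime) -> bool:
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return now - last_data_modified > timedelta(days=days)


if __name__ == "__main__":
    modified = datetime(2022, 1, 1)
    print(lifecycle_expired(modified, 100, datetime(2022, 5, 1)))  # True (120 days)
    print(lifecycle_expired(modified, 100, datetime(2022, 2, 1)))  # False (31 days)
```

For a partitioned table, the same check would be applied to each partition's own LastDataModifiedTime, and the table itself is kept even when every partition has been reclaimed.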

Example:

create table test_lifecycle(key string) lifecycle 100;
-- Create a table named test_lifecycle with a lifecycle of 100 days.
alter table test_lifecycle set lifecycle 50;
-- Change the lifecycle of the test_lifecycle table to 50 days.

6.3.1.6. Disable or restore the lifecycle feature


In some cases, you may not want certain partitions to be automatically reclaimed by the lifecycle
feature. In this case, you can disable the lifecycle feature for these partitions. This topic describes how to execute
DDL statements to disable or restore the lifecycle feature.

Syntax

ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE LIFECYCLE;


Note
TABLE DISABLE LIFECYCLE
It prevents a table and its partitions from being reclaimed by the lifecycle feature. This option takes precedence over the partition_spec ENABLE LIFECYCLE setting.
The lifecycle settings of the table and the ENABLE/DISABLE flags of its partitions are retained.
You can still modify the lifecycle settings of the table and its partitions.

TABLE ENABLE LIFECYCLE

After the lifecycle feature is enabled again, the table and its partitions can be reclaimed by the lifecycle feature. By default, the current lifecycle settings of the table and its partitions are used.
Before you restore the lifecycle feature for a table and its partitions, you can configure new lifecycles for them. This prevents data from being mistakenly reclaimed because of the previous settings.

Example

ALTER TABLE trans PARTITION(dt='20191111') DISABLE LIFECYCLE;
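The example above only disables the feature for one partition. The following sketch shows the reverse operation and the table-level form, assuming the trans table from the example and assuming, as the brackets in the syntax suggest, that the partition clause can be omitted to target the whole table.

```sql
-- Restore lifecycle-based reclamation for the partition disabled above.
ALTER TABLE trans PARTITION(dt='20191111') ENABLE LIFECYCLE;

-- Disable lifecycle-based reclamation for the whole trans table.
-- Per the note above, this takes precedence over partition-level ENABLE LIFECYCLE flags.
ALTER TABLE trans DISABLE LIFECYCLE;
```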

6.3.1.7. Modify the LastDataModifiedTime value of a table


MaxCompute SQL supports the TOUCH operation, which sets the LastDataModifiedTime value of a
table to the current time. This topic describes how to run a DDL statement to modify the
LastDataModifiedTime value of a table.

Command syntax:

alter table table_name touch;

Note
If the specified table_name does not exist, an error is returned.
This operation updates the LastDataModifiedTime value of the table. MaxCompute treats this as a change to the table data and recalculates the lifecycle.

For more information about how to modify the LastDataModifiedTime value of a partition, see Modify
the LastDataModifiedTime value of a partition.
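A minimal sketch, assuming the test_lifecycle table from the lifecycle example still exists. Because TOUCH resets LastDataModifiedTime, the lifecycle countdown of the table starts again from the time the statement runs.

```sql
-- Reset the lifecycle countdown of test_lifecycle without changing its data.
alter table test_lifecycle touch;
```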

6.3.1.8. Clear data from a non-partitioned table


This topic describes how to run a DDL statement to clear data from a non-partitioned table.

Command syntax:


TRUNCATE TABLE table_name;

Note This statement clears data from a specified non-partitioned table. To clear data from a
partitioned table, run the ALTER TABLE table_name DROP PARTITION (partition_spec) statement.
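The two cases can be contrasted as follows. This sketch assumes test_lifecycle is the non-partitioned table created in the lifecycle example and that sale_detail is the partitioned table used elsewhere in this guide; the partition values are illustrative.

```sql
-- Non-partitioned table: clear all of its data.
truncate table test_lifecycle;

-- Partitioned table: TRUNCATE does not apply, so drop the partition instead.
alter table sale_detail drop partition (sale_date='201712', region='hangzhou');
```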

6.3.1.9. Archive table data


This topic describes how to run a DDL statement to archive the data of a table.

If a project does not have enough storage space, you can use the table archiving feature of MaxCompute to
reduce physical storage usage by about 50%. The archiving feature uses a compression algorithm with a higher
compression ratio and saves data as redundant array of independent disks (RAID) files. Instead of storing
three full copies of the data, MaxCompute stores six data blocks plus three check blocks, which changes the
ratio of logical size to physical size from 1:3 to 1:1.5. As a result, archived data consumes only about half
of the usual physical space.

However, this feature comes at a price. If a data block or machine is damaged, restoring the data takes
longer, and read performance is affected. Therefore, this feature is suitable for storing cold data. For
example, you can store large volumes of outdated log data as RAID files for a long time.

Command syntax:

ALTER TABLE table_name [PARTITION(partition_name='partition_value')] ARCHIVE;

Example:

alter table my_log partition(ds='20170101') archive;

Command output:

Summary:
table name: test0128 /pt=a instance count: 1 run time: 21
before merge, file count: 1 file size: 456 file physical size: 1368
after merge, file count: 1 file size: 512 file physical size: 768
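The sizes in this output are consistent with the storage ratios described above, if file size is read as the logical size and file physical size as the bytes actually stored (an interpretation of the output fields, not a documented definition):

$$\text{before archiving: } 456 \times 3 = 1368 \qquad \text{after archiving: } 512 \times \frac{6 + 3}{6} = 512 \times 1.5 = 768$$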


Note
The output shows the changes in logical size and physical size during the archiving process. During
archiving, multiple small files are automatically merged. After the archive operation is
complete, you can run the desc extended command to check whether the data in the partition
has been archived and to view the physical space usage:

desc extended my_log partition(ds='20170101');


+------------------------------------------------------------------------------------+
PartitionSize: 512 |
+------------------------------------------------------------------------------------+
CreateTime: 2017-01-28 07:05:20 |
LastDDLTime: 2017-01-28 07:05:20 |
LastModifiedTime: 2017-01-28 07:05:21 |
+------------------------------------------------------------------------------------+

6.3.1.10. Forcibly delete data from a table (partition)


If you need to forcibly and irrecoverably delete data from a table or partition to immediately release
storage space, you can perform the deletion with the PURGE option. This topic describes how
to run a DDL statement to forcibly delete data from a table or partition.

Command syntax:

DROP TABLE tblname PURGE;


ALTER TABLE tblname DROP PARTITION(part_spec) PURGE;

Example:

drop table my_log purge;


alter table my_log drop partition (ds='20170618') purge;

6.3.2. View-based operation


6.3.2.1. Create a view
This topic describes how to run a DDL statement to create a view.

Command syntax:

create [or replace] view [if not exists] view_name
[(col_name [comment col_comment], ...)]
[comment view_comment]
[as select_statement]


Note
To create a view, you must have read permissions on the tables referenced by the view.
Views in MaxCompute are not materialized views. Operations on a view access the data of the referenced tables. Therefore, changes to your permissions on a referenced table can change your permissions on the view.
A view can contain only one valid SELECT statement.
A view can reference other views but cannot reference itself. Circular references are not supported.
You cannot write data to a view. For example, INSERT INTO and INSERT OVERWRITE operations do not work on views.
If a table referenced by a view changes, the view may become inaccessible. For example, a view becomes inaccessible after the table it references is deleted. You must maintain the mappings between referenced tables and views properly.
If the CREATE VIEW statement is run without the IF NOT EXISTS option and the view already exists, an exception is returned. In this case, you can run the CREATE OR REPLACE VIEW statement to recreate the view. The permissions on the recreated view remain unchanged.

Example:

create view if not exists sale_detail_view
(store_name, customer_id, price, sale_date, region)
comment 'a view for table sale_detail'
as select * from sale_detail;
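If sale_detail_view already exists, the OR REPLACE form of the syntax recreates it in place instead of returning an exception. A sketch, in which the region filter and the comment text are illustrative choices:

```sql
-- Redefine sale_detail_view so that it exposes only the Hangzhou records.
create or replace view sale_detail_view
(store_name, customer_id, price, sale_date, region)
comment 'a view for table sale_detail (hangzhou only)'
as select * from sale_detail where region = 'hangzhou';
```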

6.3.2.2. Delete a view


This topic describes how to run a DDL statement to delete a view.

Command syntax:

drop view [if exists] view_name;

Note If the command is run without the IF EXISTS option and the view does not exist, an error
is returned.

Example:

drop view if exists sale_detail_view;

6.3.2.3. Rename a view


This topic describes how to run a DDL statement to rename a view.

Command syntax:

alter view view_name rename to new_view_name;


Note If a view named new_view_name already exists, an error is returned.

Example:

create view if not exists sale_detail_view
(store_name, customer_id, price, sale_date, region)
comment 'a view for table sale_detail'
as select * from sale_detail;
alter view sale_detail_view rename to market;

6.3.3. Column and partition operations


6.3.3.1. Add a partition (ADD PARTITION)
This topic describes how to use a DDL statement to add a partition.

Syntax

alter table table_name add [if not exists] partition partition_spec; -- Add a partition.
alter table table_name add [if not exists] partition partition_spec [PARTITION partition_spec PARTITION partition_spec ...]; -- Add multiple partitions at a time.
partition_spec: (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)

Note
If you do not specify the IF NOT EXISTS option and a partition with the same name already exists, an error is returned.
A MaxCompute table can contain a maximum of 60,000 partitions.
To add a partition to a table that has multi-level partitions, you must specify the values of all partition columns.

Examples

The following examples show how to add partitions to the sale_detail table:

alter table sale_detail add if not exists partition (sale_date='201712', region='hangzhou');
-- Add a partition to store the sales records of the China (Hangzhou) region for December 2017.
alter table sale_detail add if not exists partition (sale_date='201712', region='shanghai');
-- Add a partition to store the sales records of the China (Shanghai) region for December 2017.
alter table sale_detail add if not exists partition(sale_date='20171011');
-- Specify only the sale_date partition. An error is returned.
alter table sale_detail add if not exists partition(region='shanghai');
-- Specify only the region partition. An error is returned.
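The multiple-partition form of the syntax lists several partition clauses one after another, with no commas between them. A sketch with illustrative January 2018 values; note that each clause specifies both partition levels, as required for multi-level partitions.

```sql
-- Add two partitions for January 2018 in a single statement.
alter table sale_detail add if not exists
partition (sale_date='201801', region='hangzhou')
partition (sale_date='201801', region='shanghai');
```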


6.3.3.2. Delete a partition (DROP PARTITION)


This topic describes how to use a DDL statement to delete a partition.

Syntax

alter table table_name drop [if exists] PARTITION partition_spec; -- Delete a partition.
alter table table_name drop [if exists] PARTITION partition_spec, PARTITION partition_spec[, PARTITION partition_spec ...]; -- Delete multiple partitions at a time.
partition_spec: (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)

Note If you do not specify the IF EXISTS option and the partition you want to delete does
not exist, an error is returned.

Example
Execute the following statement to delete a partition from the sale_detail table:

alter table sale_detail drop partition(sale_date='201712',region='hangzhou');
-- The sales records of the China (Hangzhou) region for December 2017 are deleted.
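Unlike ADD PARTITION, the multiple-partition form of DROP PARTITION separates the partition clauses with commas. A sketch with illustrative values; IF EXISTS is included so that the statement does not fail in the case described in the note above, where a listed partition does not exist.

```sql
-- Delete two January 2018 partitions in a single statement.
alter table sale_detail drop if exists
partition (sale_date='201801', region='hangzhou'),
partition (sale_date='201801', region='shanghai');
```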

6.3.3.3. Add a column


This topic describes how to add columns by using a DDL statement.

Command syntax:

alter table table_name add columns (col_name1 type1, col_name2 type2...);

Note
Each column must be of one of the following types: bigint, double, boolean, datetime,
decimal, string, tinyint, smallint, int, float, varchar, binary, timestamp, array, map, or struct.
A table in MaxCompute can contain a maximum of 1,200 columns.
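A minimal sketch that adds two columns to sale_detail. The column names discount and remark are hypothetical; their types are taken from the supported list above.

```sql
-- Add two hypothetical columns to the sale_detail table.
alter table sale_detail add columns (discount double, remark string);
```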

6.3.3.4. Change a column name


This topic describes how to run a DDL statement to change a column name.

Command syntax:

alter table table_name change column old_col_name rename to new_col_name;


Note
old_col_name must be an existing column in the table.
new_col_name must not be the same as the name of any existing column in the table.
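A sketch using the test_lifecycle table from the lifecycle example, whose only column is key. The new name item_key is hypothetical and must not clash with an existing column.

```sql
-- Rename the key column of test_lifecycle to item_key.
alter table test_lifecycle change column key rename to item_key;
```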

6.3.3.5. Modify the comment of a column or partition


This topic describes how to run a DDL statement to modify the comment of a column or partition.

Command syntax:

alter table table_name change column col_name comment 'comment';

Note
The comment cannot exceed 1,024 bytes.
This statement cannot change the data type or position of a column.
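A sketch with illustrative comment text. The second statement assumes, as the topic title suggests, that the same statement also applies to a partition column such as region.

```sql
-- Comment on a regular column.
alter table sale_detail change column total_price comment 'total price of the order';

-- Comment on a partition column (assumed to use the same statement).
alter table sale_detail change column region comment 'sales region';
```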

6.3.3.6. Modify the LastDataModifiedTime value of a partition


MaxCompute SQL supports the TOUCH operation, which sets the LastDataModifiedTime value of a
partition to the current time. This topic describes how to run a DDL statement to modify the
LastDataModifiedTime value of a partition.

Command syntax:

alter table table_name touch partition(partition_col='partition_col_value', ...);

Note
If the specified table_name or partition_col does not exist, an error is returned.
If the specified partition_col_value does not exist, an error is returned.
This operation updates the LastDataModifiedTime value of the partition. MaxCompute treats this as a change to the data in the partition and recalculates the lifecycle.

For more information about how to modify the LastDataModifiedTime value of a table, see Modify the
LastDataModifiedTime value of a table.
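A minimal sketch, assuming the December 2017 Hangzhou partition of sale_detail exists. Both partition levels are specified so that exactly one partition is identified.

```sql
-- Reset the lifecycle countdown of a single partition without changing its data.
alter table sale_detail touch partition(sale_date='201712', region='hangzhou');
```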

6.3.3.7. Modify partition values


MaxCompute SQL provides the RENAME operation, which allows you to modify the partition values of a
table. This topic describes how to run a DDL statement to modify partition values.

Command syntax:


ALTER TABLE table_name PARTITION (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)
RENAME TO PARTITION (partition_col1 = partition_col_newvalue1, partition_col2 = partition_col_newvalue2, ...);

Note
This command cannot modify the names of partition columns. It can modify only the values
of partition columns.
To modify partition values in a table that has multi-level partitions, you must specify the
values at every partition level.
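A sketch that changes only the region value of one partition of sale_detail. The new value 'hz' is illustrative; sale_date is still specified on both sides because the table has multi-level partitions.

```sql
-- Rename the December 2017 Hangzhou partition to region='hz'.
alter table sale_detail partition (sale_date='201712', region='hangzhou')
rename to partition (sale_date='201712', region='hz');
```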

6.3.3.8. Merge partitions


MaxCompute allows you to merge multiple partitions of a table into one partition. The original
partitions are deleted.

Syntax

ALTER TABLE <tableName> MERGE [IF EXISTS] PARTITION(<predicate>) [, PARTITION(<predicate2>) ...] OVERWRITE PARTITION(<fullPartitionSpec>) [PURGE];

Note
If you do not specify the IF EXISTS option and a partition you want to merge does not exist, an error is returned.
If you specify the IF EXISTS option but no partitions meet the merge conditions, no new partition is generated.
If the source data is concurrently modified by operations such as INSERT, RENAME, or DROP while the statement runs, an error is returned even if you specified the IF EXISTS option.
If the PURGE option is specified, the merged partitions cannot be restored by using Kunlunjing.

Limits and troubleshooting

External tables, shard tables, and tables with extreme storage are not supported. Xlib or Algo tables that depend on the file order are not supported. If you merge partitions of a clustered table, the clustered attribute is removed from the resulting partition.
CatalogServer performs hash operations on tables to merge partitions, and a capacity limit is imposed on merged partitions. A hard link in the Apsara Distributed File System can have a maximum of seven replicas.
You can merge a maximum of 4,000 partitions at a time.
A maximum of 10 million partitions can wait on CatalogServer to be merged.
If an error indicating that CatalogServer is busy occurs, try again later.
If a hard link in the Apsara Distributed File System is faulty, empty the recycle bin and try again.
Example


The following code shows the partitions and data of the intpstringstringstring table:

odps@ jet_zwz>list partitions intpstringstringstring;


ds=20181101/hh=00/mm=00
ds=20181101/hh=00/mm=10
ds=20181101/hh=10/mm=00
ds=20181101/hh=10/mm=10
OK
odps@ jet_zwz>read intpstringstringstring;
+------------+------------+------------+------------+
| value | ds | hh | mm |
+------------+------------+------------+------------+
| 1 | 20181101 | 00 | 00 |
| 1 | 20181101 | 00 | 10 |
| 1 | 20181101 | 10 | 00 |
| 1 | 20181101 | 10 | 10 |
+------------+------------+------------+------------+

Execute the following statement to merge all partitions that meet the hh='00' condition into the
ds=20181101/hh=00/mm=00 partition:

odps@ jet_zwz>alter table intpstringstringstring merge partition(hh='00') overwrite partition(ds='20181101', hh='00', mm='00');
ID = 20190404025755844g80qwa7a
OK

Execute the following statement to view the partitions of the table after the merge:

odps@ jet_zwz>list partitions intpstringstringstring;


ds=20181101/hh=00/mm=00
ds=20181101/hh=10/mm=00
ds=20181101/hh=10/mm=10
OK

Data in the two partitions that meet the hh='00' condition is merged into the ds=20181101/hh=00/mm=00
partition.

odps@ jet_zwz>read intpstringstringstring;


+------------+------------+------------+------------+
| value | ds | hh | mm |
+------------+------------+------------+------------+
| 1 | 20181101 | 00 | 00 |
| 1 | 20181101 | 00 | 00 |
| 1 | 20181101 | 10 | 00 |
| 1 | 20181101 | 10 | 10 |
+------------+------------+------------+------------+

When you merge partitions, you can specify multiple predicate conditions. For example, you can
execute the following statement to merge all remaining partitions into the
ds=20181101/hh=00/mm=00 partition:


odps@ jet_zwz>alter table intpstringstringstring merge if exists partition(ds='20181101', hh='00', mm='00'), partition(ds='20181101', hh='10', mm='00'), partition(ds='20181101', hh='10', mm='10') overwrite partition(ds='20181101', hh='00', mm='00') purge;
ID = 20190404034632854g431sqzt2
OK
odps@ jet_zwz>show partitions intpstringstringstring;
ds=20181101/hh=00/mm=00
OK

6.4. DML statements


6.4.1. INSERT statement
6.4.1.1. Update the data of a table
This topic describes how to run an INSERT statement to update the data of a table.

The INSERT OVERWRITE and INSERT INTO statements are commonly used for data processing in
MaxCompute SQL. They save computing results to a destination table for subsequent
computing. The INSERT INTO statement appends data to a table or partition. The INSERT OVERWRITE
statement clears the existing data in a table or partition before it inserts data.
Command syntax:

insert overwrite|into table tablename [partition (partcol1=val1, partcol2=val2 ...)] select_statement
from from_statement;

Note The INSERT syntax in MaxCompute differs from that in MySQL or Oracle. In
MaxCompute, INSERT OVERWRITE or INSERT INTO must be followed by the keyword TABLE, not
directly by the table name.

Example:

The following example copies the sales records of the sale_detail table into a partition of a new table.

create table sale_detail_insert like sale_detail;
alter table sale_detail_insert add partition(sale_date='2017', region='china');
insert overwrite table sale_detail_insert partition (sale_date='2017', region='china') select shop_name, customer_id, total_price from sale_detail;

Note When data is inserted with an INSERT statement, columns of the source and destination
tables are mapped by their order in the SELECT clause, not by their column names.

The following statement is also valid:


insert overwrite table sale_detail_insert partition (sale_date='2017', region='china')
select customer_id, shop_name, total_price from sale_detail;
-- When the sale_detail_insert table is created, the column order is shop_name string, customer_id string, and total_price bigint.
-- When data in sale_detail is inserted into sale_detail_insert, the insertion order is customer_id, shop_name, and total_price.
-- In this case, data in sale_detail.customer_id is inserted into sale_detail_insert.shop_name.
-- Data in sale_detail.shop_name is inserted into sale_detail_insert.customer_id.

When data is inserted into a static partition of a partitioned table, the partition columns cannot appear in the SELECT list.

insert overwrite table sale_detail_insert partition (sale_date='2017', region='china') select shop_name, customer_id, total_price, sale_date, region from sale_detail;
-- An error is returned because partition columns (sale_date and region) cannot appear in the SELECT list of an INSERT statement for a static partition.
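The examples above all use INSERT OVERWRITE. A sketch of the INSERT INTO variant on the same partition: per the description above, the rows are appended and the data already in the partition is kept.

```sql
-- Append rows to the existing 2017/china partition instead of replacing its contents.
insert into table sale_detail_insert partition (sale_date='2017', region='china')
select shop_name, customer_id, total_price from sale_detail;
```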

6.4.1.2. Output data to multiple objects


This topic describes how to run the INSERT statement to output data to multiple objects.

MaxCompute SQL allows you to insert data into different result tables or partitions by using one SQL
statement.

Command syntax:

from from_statement
insert overwrite | into table tablename1 [partition (partcol1=val1, partcol2=val2 ...)] select_statement1
[insert overwrite | into table tablename2 [partition ...] select_statement2]

Note
A single SQL statement supports up to 256 outputs. If more than 256 outputs are specified, a syntax error is returned.
In a MULTI INSERT statement, each target partition of a partitioned table, or each non-partitioned target table, can be specified only once.
The INSERT OVERWRITE and INSERT INTO operations cannot be performed simultaneously on different partitions of the same partitioned table. Otherwise, an error is returned.

Example:


create table sale_detail_multi like sale_detail;
from sale_detail
insert overwrite table sale_detail_multi partition (sale_date='2016', region='china' ) select shop_name, customer_id, total_price
insert overwrite table sale_detail_multi partition (sale_date='2017', region='china' ) select shop_name, customer_id, total_price;
-- The statement succeeds. Data of the sale_detail table is inserted into the 2016 and 2017 partitions of the China region in the sale_detail_multi table.
from sale_detail
insert overwrite table sale_detail_multi partition (sale_date='2017', region='china' ) select shop_name, customer_id, total_price
insert overwrite table sale_detail_multi partition (sale_date='2017', region='china' ) select shop_name, customer_id, total_price;
-- An error is returned because the same partition appears more than once.
from sale_detail
insert overwrite table sale_detail_multi partition (sale_date='2016', region='china' )
select shop_name, customer_id, total_price
insert into table sale_detail_multi partition (sale_date='2017', region='china' ) select shop_name, customer_id, total_price;
-- An error is returned because INSERT OVERWRITE and INSERT INTO cannot be performed simultaneously on different partitions of the same partitioned table.

6.4.1.3. Output data to a dynamic partition


This topic describes how to use the INSERT statement to output data to dynamic partitions.

When you run the INSERT OVERWRITE statement on a partitioned table, you can specify the partition
values in the statement. A more flexible method is to specify only the partition column names, without
values, and to provide the partition values in the corresponding columns of the SELECT clause.

Command syntax:

insert overwrite table tablename partition (partcol1, partcol2 ...) select_statement from from_statement;

Note
When you run a SQL dynamic partition statement in a distributed environment, a single process can output up to 512 dynamic partitions. If the number of dynamic partitions exceeds this limit, an exception is returned.
A SQL dynamic partition statement can generate up to 2,000 dynamic partitions. If the number of dynamic partitions exceeds this limit, an exception is returned.
Dynamic partition values cannot be NULL. Otherwise, an exception is returned.
If a destination table has multi-level partitions, you can specify some partitions as static partitions in an INSERT statement. However, the static partitions must be high-level partitions.

Example:


create table total_revenues (revenue bigint) partitioned by (region string);
insert overwrite table total_revenues partition(region)
select total_price as revenue, region from sale_detail;

Note In the preceding example, you do not know which partitions will be generated before you
run the SQL statement. The partitions are determined by the values of the region field returned by
the SELECT statement. This is why they are called dynamic partitions.

Other examples:

create table sale_detail_dypart like sale_detail;
insert overwrite table sale_detail_dypart partition (sale_date, region) select * from sale_detail;
-- The statement succeeds.
insert overwrite table sale_detail_dypart partition (sale_date='2017', region) select shop_name,customer_id,total_price,region from sale_detail;
-- The statement succeeds. The table has multi-level partitions, and the high-level partition is specified as a static partition.
insert overwrite table sale_detail_dypart partition (sale_date='2017', region) select shop_name,customer_id,total_price from sale_detail;
-- An error is returned because the dynamic partition column must appear in the SELECT list.
insert overwrite table sales partition (region='china', sale_date) select shop_name,customer_id,total_price,region from sale_detail;
-- An error is returned because you cannot specify only a low-level partition as static while inserting into a high-level partition dynamically.

6.4.2. SELECT statement


6.4.2.1. SELECT
This topic describes how to use the SELECT statement.

Syntax

select [all | distinct] select_expr, select_expr, ... from table_reference
[where where_condition] [group by col_list]
[order by order_condition]
[distribute by distribute_condition [sort by sort_condition] ] [limit number]

Take note of the following points when you execute the SELECT statement:

The SELECT statement reads data from a table. You can specify the names of the columns you want
to read, or use an asterisk (*) to represent all columns.

Examples

select * from sale_detail;
-- Read data from all columns in the sale_detail table.
select shop_name from sale_detail;
-- Read data from the shop_name column in the sale_detail table.


Note A SELECT statement can return a maximum of 1,000 rows of results. However, this limit
does not apply when SELECT is used as a clause. In that case, the clause returns all results to
the upper-layer query. To obtain more than 1,000 rows of results from a SELECT statement, you
must use Tunnel to download the entire table or a temporary table that stores the SELECT results.
For more information, see MaxCompute Tunnel.

You can use the WHERE clause to specify filter conditions.

Examples

select * from sale_detail where shop_name like 'hang%';

The following table describes the filter conditions supported by the WHERE clause.

Filter conditions

Filter condition        Description
>, <, =, >=, <=, <>     N/A
like, rlike             N/A
in, not in              If a subquery follows the IN or NOT IN condition, the subquery can return only one column, and the number of returned values cannot exceed 1,000.
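A sketch of the IN condition with a subquery, using only sale_detail columns. The price threshold is illustrative; the subquery returns a single column, and under the limit above it must not return more than 1,000 values.

```sql
-- Return all records from regions that have at least one sale above 100.
select * from sale_detail
where region in (select region from sale_detail where total_price > 100);
```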

You can specify partitions in the WHERE clause of the SELECT statement to avoid a full table scan.

Examples

select sale_detail.* from sale_detail
where sale_detail.sale_date >= '2015' and sale_detail.sale_date <= '2017';

Notice To check whether partition pruning takes effect, execute the EXPLAIN SELECT
statement. A common user-defined function (UDF), or specifying partition conditions in a JOIN
operation, can prevent partition pruning from taking effect.
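A minimal sketch of the check described in the notice: prefix the partition-filtered query with EXPLAIN and inspect the plan to see whether only the sale_date partitions in the range are read. The exact layout of the plan output is not covered here.

```sql
-- Show the execution plan instead of running the query.
explain
select sale_detail.* from sale_detail
where sale_detail.sale_date >= '2015' and sale_detail.sale_date <= '2017';
```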

Partition pruning can also take effect when the partition conditions use UDFs. In this case, the UDFs are
executed as small jobs first and then replaced with their execution results.

You can use one of the following methods:

Add an annotation to the UDF class when you write the UDF.

@com.aliyun.odps.udf.annotation.UdfProperty(isDeterministic=true)

Notice To use com.aliyun.odps.udf.annotation.UdfProperty, the referenced odps-sdk-udf
(odps-sdk-udf.jar) must be version 0.30.x or later.


Add the set odps.sql.udf.ppr.deterministic = true; flag before the SQL statements. Then, all
UDFs in the SQL statements are treated as deterministic.

Note This method has limits. It backfills partitions with execution results, and a maximum of
1,000 partitions can be backfilled. If an annotation is added to the UDF class, an error indicating
that more than 1,000 partitions are backfilled may be returned. To ignore this error, add the
set odps.sql.udf.ppr.to.subquery = false; flag to disable this feature globally. After this feature
is disabled, UDF-based partition pruning no longer takes effect.
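A sketch of the flag-based method. It assumes a deterministic UDF named my_partition_udf has already been registered in the project; the name is hypothetical and not a built-in function. With the flag set, the UDF call in the partition condition is eligible for the backfill-based pruning described above.

```sql
-- Treat all UDFs in the following statements as deterministic.
set odps.sql.udf.ppr.deterministic = true;
-- my_partition_udf is hypothetical: it is assumed to return a sale_date partition value.
select * from sale_detail where sale_date = my_partition_udf('2017');
```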

The WHERE clause in an SQL statement can include the BETWEEN...AND condition. Example:

SELECT sale_detail.* FROM sale_detail
WHERE sale_detail.sale_date between '2017' and '2019';

Note The WHERE clause can contain a maximum of 256 conditions.

Nested subqueries are supported in table_reference.

Examples

select * from (select region from sale_detail) t where region = 'shanghai';

DISTINCT: If duplicate rows exist, add DISTINCT before the fields to remove them so that each distinct value
is returned only once. If you use ALL, all duplicate rows are returned. If you specify neither option,
the statement returns all duplicate rows, which is the same as using ALL.
Examples

select distinct region from sale_detail;
select distinct region, sale_date from sale_detail;
-- DISTINCT applies to all columns in the SELECT list as a whole, not to a single column.

GROUP BY: This clause is used for grouped queries. In most cases, it is used with
aggregate functions. If a SELECT statement includes aggregate functions, the key of the GROUP BY
clause can be the names of columns in the input table or an expression composed of input table
columns. The key cannot be an alias of a column in the output of the SELECT operation.

Note If the set hive.groupby.position.alias=true; flag is added before the SQL
statements, integer constants in the GROUP BY clause are treated as column numbers in the SELECT
list. Example:

-- The columns in the sale_detail table are in the format of key-value pairs.
select region, sum(total_price) from sale_detail group by 1;
-- Equivalent to the following statement:
select region, sum(total_price) from sale_detail group by region;

Examples


select region from sale_detail group by region;


-- The statement is successfully executed because the name of a column in the input table
is used as the column in the GROUP BY clause.
select sum(total_price) from sale_detail group by region;
-- The statement is successfully executed because the values in the region column are use
d to group the table and to return the total sales of each group.
select region, sum(total_price) from sale_detail group by region;
-- The statement is successfully executed because the values in the region column are used to group the table and to return the unique region value and the total sales of each group.
select region as r from sale_detail group by r;
-- An error is returned because the alias of the column in the SELECT operation is used as the column in the GROUP BY clause.
select concat('China-', region) as r from sale_detail group by concat('China-', region);
-- The statement is successfully executed because the complete expression, not its alias, is used in the GROUP BY clause.
select region, total_price from sale_detail group by region;
-- An error is returned because all the columns that do not include aggregate functions in the SELECT operation must exist in the GROUP BY clause.
select region, total_price from sale_detail group by region, total_price;
-- The statement can be successfully executed.

Note The GROUP BY operation is performed before the SELECT operation during the parsing of SQL statements. Therefore, GROUP BY uses only the column names or expressions of the input table as keys. For more information about aggregate functions, see Aggregate functions.
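The following Python sketch (hypothetical helper and sample data, not a MaxCompute API) mirrors select region, sum(total_price) from sale_detail group by region. The grouping key is read straight from an input column, which illustrates why an alias that is only defined later in the SELECT list is not available to GROUP BY. The sample rows contain no NULL values, so NULL handling is not modeled.

```python
def group_sum(rows, key_col, value_col):
    """Emulate SELECT key_col, sum(value_col) FROM rows GROUP BY key_col."""
    totals = {}
    for row in rows:
        key = row[key_col]  # the key comes from an input column, not from a SELECT alias
        totals[key] = totals.get(key, 0) + row[value_col]
    return totals


# Hypothetical rows standing in for sale_detail.
sale_detail = [
    {"region": "china", "total_price": 100},
    {"region": "china", "total_price": 200},
    {"region": "shanghai", "total_price": 50},
]
print(group_sum(sale_detail, "region", "total_price"))  # {'china': 300, 'shanghai': 50}
```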

ORDER BY: This clause is used for global sorting based on specific columns. To sort records in descending order, use the DESC keyword. Because records are globally sorted, the ORDER BY clause must be used with the LIMIT clause. In an ORDER BY operation, NULL is considered the lowest of all values. This rule is consistent with MySQL, but is different from Oracle. Unlike the GROUP BY clause, the columns in the ORDER BY clause must be the aliases of the columns in the SELECT operation. If you query a column but do not specify an alias for it in the SELECT operation, the column name is used as the column alias.
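A small Python sketch of the NULL ordering rule above, with None standing in for NULL: because NULL is the lowest value, it comes first in ascending order and last with DESC, as in MySQL. The sort_nulls_lowest helper is hypothetical.

```python
def sort_nulls_lowest(values, descending=False):
    """Sort values so that None (NULL) is treated as lower than every other value."""
    def key(v):
        # (False, ...) sorts before (True, ...), so None lands at the low end.
        return (v is not None, v if v is not None else 0)
    return sorted(values, key=key, reverse=descending)


print(sort_nulls_lowest([3, None, 1]))                   # [None, 1, 3]
print(sort_nulls_lowest([3, None, 1], descending=True))  # [3, 1, None]
```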

Note If the set hive.orderby.position.alias=true; flag is added before SQL statements, integer constants in the ORDER BY clause are considered column numbers in a SELECT operation. Example:

set hive.orderby.position.alias=true;
-- 2 refers to the second column in the SELECT list, which is sum(total_price).
select region, sum(total_price) from sale_detail order by 2 limit 100;
-- Equivalent to the following statement:
select region, sum(total_price) from sale_detail order by sum(total_price) limit 100;

Examples

select * from sale_detail order by region;


-- An error is returned because the ORDER BY clause is not used with the LIMIT clause.
select * from sale_detail order by region limit 100;
select region as r from sale_detail order by region;
-- An error is returned because the ORDER BY clause is not followed by a column alias.
select region as r from sale_detail order by r;


Note The number in the LIMIT clause is a constant that limits the number of output rows. If a SELECT statement is executed without the LIMIT clause, a maximum of 5,000 rows are displayed on the screen. The screen display limit may vary with projects and can be configured in the console.

The OFFSET clause can be used with the ORDER BY ... LIMIT clauses to skip the number of rows specified by OFFSET.
Examples

SELECT * FROM src ORDER BY key LIMIT 20 OFFSET 10;


-- Sort the rows of the src table in ascending order by key, and return the 11th to 30th rows. OFFSET 10 indicates that the first 10 rows are skipped, and LIMIT 20 indicates that a maximum of 20 rows can be returned.
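The row arithmetic of ORDER BY key LIMIT 20 OFFSET 10 can be sketched in Python as a sort followed by a slice. The helper name and the key values 1 to 40 are hypothetical.

```python
def order_by_limit_offset(rows, key, limit, offset=0):
    """Emulate ORDER BY <key> LIMIT <limit> OFFSET <offset> on an in-memory list."""
    ordered = sorted(rows, key=key)
    # Skip the first `offset` rows, then keep at most `limit` rows.
    return ordered[offset:offset + limit]


src_keys = list(range(40, 0, -1))  # hypothetical key values 40..1, unsorted input
page = order_by_limit_offset(src_keys, key=lambda k: k, limit=20, offset=10)
print(page[0], page[-1])  # 11 30
```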

DISTRIBUTE BY: This clause is used to shard data based on the hash values of specific columns. The DISTRIBUTE BY clause must be followed by the alias of an output column from the SELECT operation.
Examples

select region from sale_detail distribute by region;


-- The statement is successfully executed because the column name is used as the column alias.
select region as r from sale_detail distribute by region;
-- An error is returned because the DISTRIBUTE BY clause is not followed by a column alias.
select region as r from sale_detail distribute by r;
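As a rough model of hash-based sharding, the following Python sketch assigns each row to a partition by hashing one column, so all rows with the same value end up in the same partition. zlib.crc32 is only a deterministic stand-in (Python's built-in hash() of strings is randomized per process); MaxCompute's actual hash function and partition count are not described here and are assumptions.

```python
import zlib


def distribute_by(rows, column, num_partitions):
    """Assign rows to partitions by hashing one column, so equal values share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Stand-in hash; not MaxCompute's internal hash function.
        h = zlib.crc32(str(row[column]).encode("utf-8"))
        partitions[h % num_partitions].append(row)
    return partitions


rows = [{"region": r} for r in ["hangzhou", "shanghai", "hangzhou", "beijing", "shanghai"]]
for i, part in enumerate(distribute_by(rows, "region", 3)):
    print(i, [r["region"] for r in part])
```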

SORT BY: This clause is used for partial sorting. The DISTRIBUTE BY clause must be placed before the SORT BY clause. In practice, the SORT BY clause is used to sort the results of the DISTRIBUTE BY clause within each shard. The SORT BY clause must be followed by the alias of an output column from the SELECT operation.
Examples

select region from sale_detail distribute by region sort by region;
select region as r from sale_detail sort by region;
-- An error is returned because the SORT BY clause does not follow a DISTRIBUTE BY clause.
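The difference between this partial sort and the global ORDER BY can be sketched in Python: rows are first split into partitions, and then each partition is sorted on its own. The partition function used here (the value itself, modulo 2) is a hypothetical simplification chosen to make the result easy to follow.

```python
def distribute_and_sort(values, num_partitions, partition_of):
    """Emulate DISTRIBUTE BY followed by SORT BY on the same column."""
    partitions = [[] for _ in range(num_partitions)]
    for v in values:
        partitions[partition_of(v) % num_partitions].append(v)
    # SORT BY orders rows inside each partition only; there is no order across partitions.
    return [sorted(p) for p in partitions]


# Hypothetical partition function: even and odd numbers go to different partitions.
parts = distribute_and_sort([5, 3, 8, 1, 4, 2], 2, partition_of=lambda v: v)
print(parts)  # [[2, 4, 8], [1, 3, 5]]
```

Concatenating the partitions gives 2, 4, 8, 1, 3, 5, which is sorted within each partition but not globally.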

The ORDER BY and GROUP BY clauses cannot be used with the DISTRIBUTE BY and SORT BY clauses. Like DISTRIBUTE BY and SORT BY, the ORDER BY clause must be followed by the alias of an output column from the SELECT operation, whereas the GROUP BY clause uses the column names or expressions of the input table.

Note
The key of the ORDER BY, SORT BY, or DISTRIBUTE BY clause must be the alias of an output column from the SELECT operation.
The SELECT operation is performed before the ORDER BY, SORT BY, and DISTRIBUTE BY clauses during the parsing of SQL statements. Therefore, only the aliases of output columns from the SELECT operation can be used as keys.

6.4.2.2. Subquery


This topic describes how to use the SELECT statement for subquery operations.

A common SELECT statement reads data from one or more tables, for example, select column_1, column_2 ... from table_name. The query object can also be another SELECT operation, which is called a subquery.

Command syntax:

select * from (select shop_name from sale_detail) a;

Notice A subquery must have an alias.

Example:

create table shop as select * from sale_detail;


select a.shop_name, a.customer_id, a.total_price from
(select * from shop) a join sale_detail on a.shop_name = sale_detail.shop_name;

Note In a FROM clause, a subquery can be used as a table, which supports a JOIN operation with other tables or subqueries.

6.4.3. UNION statements


6.4.3.1. UNION ALL
This topic describes how to execute the SELECT statement to perform the UNION ALL operation.

Syntax

select_statement union all select_statement

Note The UNION ALL clause is used to combine two or more datasets returned from SELECT operations into one dataset. If duplicate rows exist in the results, all rows that meet the condition are returned, and duplicate rows are retained.

MaxCompute SQL does not support the combination of two top-level query results. To combine them, rewrite them into a subquery.

Format example before rewriting

select * from sale_detail where region = 'hangzhou'
union all
select * from sale_detail where region = 'shanghai';

Format example after rewriting

select * from (
select * from sale_detail where region = 'hangzhou' union all
select * from sale_detail where region = 'shanghai') t;


The syntax that uses a pair of parentheses to specify the priority of UNION ALL is supported.

Example:

SELECT * FROM src UNION ALL (SELECT * FROM src2 UNION ALL SELECT * FROM src3);
-- Execute the UNION ALL clause for the src2 and src3 tables. Then, execute the UNION ALL clause for the src table based on the obtained result.

Notice
For a UNION ALL operation, all subqueries must have the same number of columns, column names, and column types. If the column names are inconsistent, use column aliases.
In most cases, MaxCompute allows a UNION ALL operation for a maximum of 256 subqueries. If the limit is exceeded, a syntax error is returned.
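A minimal Python sketch of UNION ALL on in-memory result sets: the inputs are concatenated and duplicates are kept. The union_all helper and the sample rows are hypothetical, and the sketch only checks the number of columns; MaxCompute additionally requires matching column names and types.

```python
def union_all(*results):
    """Concatenate result sets the way UNION ALL does: duplicate rows are kept."""
    widths = {len(row) for result in results for row in result}
    if len(widths) > 1:
        raise ValueError("UNION ALL inputs must have the same number of columns")
    combined = []
    for result in results:
        combined.extend(result)
    return combined


hangzhou = [("s1", "hangzhou"), ("s1", "hangzhou")]
shanghai = [("s2", "shanghai")]
print(union_all(hangzhou, shanghai))  # all three rows, the duplicate included
```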

6.4.4. JOIN statement


6.4.4.1. JOIN
This topic describes how to use a JOIN statement.

MaxCompute supports multiple JOIN operations in an SQL statement. JOIN does not support Cartesian products (JOIN without an ON clause).
Syntax

join_table:
    table_reference join table_factor [join_condition]
  | table_reference {left outer|right outer|full outer|inner} join table_reference join_condition
table_reference:
    table_factor
  | join_table
table_factor:
    tbl_name [alias]
  | table_subquery alias
  | ( table_references )
join_condition:
    on equality_expression ( and equality_expression )*

Note equality_expression indicates an equality expression.

Take note of the following points when you perform a JOIN operation:

LEFT OUTER JOIN: returns all rows in the left table, such as shop in the following example. The returned rows include the rows that do not match any rows in the right table, such as sale_detail in the following example.

Example


select a.shop_name as ashop, b.shop_name as bshop from shop a left outer join sale_detail b on a.shop_name=b.shop_name;
-- Both the shop and sale_detail tables have the shop_name column. You must use aliases to distinguish the columns in the SELECT operation.

RIGHT OUTER JOIN: returns all rows in the right table, such as sale_detail in the following example. The returned rows include the rows that do not match any rows in the left table, such as shop in the following example.

Example

select a.shop_name as ashop, b.shop_name as bshop from shop a right outer join sale_detail b on a.shop_name=b.shop_name;
-- Both the shop and sale_detail tables have the shop_name column. You must use aliases to distinguish the columns in the SELECT operation.

FULL OUTER JOIN: returns all rows in both the left and right tables.
Example

select a.shop_name as ashop, b.shop_name as bshop from shop a full outer join sale_detail b on a.shop_name=b.shop_name;

INNER JOIN: returns only the rows that have a match in both tables. The INNER keyword can be omitted.

Example

select a.shop_name from shop a inner join sale_detail b on a.shop_name=b.shop_name;
select a.shop_name from shop a join sale_detail b on a.shop_name=b.shop_name;
-- The INNER keyword is omitted. The result is the same as that of the preceding statement.
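The four JOIN types above differ only in which unmatched rows they keep. The following Python sketch of an equi-join on lists of dicts makes that explicit; the join helper and sample tables are hypothetical, and None stands in for the NULL-filled side of an unmatched row.

```python
def join(left, right, key, how="inner"):
    """Equi-join two lists of dicts on `key`; an unmatched side is returned as None."""
    pairs = []
    matched_right = set()
    for l in left:
        # NULL keys never match in SQL, so None is excluded here.
        matches = [i for i, r in enumerate(right)
                   if l[key] is not None and r[key] == l[key]]
        for i in matches:
            matched_right.add(i)
            pairs.append((l, right[i]))
        if not matches and how in ("left", "full"):
            pairs.append((l, None))
    if how in ("right", "full"):
        for i, r in enumerate(right):
            if i not in matched_right:
                pairs.append((None, r))
    return pairs


shop = [{"shop_name": "s1"}, {"shop_name": "s2"}]
sale_detail = [{"shop_name": "s1"}, {"shop_name": "s3"}]
for how in ("inner", "left", "right", "full"):
    print(how, len(join(shop, sale_detail, "shop_name", how)))  # inner 1, left 2, right 2, full 3
```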

Join condition: You must use equi-joins and combine conditions by using AND. A maximum of 128 JOIN operations are supported in an SQL statement. You can use non-equi joins or combine conditions by using OR in a MAPJOIN operation.

Example

select a.* from shop a full outer join sale_detail b on a.shop_name=b.shop_name full outer join sale_detail c on a.shop_name=c.shop_name;
-- A maximum of 128 JOIN operations are supported in an SQL statement.
select a.* from shop a join sale_detail b on a.shop_name <> b.shop_name;
-- An error is returned because MaxCompute does not support non-equi joins.

NATURAL JOIN: In a NATURAL JOIN operation, the conditions used to join two tables are automatically determined based on the fields that the two tables have in common. MaxCompute supports OUTER NATURAL JOIN. You can use the USING clause so that the JOIN operation returns common fields only once.

Example

-- To join the src table that contains the key1, key2, a1, and a2 columns and the src2 table that contains the key1, key2, b1, and b2 columns, execute the following statement:
SELECT * FROM src NATURAL JOIN src2;
-- Both the src and src2 tables include the key1 and key2 fields. In this case, the preceding statement is equivalent to the following statement:
SELECT src.key1 as key1, src.key2 as key2, src.a1, src.a2, src2.b1, src2.b2 FROM src INNER JOIN src2 ON src.key1 = src2.key1 AND src.key2 = src2.key2;
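How the join keys and the output columns of this example are derived can be sketched in Python from the two column lists alone. Both helper names are hypothetical; the sketch reproduces the column order of the equivalent INNER JOIN statement shown above.

```python
def natural_join_columns(left_columns, right_columns):
    """Join keys that NATURAL JOIN infers: the column names both tables share."""
    shared = set(right_columns)
    return [c for c in left_columns if c in shared]


def natural_join_output(left_columns, right_columns):
    """Output columns: shared keys once, then the remaining columns of each table."""
    keys = natural_join_columns(left_columns, right_columns)
    return (keys
            + [c for c in left_columns if c not in keys]
            + [c for c in right_columns if c not in keys])


src = ["key1", "key2", "a1", "a2"]
src2 = ["key1", "key2", "b1", "b2"]
print(natural_join_columns(src, src2))  # ['key1', 'key2']
print(natural_join_output(src, src2))   # ['key1', 'key2', 'a1', 'a2', 'b1', 'b2']
```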


The syntax that uses a pair of parentheses to specify the priorities of JOIN operations is supported.
Example

SELECT * FROM src JOIN (src2 JOIN src3 on xxx) ON yyy;


-- The src2 JOIN src3 operation is executed first. Then, the JOIN operation is performed on the src table based on the result.

6.4.4.2. MAPJOIN HINT


This topic describes how to use a MAPJOIN statement to join a large table with one or more small tables.

A MAPJOIN operation is faster than a common JOIN operation.

When the volume of data in the small tables is small, MAPJOIN accelerates execution by loading all the specified small tables into the memory of the program that performs the JOIN operation.

Example

select /*+ mapjoin(a) */ a.shop_name, b.customer_id, b.total_price
from shop a join sale_detail b
on a.shop_name = b.shop_name;
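The idea behind MAPJOIN can be sketched in Python as a broadcast hash join: the small table (shop, hinted as a) is indexed in memory, and the large table is streamed past that index once. This is an illustration of the general technique under that assumption, not MaxCompute's implementation; the helper and sample rows are hypothetical.

```python
def map_join(small, large, key):
    """Broadcast hash join: index the small table in memory, then stream the large table."""
    index = {}
    for s in small:
        if s[key] is not None:  # NULL keys never match
            index.setdefault(s[key], []).append(s)
    for row in large:  # each row of the large table is read once
        for s in index.get(row[key], []):
            yield s, row


shop = [{"shop_name": "s1"}]  # small table, hinted with mapjoin(a)
sale_detail = [
    {"shop_name": "s1", "total_price": 10},
    {"shop_name": "s2", "total_price": 20},
    {"shop_name": "s1", "total_price": 30},
]
print([b["total_price"] for a, b in map_join(shop, sale_detail, "shop_name")])  # [10, 30]
```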

Notice

Note the following points when you use a MAPJOIN statement:

The left table of a LEFT OUTER JOIN clause must be a large table.
The right table of a RIGHT OUTER JOIN clause must be a large table.
Both the left and right tables of an INNER JOIN clause can be large tables.
MAPJOIN cannot be used in a FULL OUTER JOIN clause.
MAPJOIN supports small tables in subqueries.
If you need to reference a small table or a subquery when using MAPJOIN, you must reference the alias of the table or subquery.
In MAPJOIN, you can use non-equi joins or combine multiple conditions by using OR.
If MAPJOIN is used, the total memory occupied by all the small tables cannot exceed 512 MB. However, you can use the odps.sql.mapjoin.memory.max parameter to raise this limit up to 2,048 MB.

The limit here refers to the original size of the data. If you run the desc command to obtain the compressed size, you must multiply it by the compression ratio.
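The size check described in this notice is simple arithmetic, sketched below in Python. The compression ratio is an assumption you must estimate yourself (it is not reported by the desc command as described here), and the table sizes in the example are hypothetical.

```python
DEFAULT_LIMIT_MB = 512  # default total size of all small tables
MAX_LIMIT_MB = 2048     # upper bound reachable via odps.sql.mapjoin.memory.max


def small_tables_fit(compressed_sizes_mb, compression_ratio, limit_mb=DEFAULT_LIMIT_MB):
    """Estimate whether the small tables of a MAPJOIN stay within the memory limit."""
    if limit_mb > MAX_LIMIT_MB:
        raise ValueError("the limit can be raised to at most 2,048 MB")
    # The limit applies to the original size, so scale the compressed size back up.
    original_mb = sum(compressed_sizes_mb) * compression_ratio
    return original_mb <= limit_mb


# Hypothetical numbers: two small tables of 100 MB and 50 MB compressed, ratio 4.
print(small_tables_fit([100, 50], 4))                 # False: 600 MB > 512 MB
print(small_tables_fit([100, 50], 4, limit_mb=1024))  # True after raising the limit
```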

In MaxCompute SQL, you cannot use non-equi joins or the OR logic in the ON condition of a common JOIN. However, you can do this in MAPJOIN. Example:

select /*+ mapjoin(a) */ a.total_price, b.total_price
from shop a join sale_detail b
on a.total_price < b.total_price or a.total_price + b.total_price < 500;
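One way to see why MAPJOIN can accept this condition: because the whole small table is already in memory, each row of the large table can simply be tested against every small-table row with an arbitrary predicate, so no equality key is needed. The following Python sketch illustrates that reasoning with hypothetical data; it is not MaxCompute's implementation.

```python
def map_join_on(small, large, condition):
    """MAPJOIN with an arbitrary ON condition: test each large row against the in-memory small table."""
    small_rows = list(small)  # the small table is held entirely in memory
    for b in large:
        for a in small_rows:
            if condition(a, b):
                yield a, b


def on(a, b):
    # Same condition as the statement above.
    return a["total_price"] < b["total_price"] or a["total_price"] + b["total_price"] < 500


shop = [{"total_price": 100}, {"total_price": 600}]
sale_detail = [{"total_price": 300}, {"total_price": 50}]
print([(a["total_price"], b["total_price"]) for a, b in map_join_on(shop, sale_detail, on)])
# [(100, 300), (100, 50)]
```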


6.4.5. EXPLAIN statement


This topic describes the EXPLAIN statement for DML statements in MaxCompute SQL.

MaxCompute SQL provides the EXPLAIN operation, which displays a description of the final execution plan structure of a DML statement. An execution plan is the program that is ultimately used to execute the SQL semantics.
Command syntax:

EXPLAIN <DMLquery>;

Note

The execution result of an EXPLAIN statement includes the following:

Dependencies between all the jobs of the DML statement.
Dependencies between all the tasks of each job.
All operator dependency structures in a task.

Example:

EXPLAIN
SELECT abs(a.key), b.value FROM src a JOIN src1 b ON a.value = b.value;

The EXPLAIN statement output includes the following:

The first part is the dependency between jobs.

Command output:

job0 is root job

Note Because this query needs only one job (job0), only one line of information is displayed.

The second part is the dependency between tasks.

Command output:

In Job job0:
root Tasks: M1_Stg1, M2_Stg1
J3_1_2_Stg1 depends on: M1_Stg1, M2_Stg1

Note

Job0 contains three tasks. M1_Stg1 and M2_Stg1 are executed first, and J3_1_2_Stg1 is executed after the first two tasks are finished.
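The execution order described in this note follows from the "root Tasks" and "depends on" lines of the output. The following Python sketch parses those two line formats as they appear in the example above and groups tasks into waves that can run once all earlier waves are finished. The helper names are hypothetical, and other EXPLAIN line formats are not handled.

```python
def parse_task_dependencies(explain_text):
    """Parse the 'root Tasks' and 'depends on' lines of an EXPLAIN job section."""
    roots, depends_on = [], {}
    for line in explain_text.strip().splitlines():
        line = line.strip()
        if line.startswith("root Tasks:"):
            roots = [t.strip() for t in line.split(":", 1)[1].split(",")]
        elif " depends on:" in line:
            task, parents = line.split(" depends on:", 1)
            depends_on[task.strip()] = [p.strip() for p in parents.split(",")]
    return roots, depends_on


def execution_waves(roots, depends_on):
    """Group tasks into waves; a task runs once all of its dependencies have finished."""
    remaining = set(roots) | set(depends_on)
    done, waves = set(), []
    while remaining:
        ready = sorted(t for t in remaining if set(depends_on.get(t, [])) <= done)
        if not ready:
            raise ValueError("cyclic or unknown task dependencies")
        waves.append(ready)
        done.update(ready)
        remaining -= set(ready)
    return waves


explain_job = """
In Job job0:
root Tasks: M1_Stg1, M2_Stg1
J3_1_2_Stg1 depends on: M1_Stg1, M2_Stg1
"""
print(execution_waves(*parse_task_dependencies(explain_job)))
# [['M1_Stg1', 'M2_Stg1'], ['J3_1_2_Stg1']]
```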


Naming rules for tasks: MaxCompute provides four task types: MapTask, ReduceTask, JoinTask, and LocalWork. The first letter of a task name indicates the type of the current task (for example, M2_Stg1 is a MapTask). The number immediately following the first letter represents the current task ID, which is unique among all tasks in the current query. The numbers separated by underscores (_) represent the immediate dependencies of the current task. For example, J3_1_2_Stg1 means that the current task (ID 3) depends on the tasks with ID 1 and ID 2.
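The naming rule can be applied mechanically, as in the following Python sketch. Only the M and J prefixes appear in the example output above; mapping R to ReduceTask and L to LocalWork is an assumption based on the first-letter rule. The parse_task_name helper is hypothetical.

```python
import re

# Assumed mapping from the first letter to the task type (only M and J appear in the example).
TASK_TYPES = {"M": "MapTask", "R": "ReduceTask", "J": "JoinTask", "L": "LocalWork"}
TASK_NAME = re.compile(r"^([A-Z])(\d+)((?:_\d+)*)_Stg\d+$")


def parse_task_name(name):
    """Split a task name such as J3_1_2_Stg1 into its type, ID, and direct dependencies."""
    m = TASK_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized task name: {name}")
    letter, task_id, deps = m.groups()
    return {
        "type": TASK_TYPES.get(letter, letter),
        "id": int(task_id),
        "depends_on": [int(d) for d in deps.split("_") if d],
    }


print(parse_task_name("J3_1_2_Stg1"))  # {'type': 'JoinTask', 'id': 3, 'depends_on': [1, 2]}
print(parse_task_name("M2_Stg1"))      # {'type': 'MapTask', 'id': 2, 'depends_on': []}
```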

The third part is the operator structure in the tasks, where each operator string describes the execution semantics of a task.

Command output:

In Task M1_Stg1:
    Data source: yudi_2.src          #### "Data source" describes the input content of the current task
    TS: alias: a                     #### TableScanOperator
        RS: order: +                 #### ReduceSinkOperator
            keys:
                a.value
            values:
                a.key
            partitions:
                a.value
In Task J3_1_2_Stg1:
    JOIN: a INNER JOIN b             #### JoinOperator
        SEL: Abs(UDFToDouble(a._col0)), b._col5    #### SelectOperator
            FS: output: None         #### FileSinkOperator
In Task M2_Stg1:
    Data source: yudi_2.src1
    TS: alias: b
        RS: order: +
            keys:
                b.value
            values:
                b.value
            partitions:
                b.value

The following table describes the operators.

Operators

TableScanOperator: Describes the logic of FROM statement blocks in a query statement. The input table name (alias) is displayed in the EXPLAIN results.
SelectOperator: Describes the logic of SELECT statement blocks in a query statement. The columns passed to the next operator, separated by commas, are displayed in the EXPLAIN results. If the result is a reference to a column, it is displayed as <alias>.<column_name>. If the result is an expression, it is displayed as a function, for example, func1(arg1_1, arg1_2, func2(arg2_1, arg2_2)). If the result is a constant, the value is displayed directly.
FilterOperator: Describes the logic of WHERE statement blocks in a query statement. The WHERE condition is displayed in the EXPLAIN results, following display rules similar to those of SelectOperator.
JoinOperator: Describes the logic of JOIN statement blocks in a query statement. The tables involved in the JOIN operation and the JOIN mode are displayed in the EXPLAIN results.
GroupByOperator: Describes the logic of the aggregation operation. This structure is displayed if an aggregate function is used in a query. The content of the aggregate function is displayed in the EXPLAIN results.
ReduceSinkOperator: Describes the logic of the data distribution operation between tasks. If the result of the current task is transferred to another task, ReduceSinkOperator must be used to distribute data at the end of the current task. The output sorting method, the distribution keys and values, and the columns used to calculate the hash value are displayed in the EXPLAIN results.
FileSinkOperator: Describes the final data storage operation. If the query statement contains an INSERT statement block, the name of the target table is displayed in the EXPLAIN results.
LimitOperator: Describes the logic of LIMIT statement blocks in a query statement. The limit value is displayed in the EXPLAIN results.
MapjoinOperator: Describes JOIN operations on large tables, similar to JoinOperator.

Note
If a query is complex and produces too many EXPLAIN results, an API restriction is triggered and incomplete results are displayed. In this case, you can split the query and perform the EXPLAIN operation on each part to show the structure of the job.
The maximum number of partitions in a query is 10,000. Inputting too many partitions leads to over-length Data source content. To circumvent this limit, you can filter out most partitions by adding a query filter.

6.4.6. GROUPING SETS


6.4.6.1. Overview
In scenarios where you need to aggregate and analyze data of multiple dimensions, you would otherwise have to execute multiple UNION ALL clauses. For example, you may want to aggregate by column a, by column b, and by columns a and b together. The GROUPING SETS clause is a better choice in such cases. GROUPING SETS is an extension to the GROUP BY clause in the SELECT statement. You can use GROUPING SETS to group results in various ways without executing multiple SELECT statements. This can produce better execution plans and higher performance from the MaxCompute engine.

Notice Many examples in this topic are demonstrated by using MaxCompute Studio. We recommend that you install MaxCompute Studio before you proceed with subsequent operations.


6.4.6.2. Example
T he following example is for your reference.

1. Prepare data.

create table requests LIFECYCLE 20 as
select * from values
(1, 'windows', 'PC', 'Beijing'),
(2, 'windows', 'PC', 'Shijiazhuang'),
(3, 'linux', 'Phone', 'Beijing'),
(4, 'windows', 'PC', 'Beijing'),
(5, 'ios', 'Phone', 'Shijiazhuang'),
(6, 'linux', 'PC', 'Beijing'),
(7, 'windows', 'Phone', 'Shijiazhuang')
as t(id, os, device, city);

2. Use GROUPING SETS.

SELECT os, device, city, COUNT(*)
FROM requests
GROUP BY os, device, city GROUPING SETS((os, device), (city), ());

A similar output is displayed.

Command output

Note You can also execute multiple SELECT statements to obtain the same result.

SELECT NULL, NULL, NULL, COUNT(*)
FROM requests
UNION ALL
SELECT os, device, NULL, COUNT(*)
FROM requests GROUP BY os, device
UNION ALL
SELECT null, null, city, COUNT(*)
FROM requests GROUP BY city;

However, the GROUPING SETS method is simpler and more efficient.

Notice Expressions that are not used in a grouping set are filled with NULL as placeholders, so that the results of all grouping sets can be combined as if by UNION statements.
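The following Python sketch emulates the GROUPING SETS query above on the same seven rows of the requests table, with None standing in for the NULL placeholders. It is an illustration of the semantics, not MaxCompute code; the grouping_sets_count helper is hypothetical, and row order in the real output may differ.

```python
from collections import Counter

# The same rows as the requests table created in step 1.
REQUESTS = [
    (1, "windows", "PC", "Beijing"),
    (2, "windows", "PC", "Shijiazhuang"),
    (3, "linux", "Phone", "Beijing"),
    (4, "windows", "PC", "Beijing"),
    (5, "ios", "Phone", "Shijiazhuang"),
    (6, "linux", "PC", "Beijing"),
    (7, "windows", "Phone", "Shijiazhuang"),
]
COLUMNS = ("id", "os", "device", "city")


def grouping_sets_count(rows, select_cols, grouping_sets):
    """Emulate SELECT select_cols..., COUNT(*) ... GROUPING SETS(...)."""
    output = []
    for grouping_set in grouping_sets:
        counts = Counter()
        for row in rows:
            record = dict(zip(COLUMNS, row))
            # Columns outside the current grouping set are filled with None (the NULL placeholder).
            key = tuple(record[c] if c in grouping_set else None for c in select_cols)
            counts[key] += 1
        output.extend(key + (n,) for key, n in counts.items())
    return output


result = grouping_sets_count(REQUESTS, ("os", "device", "city"),
                             [("os", "device"), ("city",), ()])
for row in result:
    print(row)
```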


6.4.6.3. CUBE and ROLLUP


CUBE and ROLLUP are special GROUPING SETS functions. CUBE lists all possible combinations of the specified columns as grouping sets. ROLLUP aggregates data level by level to generate grouping sets.

Example

GROUP BY CUBE(a, b, c)
-- Equivalent to the following statement:
GROUPING SETS((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),())
GROUP BY ROLLUP(a, b, c)
-- Equivalent to the following statement:
GROUPING SETS((a,b,c),(a,b),(a), ())
GROUP BY CUBE ( (a, b), (c, d) )
-- Equivalent to the following statement:
GROUPING SETS (
( a, b, c, d ),
( a, b ),
( c, d ),
( )
)
GROUP BY ROLLUP ( a, (b, c), d )
-- Equivalent to the following statement:
GROUPING SETS (
( a, b, c, d ),
( a, b, c ),
( a ),
( )
)
GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))
-- Equivalent to the following statement:
GROUP BY GROUPING SETS (
(a, b, c, d), (a, b, c, e),
(a, b, d), (a, b, e),
(a, c, d), (a, c, e),
(a, d), (a, e)
)
GROUP BY grouping sets((b), (c),rollup(a,b,c))
-- Equivalent to the following statement:
GROUP BY GROUPING SETS (
(b), (c),
(a,b,c), (a,b), (a), ()
)
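The expansion rules in these examples can be reproduced with a short Python sketch: CUBE takes every combination of its items, ROLLUP takes every prefix, and several GROUP BY items are combined by a cross product. A tuple such as ("b", "c") plays the role of a composite column (b, c). The helper names are hypothetical; the sketch reproduces the listed equivalences in the same order.

```python
from itertools import combinations


def _cols(group):
    """Treat a composite column such as ("a", "b") as a unit; wrap single names in a tuple."""
    return group if isinstance(group, tuple) else (group,)


def cube(*groups):
    """All combinations of the groups, from the full set down to the empty set."""
    return [tuple(c for g in combo for c in _cols(g))
            for size in range(len(groups), -1, -1)
            for combo in combinations(groups, size)]


def rollup(*groups):
    """Prefixes of the group list, from longest to empty."""
    return [tuple(c for g in groups[:n] for c in _cols(g))
            for n in range(len(groups), -1, -1)]


def cross(*set_lists):
    """Combine several GROUP BY items: each grouping set of one item is joined with each set of the next."""
    result = [()]
    for sets in set_lists:
        result = [r + s for r in result for s in sets]
    return result


print(rollup("a", "b", "c"))  # [('a', 'b', 'c'), ('a', 'b'), ('a',), ()]
```

For example, GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e)) corresponds to cross([("a",)], cube("b", "c"), [("d",), ("e",)]).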

6.4.6.4. GROUPING and GROUPING_ID


NULL is used as a placeholder in grouping sets, but NULL can also be an actual value in the data. In the results, placeholder NULLs are indistinguishable from value NULLs. The GROUPING function is provided to address this issue.

GROUPING accepts the name of a column as a parameter. If a result row is aggregated by that column, 0 is returned, indicating that a NULL in the column is an actual value. Otherwise, 1 is returned, indicating that the NULL is a placeholder.


GROUPING_ID accepts the names of one or more columns as parameters. The GROUPING results of these columns are combined into an integer bitmap.

Example:

SELECT a, b, c, COUNT(*),
GROUPING(a) ga, GROUPING(b) gb, GROUPING(c) gc, GROUPING_ID(a,b,c) groupingid
FROM VALUES (1,2,3) as t(a,b,c)
GROUP BY CUBE(a,b,c);

A similar output is displayed.

Command output
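How GROUPING and GROUPING_ID relate can be sketched in Python. The bit order is an assumption: the sketch treats the first argument of GROUPING_ID as the most significant bit, as in standard SQL, so a row grouped by no columns yields 7 for three arguments. Verify the bit order against the actual output before relying on it. The helper names are hypothetical.

```python
COLS = ("a", "b", "c")


def grouping(column, grouping_set):
    """GROUPING(column): 0 if the row is grouped by column, 1 if its NULL is a placeholder."""
    return 0 if column in grouping_set else 1


def grouping_id(columns, grouping_set):
    """GROUPING_ID(columns...): the GROUPING bits packed into one integer.

    Assumption: the first argument maps to the most significant bit.
    """
    gid = 0
    for column in columns:
        gid = (gid << 1) | grouping(column, grouping_set)
    return gid


# A few of the grouping sets produced by CUBE(a, b, c).
for gs in [("a", "b", "c"), ("a", "b"), ("a",), ()]:
    print(gs, [grouping(c, gs) for c in COLS], grouping_id(COLS, gs))
```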

6.4.7. IF statement
MaxCompute SQL supports the IF-ELSE statement.

You can use the IF-ELSE statement to execute SQL scripts based on specific conditions. The condition in the IF-ELSE statement can be an expression, such as one that uses a variable, or a scalar subquery that returns only one column value from one row.

The IF statement allows the system to automatically select the execution logic based on the specified conditions. MaxCompute supports the following IF syntax:

IF (condition) BEGIN
statement 1
statement 2
...
END
IF (condition) BEGIN
statements
END ELSE IF (condition2) BEGIN
statements
END ELSE BEGIN
statements
END

Note If a branch contains only one statement, its BEGIN and END can be omitted, similar to '{ }' in Java.

The IF statement can contain two types of conditions: expressions and scalar subqueries. Both of them are of the BOOLEAN type.
Expressions: A BOOLEAN-type expression in the IF-ELSE statement determines which branch is executed at the compilation stage. Example:


@date := '20190101';
@row TABLE(id STRING); -- Declare the table variable @row. Its schema contains one column, id, of the STRING type.
IF ( cast(@date as bigint) % 2 == 0 ) BEGIN
@row := SELECT id from src1;
END ELSE BEGIN
@row := SELECT id from src2;
END
INSERT OVERWRITE TABLE dest SELECT * FROM @row;

Scalar subqueries: A BOOLEAN-type scalar subquery in the IF-ELSE statement determines which branch is executed at the running stage. Therefore, multiple jobs are submitted. Example:

@i bigint;
@t table(id bigint, value bigint);
IF ((SELECT count(*) FROM src WHERE a = '5') > 1) BEGIN
@i := 1;
@t := select @i, @i*2;
END ELSE
BEGIN
@i := 2;
@t := select @i, @i*2;
END
select id, value from @t;

6.5. SELECT TRANSFORM


6.5.1. Overview
SELECT TRANSFORM lets you implement features that MaxCompute SQL does not provide. SELECT TRANSFORM allows you to start a specified child process and send data in a required format to the child process through standard input (stdin). The standard output (stdout) of the child process is then parsed to obtain the final output. You do not need to write UDFs for this process.
SELECT TRANSFORM simplifies referencing script code and supports programming languages such as Java, Python, Shell, and Perl. It is suitable for ad hoc data analysis. SELECT TRANSFORM in MaxCompute is fully compatible with Hive syntax, features, and behavior, including the input/output row format and reader/writer. Most Hive scripts can be used directly in the SELECT TRANSFORM statement. Others can be used after a few changes.

Command syntax:

SELECT TRANSFORM(arg1, arg2 ...)
(ROW FORMAT DELIMITED (FIELDS TERMINATED BY field_delimiter (ESCAPED BY character_escape)?)? (LINES SEPARATED BY line_separator)? (NULL DEFINED AS null_value)?)?
USING 'unix_command_line'
(RESOURCES 'res_name' (',' 'res_name')*)?
( AS col1, col2 ...)?
(ROW FORMAT DELIMITED (FIELDS TERMINATED BY field_delimiter (ESCAPED BY character_escape)?)? (LINES SEPARATED BY line_separator)? (NULL DEFINED AS null_value)?)?

Description:

SELECT TRANSFORM: The SELECT TRANSFORM keyword can be replaced with the MAP or REDUCE keyword without changing the semantics. However, we recommend that you use SELECT TRANSFORM because its syntax is simpler.
(arg1, arg2 ...): arguments in the TRANSFORM clause. Their format is similar to that of items in the SELECT clause. In the default format, the results of the expressions for each argument are implicitly converted into strings and then joined by \t. The combined arguments are then passed to the specified child process.

Note The default format is configurable. For more information, see ROW FORMAT.

USING: specifies the command used to start a child process. Note the following points about the USING clause:
In most MaxCompute SQL statements, the USING clause can only specify resources. However, in the SELECT TRANSFORM statement, the USING clause can specify commands to ensure compatibility with Hive syntax.
The format of the USING clause is similar to the syntax of a Shell script. However, no Shell is actually started to run it; the child process is created directly from the command line. As a result, many Shell features, such as input and output redirection, pipes, and loops, are unavailable. If you need these features, you can use a Shell script as the command that starts the child process.

RESOURCES: specifies the resources that the child process can access. You can use one of the following methods to specify resources:
Use the RESOURCES clause. Example: using 'sh foo.sh bar.txt' resources 'foo.sh','bar.txt'.
Add the set odps.sql.session.resources=foo.sh,bar.txt; statement before SQL statements.

Notice This setting takes effect globally once it is specified. All SELECT TRANSFORM statements can access the resources specified by this setting.

ROW FORMAT: specifies the input or output format. Two ROW FORMAT clauses are used in the syntax: the first one specifies the input format, and the second one specifies the output format. By default, \t is used to separate columns, \n is used to separate rows, and NULL is represented by \N.

Notice
For field_delimiter, character_escape, and line_separator, only one character is accepted. If you specify a string, only the first character of the string is used.
Hive provides a variety of syntaxes to specify formats. MaxCompute supports syntaxes such as inputRecordReader, outputRecordReader, and SerDe. To use these formats, you must enable Hive compatibility by adding the set odps.sql.hive.compatible=true; statement before SQL statements. If you specify a syntax supported by Hive, such as inputRecordReader or outputRecordReader, statements may be executed at lower speeds.
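To show what the child process actually receives on stdin under the default ROW FORMAT described above, the following Python sketch renders one input row: values are converted to strings, NULL becomes \N, columns are joined by \t, and the row ends with \n. Python's str() is only a stand-in for MaxCompute's implicit string conversion, which may render some types (for example BOOLEAN or DOUBLE) differently; the helper name is hypothetical, and escaping of delimiter characters inside values is not modeled.

```python
def to_transform_line(row, field_delimiter="\t", line_separator="\n", null_value="\\N"):
    """Render one input row in the default ROW FORMAT sent to the child process's stdin."""
    # str() stands in for MaxCompute's implicit string conversion.
    fields = [null_value if v is None else str(v) for v in row]
    return field_delimiter.join(fields) + line_separator


print(repr(to_transform_line((1, None, "abc"))))  # '1\t\\N\tabc\n'
```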

AS: specifies output columns.


Note
You can specify data types in the AS clause, as in as (col1 bigint, col2 boolean). By default, strings are returned if you do not specify data types, as in as (col1, col2).
The output is obtained by parsing the stdout of the child process. If the specified data types are not STRING, the system implicitly calls the CAST function. Runtime exceptions may occur when the CAST function is called.
You cannot specify data types for only some of the columns, as in as (col1, col2 bigint).
If you skip the AS clause, the field preceding the first \t in the stdout is a key, and all the following parts are a value. This is equivalent to as (key, value).
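The parsing rules in this note can be sketched in Python for a single stdout line. The sketch returns strings only; it does not model the implicit CAST for typed columns, or the behavior when a line has fewer fields than declared columns. The helper name is hypothetical.

```python
def parse_transform_output(line, columns=None, field_delimiter="\t", null_value="\\N"):
    """Split one stdout line of the child process into named output columns (all strings)."""
    line = line.rstrip("\n")
    if columns is None:
        # No AS clause: text before the first delimiter is `key`, everything after it is `value`.
        key, _, value = line.partition(field_delimiter)
        fields = {"key": key, "value": value}
    else:
        fields = dict(zip(columns, line.split(field_delimiter)))
    return {name: None if v == null_value else v for name, v in fields.items()}


print(parse_transform_output("a\tb\tc\n"))           # {'key': 'a', 'value': 'b\tc'}
print(parse_transform_output("5\t\\N", ["x", "y"]))  # {'x': '5', 'y': None}
```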

6.5.2. SELECT TRANSFORM examples


6.5.2.1. Call Shell scripts
In this example, a Shell script generates 50 lines of data, the numbers 1 to 50, which are returned in the data field:

SELECT TRANSFORM(script) USING 'sh' AS (data)
FROM (
SELECT 'for i in `seq 1 50`; do echo $i; done' AS script
) t
;

The Shell commands are used as the input of the TRANSFORM clause.

Note In addition to language extensions, SELECT TRANSFORM lets you write simple AWK, Python, Perl, and Shell scripts directly in the command. You do not need to create script files or upload resources separately.

For complex cases, you can upload script files, as in the following example that calls a Python script.

6.5.2.2. Call Python scripts


This topic provides an example of how to use SELECT TRANSFORM to call Python scripts.

1. Write a Python script file. In this example, the file name is myplus.py.

#!/usr/bin/env python
import sys

# Each input line holds two tab-separated columns; NULL arrives as \N.
for line in sys.stdin:
    # Strip the line separator so that a trailing \N is recognized.
    token = line.rstrip('\n').split('\t')
    if token[0] == '\\N' or token[1] == '\\N':
        print('\\N')
    else:
        print(int(token[0]) + int(token[1]))

2. Add the Python script file as a resource to MaxCompute.


add py ./myplus.py -f;

Note You can also add resources in the DataWorks console.

3. Execute the SELECT TRANSFORM statement to call the resource.

Create table testdata(c1 bigint,c2 bigint); -- Create a test table.
insert into Table testdata values (1,4),(2,5),(3,6); -- Insert test data into the test table.
-- Execute the SELECT TRANSFORM statement:
SELECT
TRANSFORM (testdata.c1, testdata.c2)
USING 'python myplus.py' resources 'myplus.py'
AS (result bigint)
FROM testdata;
-- Or
set odps.sql.session.resources=myplus.py;
SELECT
TRANSFORM (testdata.c1, testdata.c2)
USING 'python myplus.py'
AS (result bigint)
FROM testdata;

4. Output similar to the following is returned:

+--------+
| result |
+--------+
| 5      |
| 7      |
| 9      |
+--------+
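Because the script only reads stdin and writes stdout, its per-line logic can be checked locally before it is uploaded. The following Python sketch restates the logic of myplus.py as a function and feeds it the test rows in tab-separated form. The helper names and the extra (7, NULL) row are hypothetical, and writing NULL as \N follows the convention that the script itself checks for.

```python
def myplus_line(line):
    """Same per-line logic as myplus.py: add two tab-separated columns, pass \\N through."""
    token = line.rstrip("\n").split("\t")
    if token[0] == "\\N" or token[1] == "\\N":
        return "\\N"
    return str(int(token[0]) + int(token[1]))


def to_stdin_line(row):
    """Serialize a row as a tab-separated line, writing NULL (None) as \\N."""
    return "\t".join("\\N" if v is None else str(v) for v in row) + "\n"


rows = [(1, 4), (2, 5), (3, 6), (7, None)]  # the testdata rows plus one hypothetical NULL row
outputs = [myplus_line(to_stdin_line(r)) for r in rows]
print(outputs)  # ['5', '7', '9', '\\N']
```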

Python scripts called in this way have no format requirements and do not need a Python framework to run in MaxCompute. Python commands can also be passed directly as the input of the TRANSFORM clause, in the same way as the Shell commands in the first example. For example, the following Python 2 command outputs the numbers 1 to 49, because xrange(1, 50) excludes its upper bound:

SELECT TRANSFORM('for i in xrange(1, 50): print i;') USING 'python' AS (data);

6.5.2.3. Call Java scripts


Java programs are called in a similar way to Python scripts. In this example, you compile a Java class, export it as a JAR package, run the add command to add the JAR package to MaxCompute as a resource, and then call the resource by using SELECT TRANSFORM.

1. Write and compile a Java program and export it as a JAR package. In this example, the JAR package is named Sum.jar.

package com.aliyun.odps.test;
import java.util.Scanner;
// Reads tab-separated rows from stdin and prints the sum of the first two columns.
// NULL values arrive as \N and are written back as \N.
public class Sum {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        while (sc.hasNextLine()) {
            String s = sc.nextLine();
            String[] tokens = s.split("\t");
            if (tokens.length < 2) {
                throw new RuntimeException("illegal input");
            }
            if (tokens[0].equals("\\N") || tokens[1].equals("\\N")) {
                System.out.println("\\N");
                continue; // without this, Long.parseLong("\\N") below would throw
            }
            System.out.println(Long.parseLong(tokens[0]) + Long.parseLong(tokens[1]));
        }
    }
}

2. Add the JAR package to MaxCompute as a resource.

add jar ./Sum.jar -f;

3. Execute the SELECT TRANSFORM statement to call the resource.

Create table testdata(c1 bigint,c2 bigint); -- Create a test table.
insert into Table testdata values (1,4),(2,5),(3,6); -- Insert test data into the test table.
-- Execute the SELECT TRANSFORM statement:
SELECT TRANSFORM(testdata.c1, testdata.c2)
USING 'java -cp Sum.jar com.aliyun.odps.test.Sum' resources 'Sum.jar'
AS (cnt bigint)
FROM testdata;
-- Or
set odps.sql.session.resources=Sum.jar;
SELECT TRANSFORM(testdata.c1, testdata.c2)
USING 'java -cp Sum.jar com.aliyun.odps.test.Sum'
AS (cnt bigint)
FROM testdata;

4. Output similar to the following is returned:

+-----+
| cnt |
+-----+
| 5 |
| 7 |
| 9 |
+-----+

You can use the preceding method to run most Java utilities.

Although UDTF frameworks are provided for Java and Python, code is easier to write with SELECT TRANSFORM, because it has no format requirements and the scripts can be run offline. For offline scripts, the Java and Python paths can be obtained from the JAVA_HOME and PYTHON_HOME environment variables.

6.5.2.4. Call scripts of other languages


In addition to language extensions, SELECT TRANSFORM also supports common Unix commands and script interpreters, such as AWK and Perl.

An example of calling AWK:

SELECT TRANSFORM(*) USING "awk '//{print $2}'" as (data) from testdata;
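For reference, the AWK program //{print $2} matches every line (the empty pattern // always matches) and prints the second field. The following Python sketch mimics that behavior on the tab-separated rows that TRANSFORM writes to stdin. It assumes AWK's default field splitting on runs of spaces and tabs, and the function name is hypothetical.

```python
def awk_print_second_field(lines):
    """Mimic awk '//{print $2}': print field 2 of every line, or an empty line if it is missing."""
    out = []
    for line in lines:
        fields = line.split()  # default awk splitting: runs of blanks and tabs
        out.append(fields[1] if len(fields) > 1 else "")
    return out


print(awk_print_second_field(["1\t4\n", "2\t5\n", "3\t6\n"]))  # ['4', '5', '6']
```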

An example of calling Perl:

SELECT TRANSFORM (testdata.c1, testdata.c2) USING "perl -e 'while($input = <STDIN>){print $input;}'" FROM testdata;

Notice PHP and Ruby are not deployed in the MaxCompute cluster and therefore cannot be called.

6.5.2.5. Call scripts in series


SELECT TRANSFORM allows you to call scripts in series. For example, you can use DISTRIBUTE BY and SORT BY to preprocess the data that is passed to each script.

SELECT TRANSFORM(key, value) USING 'cmd2' from
(
SELECT TRANSFORM(*) USING 'cmd1' from
(
SELECT * FROM data distribute by col2 sort by col1
) t distribute by key sort by value
) t2;

You can also use the MAP and REDUCE keywords to express the same pipeline:

@a := select * from data distribute by col2 sort by col1;
@b := map * using 'cmd1' distribute by col1 sort by col2 from @a;
reduce * using 'cmd2' from @b;
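The role of DISTRIBUTE BY and SORT BY in such a pipeline can be modeled in a few lines of Python: DISTRIBUTE BY decides which worker receives each row, and SORT BY orders rows only within each worker. This is a sketch under stated assumptions: the hash-modulo partitioner and the worker count are hypothetical stand-ins, not MaxCompute's actual partitioning.

```python
from collections import defaultdict


def distribute_and_sort(rows, distribute_col, sort_col, num_workers=3):
    """Model `DISTRIBUTE BY distribute_col SORT BY sort_col` over dict rows.

    Rows with equal distribute_col values always land on the same worker, and each
    worker's rows are sorted by sort_col. No global order across workers is implied.
    The hash-modulo partitioner is a hypothetical stand-in.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[distribute_col]) % num_workers].append(row)
    return {worker: sorted(part, key=lambda r: r[sort_col])
            for worker, part in partitions.items()}


data = [{"col1": c1, "col2": c2}
        for c1, c2 in [(5, "a"), (1, "b"), (3, "a"), (2, "b"), (4, "c")]]
for worker, part in sorted(distribute_and_sort(data, "col2", "col1").items()):
    print(worker, [(r["col2"], r["col1"]) for r in part])
```

Each worker would then feed its own sorted slice to the script named in USING, which is why a reducer-style script can rely on seeing all rows for one key consecutively.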

6.5.3. Performance advantages


The relative performance of SELECT TRANSFORM and UDTFs depends on the scenario. In general, SELECT TRANSFORM performs better, but UDTFs perform better as the data volume increases. Because transform scripts are easier to develop, SELECT TRANSFORM is better suited to ad hoc data analysis.

The advantages of UDTFs and SELECT TRANSFORM are listed in the following sections.

Advantages of UDTFs
Inputs and outputs have defined data types, so no data type conversion is required.
Processes are not suspended when the operating system pipe is empty or full. The operating system pipe has a 4 KB buffer.
Constant parameters do not need to be transmitted.

Advantages of SELECT TRANSFORM

The child and parent processes run separately, so SELECT TRANSFORM can use multiple server cores when the workload is CPU-intensive and the data throughput is low.
Data is read and written through lower-level system calls, which is more efficient than Java.
Supports tools such as AWK and can run native code.

6.6. UNION, INTERSECT, and EXCEPT


This topic describes the syntax of UNION ALL, UNION DISTINCT, INTERSECT ALL, INTERSECT DISTINCT, EXCEPT ALL, and EXCEPT DISTINCT, and provides descriptions and examples.
Syntax:

select_statement UNION ALL select_statement;
select_statement UNION [DISTINCT] select_statement;
select_statement INTERSECT ALL select_statement;
select_statement INTERSECT [DISTINCT] select_statement;
select_statement EXCEPT ALL select_statement;
select_statement EXCEPT [DISTINCT] select_statement;
select_statement MINUS ALL select_statement;
select_statement MINUS [DISTINCT] select_statement;

Purpose: These operators return the union of two datasets, the intersection of two datasets, or the complement of the second dataset in the first dataset.

Description:

UNION: returns the union of two datasets by combining them into one dataset.
INTERSECT: returns the intersection of two datasets, that is, the records contained in both datasets.
EXCEPT: returns the complement of the second dataset in the first dataset, that is, the records that are contained in the first dataset but not in the second dataset.
MINUS: equivalent to EXCEPT.
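The difference between the ALL and DISTINCT variants is easiest to see with multiset (bag) arithmetic. The following Python sketch models each operator on lists of row tuples with collections.Counter. It illustrates the semantics only, and it sorts results for readability even though the SQL operators do not guarantee any order.

```python
from collections import Counter


def union_all(left, right):
    return list(left) + list(right)  # keep every row from both inputs


def union_distinct(left, right):
    return sorted(set(left) | set(right))


def intersect_all(left, right):
    # Each row appears min(count in left, count in right) times.
    return sorted((Counter(left) & Counter(right)).elements())


def intersect_distinct(left, right):
    return sorted(set(left) & set(right))


def except_all(left, right):
    # Each row appears max(count in left - count in right, 0) times.
    return sorted((Counter(left) - Counter(right)).elements())


def except_distinct(left, right):
    # MINUS is a synonym for EXCEPT.
    return sorted(set(left) - set(right))


left = [(1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8)]
right = [(3, 4), (5, 6), (5, 6), (9, 10)]
print(except_all(left, right))       # [(1, 2), (1, 2), (3, 4), (7, 8)]
print(except_distinct(left, right))  # [(1, 2), (7, 8)]
```

These two results match the EXCEPT ALL and EXCEPT DISTINCT examples in this topic.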

Examples:

UNION ALL example:

SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b)
UNION ALL
SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);

Returned result: the two datasets are combined, and duplicate rows are kept.

+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 4 |
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
+------------+------------+

UNION DISTINCT example:

SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b)
UNION
SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);

Returned result: equivalent to SELECT DISTINCT * FROM (<the result of UNION ALL>) t;.

+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 4 |
| 3 | 4 |
+------------+------------+

INTERSECT ALL example:

SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b)
INTERSECT ALL
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);

Returned result: INTERSECT ALL does not remove duplicates. Conceptually, each occurrence of an identical row carries a hidden serial number, so identical rows are matched one to one and each matched row is returned separately.
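The hidden serial number can be made explicit. In the following Python sketch, each occurrence of a row is tagged with its occurrence index, similar in spirit to a ROW_NUMBER() partitioned by all columns; after tagging, every row is unique, so an ordinary set intersection matches duplicates one to one. The function names are hypothetical, and this illustrates the semantics rather than how MaxCompute executes INTERSECT ALL.

```python
from collections import defaultdict


def tag_occurrences(rows):
    """Attach an occurrence number to each row: [r, r] -> [(r, 1), (r, 2)]."""
    seen = defaultdict(int)
    tagged = []
    for row in rows:
        seen[row] += 1
        tagged.append((row, seen[row]))
    return tagged


def intersect_all_by_serial(left, right):
    # Tagged rows are all distinct, so a plain set intersection pairs duplicates 1:1.
    common = set(tag_occurrences(left)) & set(tag_occurrences(right))
    return sorted(row for row, _ in common)


print(intersect_all_by_serial(
    [(1, 2), (1, 2), (3, 4), (5, 6)],
    [(1, 2), (1, 2), (3, 4), (5, 7)],
))  # [(1, 2), (1, 2), (3, 4)]
```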

+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
+------------+------------+

INTERSECT DISTINCT example:

SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b)
INTERSECT
SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);

Returned result: equivalent to SELECT DISTINCT * FROM (<the result of INTERSECT ALL>) t;.

+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 3 | 4 |
+------------+------------+

EXCEPT ALL example:

SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b)
EXCEPT ALL
SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);

Returned result: EXCEPT ALL does not remove duplicates. Conceptually, each occurrence of an identical row carries a hidden serial number, so each occurrence in the second dataset removes at most one matching occurrence from the first dataset.

+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 1 | 2 |
| 3 | 4 |
| 7 | 8 |
+------------+------------+

EXCEPT DISTINCT example:

SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b)
EXCEPT
SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);

Returned result: equivalent to SELECT DISTINCT * FROM left_branch EXCEPT ALL SELECT DISTINCT * FROM right_branch;.

+------------+------------+
| a | b |
+------------+------------+
| 1 | 2 |
| 7 | 8 |
+------------+------------+

Note
The results of the preceding operations are not necessarily sorted.
The left and right branches of the preceding operations must have the same number of columns. If the data types of the left and right branches are inconsistent, they may be implicitly converted. For compatibility reasons, implicit conversion is not performed between STRING and non-STRING types in these operations.
Up to 256 branches are allowed in the preceding operations. An error is returned if more branches are used.
If a UNION statement is followed by a CLUSTER BY, DISTRIBUTE BY, SORT BY, ORDER BY, or LIMIT clause: when set odps.sql.type.system.odps2=false; is specified, the clause applies only to the last select_statement of the UNION; when set odps.sql.type.system.odps2=true; is specified, the clause applies to the result of all the select_statements in the UNION. Example:

set odps.sql.type.system.odps2=true;
SELECT explode(array(3, 1)) AS (a) UNION ALL SELECT explode(array(0, 4, 2)) AS (a)
ORDER BY a LIMIT 3;

Returned result:

+------+
| a |
+------+
| 0 |
| 1 |
| 2 |
+------+

6.7. Built-in functions


6.7.1. Mathematical functions
6.7.1.1. ABS
This topic describes the ABS function.
Function declaration:

double abs(double number)
bigint abs(bigint number)
decimal abs(decimal number)

Purpose: Returns the absolute value of number.

Description:

number: a value of the double, bigint, or decimal type. If the input is of the bigint type, a bigint value is returned. If the input is of the double type, a double value is returned. If the input is of the string type, it is implicitly converted to double before the computation. If the input is of any other type, an error is returned.

Returned value: a value of the double, bigint, or decimal type, depending on the input type. If the input is NULL, NULL is returned.

Note If the input is of the bigint type and the result exceeds the bigint range, a double value is returned. In this case, precision may be lost.

Example:

abs(null) = null
abs(-1) = 1
abs(-1.2) = 1.2
abs("-2") = 2.0
abs(122320837456298376592387456923748) = 1.2232083745629837e32
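The typing rules above can be summarized in a small Python model. It covers only the NULL, bigint, double, and string cases (decimal is omitted), uses Python int and float to stand in for bigint and double, and, as an assumption, returns a float whenever the result no longer fits in a signed 64-bit bigint, following the note above. The name mc_abs is hypothetical.

```python
BIGINT_MAX = 2**63 - 1  # largest signed 64-bit value


def mc_abs(number):
    """Illustrative model of the ABS typing rules; not the built-in implementation."""
    if number is None:
        return None  # NULL in, NULL out
    if isinstance(number, str):
        number = float(number)  # implicit string -> double conversion
    result = abs(number)
    if isinstance(result, int) and result > BIGINT_MAX:
        return float(result)  # assumption: out-of-range bigint results become double
    return result


print(mc_abs(-1), mc_abs(-1.2), mc_abs("-2"))  # 1 1.2 2.0
```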

The following example shows how to use the ABS function in a complete SQL statement. Other built-in functions (except window functions and aggregate functions) are used in a similar way and are not shown here.

select abs(id) from tbl1;
-- Take the absolute value of the id field in tbl1.

6.7.1.2. ACOS
Function declaration:

double acos(double number)
decimal acos(decimal number)

Purpose: Returns the arccosine of number.

Description:

number: a value of the double or decimal type. Value range: -1 to 1. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. Value range: 0 to π. If number is NULL, NULL is returned.

Example:

acos("0.87") = 0.5155940062460905
acos(0) = 1.5707963267948966

6.7.1.3. ASIN
Function declaration:

double asin(double number)
decimal asin(decimal number)

Purpose: Returns the arcsine of number.

Description:

number: a value of the double or decimal type. Value range: -1 to 1. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. Value range: -π/2 to π/2. If number is NULL, NULL is returned.

Example:

asin(1) = 1.5707963267948966
asin(-1) = -1.5707963267948966

6.7.1.4. ATAN
Function declaration:

double atan(double number)

Purpose: Returns the arctangent of number.

Description:

number: a value of the double type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double type. Value range: -π/2 to π/2. If number is NULL, NULL is returned.

Example:

atan(1) = 0.7853981633974483
atan(-1) = -0.7853981633974483

6.7.1.5. CEIL
Function declaration:

bigint ceil(double value)
bigint ceil(decimal value)

Purpose: Returns the smallest integer that is greater than or equal to value.

Description:

value: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double. For all other input types, an error is returned.

Returned value: a value of the bigint type. If the input is NULL, NULL is returned.

Example:

ceil(1.1) = 2
ceil(-1.1) = -1

6.7.1.6. CONV
Function declaration:

string conv(string input, bigint from_base, bigint to_base)

Purpose: Converts a number from one base to another.

Description:

input: the integer to convert, as a string. Values of the bigint and double types are accepted through implicit conversion.
from_base, to_base: the source and target bases, written in decimal. Valid values: 2, 8, 10, and 16. Values of the string and double types are accepted through implicit conversion.

Returned value: a value of the string type. If any input is NULL, NULL is returned. The conversion is performed with 64-bit precision, and an error is returned if an overflow occurs. If the input is negative (begins with '-'), an error is returned. If the input is a decimal number, it is converted to an integer before the base conversion, and the fractional part is discarded.
Example:

conv('1100', 2, 10) = '12'


conv('1100', 2, 16) = 'c'
conv('ab', 16, 10) = '171'
conv('ab', 16, 16) = 'ab'
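The following Python sketch reproduces the documented behavior of CONV for non-negative inputs: NULL propagation, the four valid bases, truncation of a fractional input, and rejection of negative input. The name conv and the overflow threshold (modeled here as unsigned 64-bit) are assumptions for illustration; lowercase output digits match the examples above.

```python
DIGITS = "0123456789abcdef"
VALID_BASES = (2, 8, 10, 16)


def conv(value, from_base, to_base):
    """Minimal sketch of CONV for non-negative inputs (not the built-in implementation)."""
    if value is None or from_base is None or to_base is None:
        return None
    from_base, to_base = int(from_base), int(to_base)  # string/double bases -> integer
    if from_base not in VALID_BASES or to_base not in VALID_BASES:
        raise ValueError("base must be 2, 8, 10, or 16")
    if isinstance(value, float):
        value = int(value)  # a decimal input loses its fractional part
    text = str(value).strip()
    if text.startswith("-"):
        raise ValueError("negative input is rejected")
    n = int(text, from_base)
    if n >= 2 ** 64:  # assumption: "64-bit precision" modeled as unsigned 64-bit
        raise OverflowError("value exceeds 64 bits")
    digits = []
    while True:
        n, r = divmod(n, to_base)
        digits.append(DIGITS[r])
        if n == 0:
            break
    return "".join(reversed(digits))


print(conv("1100", 2, 16), conv("ab", 16, 10))  # c 171
```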

6.7.1.7. COS
Function declaration:

double cos(double number)
decimal cos(decimal number)

Purpose: Returns the cosine of number, which must be in radians.

Description:
number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If the input is NULL, NULL is returned.

Example:

cos(3.1415926/2) = 2.6794896585028633e-8
cos(3.1415926) = -0.9999999999999986

6.7.1.8. COSH

Function declaration:

double cosh(double number)
decimal cosh(decimal number)

Purpose: Returns the hyperbolic cosine of number.

Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If the input is NULL, NULL is returned.

6.7.1.9. COT
Function declaration:

double cot(double number)
decimal cot(decimal number)

Purpose: Returns the cotangent of number, which must be in radians.
Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If the input is NULL, NULL is returned.

6.7.1.10. EXP
Function declaration:

double exp(double number)
decimal exp(decimal number)

Purpose: Returns e raised to the power of number.

Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If number is NULL, NULL is returned.

6.7.1.11. FLOOR
Function declaration:

bigint floor(double number)
bigint floor(decimal number)

Purpose: Returns the largest integer that is less than or equal to number.

Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the bigint type. If number is NULL, NULL is returned.

Example:

floor(1.2) = 1
floor(1.9) = 1
floor(0.1) = 0
floor(-1.2) = -2
floor(-0.1) = -1
floor(0.0) = 0
floor(-0.0) = 0
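The rounding direction of FLOOR and CEIL is easy to confuse for negative numbers. A short Python sketch with the same NULL handling reproduces the FLOOR examples above and the CEIL examples in the CEIL topic. Python's math.floor and math.ceil already return integers, which stand in for bigint here; the mc_ names are hypothetical.

```python
import math


def mc_floor(number):
    """Largest integer <= number; NULL (None) stays NULL. Strings are converted to double first."""
    return None if number is None else math.floor(float(number))


def mc_ceil(value):
    """Smallest integer >= value; NULL (None) stays NULL."""
    return None if value is None else math.ceil(float(value))


print([mc_floor(x) for x in (1.2, 1.9, 0.1, -1.2, -0.1, 0.0, -0.0)])  # [1, 1, 0, -2, -1, 0, 0]
print(mc_ceil(1.1), mc_ceil(-1.1))  # 2 -1
```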

6.7.1.12. LN
Function declaration:

double ln(double number)
decimal ln(decimal number)

Purpose: Returns the natural logarithm of number.

Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If the input is NULL, negative, or 0, NULL is returned.

6.7.1.13. LOG
Function declaration:

double log(double base, double x)
decimal log(decimal base, decimal x)

Purpose: Returns the logarithm of x to the given base.

Description:

base: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.
x: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a logarithm of the double or decimal type. If base or x is NULL, negative, or 0, NULL is returned. If base is 1 (which leads to division by zero), NULL is returned.
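The NULL rules for LOG can be captured in a few lines. The Python sketch below computes the result with the change-of-base formula ln(x) / ln(base), which also shows why base 1 is a division by zero. Using that formula is an assumption about the method rather than a documented detail, and mc_log is a hypothetical name.

```python
import math


def mc_log(base, x):
    """Illustrative model of LOG(base, x) on doubles, following the NULL rules above."""
    if base is None or x is None:
        return None
    base, x = float(base), float(x)  # string/bigint inputs are converted to double
    if base <= 0 or x <= 0 or base == 1:
        return None  # non-positive arguments, and base 1 (ln(1) = 0 as divisor), give NULL
    return math.log(x) / math.log(base)  # change-of-base formula


print(mc_log(2, 8), mc_log(1, 10), mc_log(-2, 8))
```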

6.7.1.14. POW
Function declaration:

double pow(double x, double y)
decimal pow(decimal x, decimal y)

Purpose: Returns x raised to the power of y, that is, x^y.

Description:

x: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.
y: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If x or y is NULL, NULL is returned.

6.7.1.15. RAND
Function declaration:

double rand([bigint seed])

Purpose: Returns a random number of the double type in the range from 0 to 1.

Description:

seed: optional, a value of the bigint type. It is the seed of the random number and determines the starting point of the random number sequence.

Returned value: a value of the double type.

Example:

select rand() from dual;
select rand(1) from dual;

6.7.1.16. ROUND
Function declaration:

double round(double number[, bigint decimal_places])
decimal round(decimal number[, bigint decimal_places])

Purpose: Returns number rounded to the specified decimal place.

Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. If the input is of any other type, an error is returned.
decimal_places: a constant of the bigint type that specifies the decimal place to which number is rounded. For all other input types, an error is returned. If it is omitted, number is rounded to the ones place. The default value is 0.
Returned value: a value of the double or decimal type. If number or decimal_places is NULL, NULL is returned.

Note decimal_places can be negative. A negative value rounds to a place to the left of the decimal point, and the fractional part is discarded. If the absolute value of decimal_places is greater than the number of digits in the integer part, 0 is returned.

Example:

round(125.315) = 125.0
round(125.315, 0) = 125.0
round(125.315, 1) = 125.3
round(125.315, 2) = 125.32
round(125.315, 3) = 125.315
round(-125.315, 2) = -125.32
round(123.345, -2) = 100.0
round(null) = null
round(123.345, 4) = 123.345
round(123.345, -4) = 0.0
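Python's built-in round() rounds the binary value half to even, so it can differ from the examples above (for example, round(2.675, 2) returns 2.67 in Python). The following sketch reproduces the listed ROUND results by rounding half away from zero on the shortest decimal representation of the double. That MaxCompute rounds this way is an assumption inferred from the examples, not a documented detail, and mc_round is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP


def mc_round(number, decimal_places=0):
    """Round half away from zero at decimal_places, which may be negative.

    Assumption: rounding is applied to repr(float(number)), the shortest decimal
    text of the double, which reproduces the documented examples.
    """
    if number is None or decimal_places is None:
        return None
    quantum = Decimal(1).scaleb(-decimal_places)  # 10 ** -decimal_places, e.g. 0.01 or 1E+2
    text = repr(float(number))  # string/bigint inputs become double first
    return float(Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP))


print(mc_round(125.315, 2), mc_round(-125.315, 2), mc_round(123.345, -2))  # 125.32 -125.32 100.0
```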

6.7.1.17. SIN
Function declaration:

double sin(double number)
decimal sin(decimal number)

Purpose: Returns the sine of number, which must be in radians.

Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.
Returned value: a value of the double or decimal type. If number is NULL, NULL is returned.

6.7.1.18. SINH
Function declaration:

double sinh(double number)
decimal sinh(decimal number)

Purpose: Returns the hyperbolic sine of number.

Description:
number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If number is NULL, NULL is returned.

6.7.1.19. SQRT
Function declaration:

double sqrt(double number)
decimal sqrt(decimal number)

Purpose: Returns the square root of number.

Description:

number: a value of the double or decimal type. It must not be negative; if it is less than 0, an error is returned. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If number is NULL, NULL is returned.

6.7.1.20. TAN
Function declaration:

double tan(double number)
decimal tan(decimal number)

Purpose: Returns the tangent of number, which must be in radians.

Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If number is NULL, NULL is returned.

6.7.1.21. TANH
Function declaration:

double tanh(double number)
decimal tanh(decimal number)

Purpose: Returns the hyperbolic tangent of number.

Description:
number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.

Returned value: a value of the double or decimal type. If number is NULL, NULL is returned.

6.7.1.22. TRUNC
Function declaration:

double trunc(double number[, bigint decimal_places])
decimal trunc(decimal number[, bigint decimal_places])

Purpose: Truncates number to the specified decimal place.

Description:

number: a value of the double or decimal type. If the input is of the string or bigint type, it is implicitly converted to double before the computation. For all other input types, an error is returned.
decimal_places: a constant of the bigint type that specifies the decimal place at which number is truncated. Values of other types are implicitly converted to bigint. If it is omitted, number is truncated to the ones place.

Returned value: a value of the double or decimal type. If number or decimal_places is NULL, NULL is returned.

Note
The truncated digits are replaced with 0.
decimal_places can be negative. A negative value truncates digits to the left of the decimal point, and the fractional part is discarded. If the absolute value of decimal_places is greater than the number of digits in the integer part, 0 is returned.

Example:

trunc(125.815) = 125.0
trunc(125.815, 0) = 125.0
trunc(125.815, 1) = 125.80000000000001
trunc(125.815, 2) = 125.81
trunc(125.815, 3) = 125.815
trunc(-125.815, 2) = -125.81
trunc(125.815, -1) = 120.0
trunc(125.815, -2) = 100.0
trunc(125.815, -3) = 0.0
trunc(123.345, 4) = 123.345
trunc(123.345, -4) = 0.0
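A decimal-based Python sketch of TRUNC, which cuts digits toward zero at decimal_places, reproduces most of the examples above. It returns 125.8 for trunc(125.815, 1) rather than the listed 125.80000000000001; that trailing residue suggests the built-in function operates on binary floating-point values, which this sketch does not model. The name mc_trunc is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def mc_trunc(number, decimal_places=0):
    """Drop digits beyond decimal_places (toward zero); negative places zero out integer digits."""
    if number is None or decimal_places is None:
        return None
    quantum = Decimal(1).scaleb(-int(decimal_places))  # other types are converted to bigint
    text = repr(float(number))  # string/bigint inputs become double first
    return float(Decimal(text).quantize(quantum, rounding=ROUND_DOWN))


print(mc_trunc(125.815, 2), mc_trunc(-125.815, 2), mc_trunc(125.815, -1))  # 125.81 -125.81 120.0
```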

6.7.1.23. Additional mathematical functions


MaxCompute 2.0 provides additional mathematical functions. To use them, you must add the following SET statement before the SQL statements that call them:

set odps.sql.type.system.odps2=true;

Note You must submit and execute the SET statement together with the SQL statements that use the new functions.

The mathematical functions described in the following topics are new in MaxCompute 2.0.

6.7.1.24. LOG2
Function declaration:

DOUBLE log2(DOUBLE number)
DOUBLE log2(DECIMAL number)

Purpose: Returns the base-2 logarithm of number.

Description:

number: a value of the double or decimal type.

Returned value: a value of the double type. If the input is 0 or NULL, NULL is returned.

Example:

log2(null) = null
log2(0) = null
log2(8) = 3.0

6.7.1.25. LOG10
Function declaration:

DOUBLE log10(DOUBLE number)
DOUBLE log10(DECIMAL number)

Purpose: Returns the base-10 logarithm of number.

Description:
number: a value of the double or decimal type.

Returned value: a value of the double type. If the input is 0 or NULL, NULL is returned.

Example:

log10(null) = null
log10(0) = null
log10(8) = 0.9030899869919435
log10('abc') = null

6.7.1.26. BIN
Function declaration:

string bin(bigint number)

Purpose: Returns the binary representation of number.

Description:

number: a value of the bigint type.

Returned value: a value of the string type. If the input is 0, '0' is returned. If the input is NULL, NULL is returned.

Example:

bin(0) = '0'
bin(null) = null
bin(12) = '1100'

6.7.1.27. HEX
Function declaration:

STRING hex(BIGINT number)
STRING hex(STRING number)
STRING hex(BINARY number)

Purpose: Converts an integer or a string to hexadecimal format.

Description:

number: If the value is of the bigint type, the hexadecimal representation of the number is returned. If the value is of the string type, the hexadecimal encoding of the characters in the string is returned.

Returned value: a value of the string type. If the input is 0, '0' is returned. If the input is NULL, NULL is returned.

Example:

hex(0) = '0'
hex('abc') = '616263'
hex(17) = '11'
hex('17') = '3137'
hex(null) = null
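The two HEX cases can be sketched in Python as follows. Assumptions, not confirmed above: strings are hex-encoded byte by byte from UTF-8, and letter digits are upper case (none of the examples contains a letter digit). Negative bigint inputs are not described, so this sketch rejects them; mc_hex is a hypothetical name.

```python
def mc_hex(number):
    """Illustrative model of HEX for non-negative bigint and string inputs."""
    if number is None:
        return None
    if isinstance(number, int):
        if number < 0:
            raise ValueError("negative bigint is not modeled in this sketch")
        return format(number, "X")  # assumption: upper-case letter digits
    return number.encode("utf-8").hex().upper()  # assumption: UTF-8 bytes


print(mc_hex(17), mc_hex("17"), mc_hex("abc"))  # 11 3137 616263
```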

6.7.1.28. UNHEX
Function declaration:

BINARY unhex(STRING number)

Purpose: Converts a hexadecimal string back to the characters that it represents.

Description:

number: a hexadecimal string.

Returned value: a value of the binary type. If the input is 0, the call fails. If the input is NULL, NULL is returned.

Example:

unhex('616263') = 'abc'
unhex(616263) = 'abc'
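A Python sketch of UNHEX can rely on bytes.fromhex, with Python bytes standing in for the BINARY return type. Reading a numeric input as its decimal text, so that unhex(616263) decodes the digits 616263, is an assumption based on the second example, and mc_unhex is a hypothetical name.

```python
def mc_unhex(number):
    """Illustrative model of UNHEX: decode a hex string (or a number read as one) to bytes."""
    if number is None:
        return None
    # bytes.fromhex needs pairs of hex digits, so '0' (odd length) raises ValueError,
    # which mirrors the failure described above for an input of 0.
    return bytes.fromhex(str(number))


print(mc_unhex("616263"), mc_unhex(616263))  # b'abc' b'abc'
```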

6.7.1.29. RADIANS
Function declaration:

double radians(double number)

Purpose: Converts degrees to radians.

Description:

number: a value of the double type.

Returned value: a value of the double type. If the input is NULL, NULL is returned.

Example:

radians(90) = 1.5707963267948966
radians(0) = 0.0
radians(null) = null

6.7.1.30. DEGREES
Function declaration:

DOUBLE degrees(DOUBLE number)
DOUBLE degrees(DECIMAL number)

Purpose: Converts radians to degrees.

Description:

number: a value of the double or decimal type.

Returned value: a value of the double type. If the input is NULL, NULL is returned.

Example:

degrees(1.5707963267948966) = 90.0
degrees(0) = 0.0
degrees(null) = null

6.7.1.31. SIGN
Function declaration:

DOUBLE sign(DOUBLE number)
DOUBLE sign(DECIMAL number)

Purpose: Returns the sign of the input: 1.0 for a positive value, -1.0 for a negative value, and 0.0 for 0.

Description:

number: a value of the double or decimal type.

Returned value: a value of the double type. If the input is 0, 0.0 is returned. If the input is NULL, NULL is returned.
Example:

sign(-2.5) = -1.0
sign(2.5) = 1.0
sign(0) = 0.0
sign(null) = null

6.7.1.32. E
Function declaration:

DOUBLE e()

Purpose: Returns the value of e (Euler's number).

Returned value: a value of the double type.

Example:

e() = 2.718281828459045

6.7.1.33. PI
Function declaration:

DOUBLE pi()

Purpose: Returns the value of π.

Returned value: a value of the double type.

Example:

pi() = 3.141592653589793

6.7.1.34. FACTORIAL
Function declaration:

BIGINT factorial(INT number)

Purpose: Returns the factorial of number.

Description:

number: a value of the int type. Value range: 0 to 20.

Returned value: a value of the bigint type. If the input is 0, 1 is returned. If the input is NULL or outside the range 0 to 20, NULL is returned.

Example:

factorial(5) = 120 --5! = 5*4*3*2*1 = 120
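The upper limit of 20 follows from the bigint range: 20! still fits in a signed 64-bit integer, but 21! does not. The following Python sketch checks that and models the NULL rules; mc_factorial is a hypothetical name.

```python
import math

BIGINT_MAX = 2**63 - 1  # largest signed 64-bit value


def mc_factorial(number):
    """FACTORIAL as described: n! for 0 <= n <= 20, otherwise NULL (None)."""
    if number is None or not 0 <= number <= 20:
        return None
    return math.factorial(number)


print(mc_factorial(5), mc_factorial(21))  # 120 None
print(math.factorial(20) <= BIGINT_MAX < math.factorial(21))  # True
```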

6.7.1.35. CBRT
Function declaration:

double cbrt(double number)

Purpose: Returns the cube root of number.

Description:

number: a value of the double type.

Returned value: a value of the double type. If the input is NULL, NULL is returned.

Example:

cbrt(8) = 2.0
cbrt(null) = null

6.7.1.36. SHIFTLEFT
Function declaration:

INT shiftleft(TINYINT|SMALLINT|INT number1, INT number2)
BIGINT shiftleft(BIGINT number1, INT number2)

Purpose: Shifts number1 left by number2 bits (<<).

Description:

number1: an integer of the tinyint, smallint, int, or bigint type.
number2: an integer of the int type.

Returned value: a value of the int or bigint type.

Example:

shiftleft(1,2) = 4
-- Shift left the binary value of 1 by two places (1<<2, 0001 changed to 0100)
shiftleft(4,3) = 32
-- Shift left the binary value of 4 by three places (4<<3, 0100 changed to 100000)

6.7.1.37. SHIFTRIGHT
Funct ion declarat ion:

INT shiftright(TINYINT|SMALLINT|INT number1, INT number2)


BIGINT shiftright(BIGINT number1, INT number2)

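Purpose: It is used to shift a signed value right by a given number of bit positions (>>).

Description:

number1: an integer of the tinyint, smallint, int, or bigint type.

number2: an integer of the int type. It specifies the number of bit positions to shift.

Returned value: int or bigint type. An int value is returned for tinyint, smallint, or int input, and a bigint value is returned for bigint input.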

Example:

shiftright(4,2) = 1
-- Shift right the binary value of 4 by two places (4>>2, 0100 changed to 0001)
shiftright(32,3) = 4
-- Shift right the binary value of 32 by three places (32>>3, 100000 changed to 000100)
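For comparison with SHIFTRIGHTUNSIGNED, the sketch below models SHIFTRIGHT as an arithmetic (sign-preserving) shift, which is how Python's >> operator behaves on integers. Treating MaxCompute's >> the same way for negative inputs is an assumption based on its Java-style operator notation; shiftright_model is a hypothetical name.

```python
def shiftright_model(number1: int, number2: int) -> int:
    """Model of SHIFTRIGHT: arithmetic right shift, the sign bit is replicated."""
    if number2 < 0:
        raise ValueError("number2 must not be negative in this sketch")
    return number1 >> number2


print(shiftright_model(4, 2))    # 1  (0100 -> 0001)
print(shiftright_model(32, 3))   # 4  (100000 -> 000100)
print(shiftright_model(-14, 2))  # -4: the sign is kept
```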


6.7.1.38. SHIFTRIGHTUNSIGNED
Function declaration:

INT shiftrightunsigned(TINYINT|SMALLINT|INT number1, INT number2)


BIGINT shiftrightunsigned(BIGINT number1, INT number2)

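Purpose: It is used to shift a value right by a given number of bit positions, treating it as unsigned (>>>). The vacated high-order bits are filled with zeros.
Description:

number1: an integer of the tinyint, smallint, int, or bigint type.

number2: an integer of the int type. It specifies the number of bit positions to shift.

Returned value: int or bigint type. An int value is returned for tinyint, smallint, or int input, and a bigint value is returned for bigint input.

Example: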

shiftrightunsigned(8,2) = 2
-- Shift right the unsigned binary value of 8 (1000 in binary) by two places and return 2 (0010 in binary).
shiftrightunsigned(-14,2) = 1073741820
-- Shift right the unsigned binary value of -14 by two places (-14>>>2, 11111111 11111111 11111111 11110010 changed to 00111111 11111111 11111111 11111100)
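A minimal Python sketch of the unsigned shift, assuming that an int input is handled as a 32-bit pattern and a bigint input as a 64-bit pattern. Masking with (1 << bits) - 1 exposes the two's-complement bits of a negative number, so -14 becomes 0xFFFFFFF2 before the shift. The name shiftrightunsigned_model is hypothetical.

```python
def shiftrightunsigned_model(number1: int, number2: int, bits: int = 32) -> int:
    """Model of SHIFTRIGHTUNSIGNED (>>>): vacated high-order bits are filled with zeros."""
    mask = (1 << bits) - 1
    unsigned = number1 & mask        # two's-complement bit pattern read as unsigned
    result = unsigned >> number2     # logical shift: zeros come in from the left
    if result >> (bits - 1):         # only possible when number2 == 0
        result -= 1 << bits          # map back into the signed range
    return result


print(shiftrightunsigned_model(8, 2))    # 2
print(shiftrightunsigned_model(-14, 2))  # 1073741820
print(format(-14 & 0xFFFFFFFF, "032b"))  # 11111111111111111111111111110010
```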

6.7.2. String processing functions


6.7.2.1. CHAR_MATCHCOUNT
Command syntax:

bigint char_matchcount(string str1, string str2)

Purpose: It is used to return the number of characters in str1 that appear in str2 (repeated characters are not counted).

Description:

str1 and str2: string type. Both must be valid UTF-8 strings. If invalid characters are found during matching, a negative value is returned.

Returned value: bigint type. If any input is NULL, NULL is returned.

Example:

char_matchcount('abd', 'aabc') = 2
-- The a and b characters in str1 appear in str2.

6.7.2.2. CHR
Command syntax:

string chr(bigint ascii)


Purpose: It is used to convert an ASCII code into the corresponding character.

Description:

ascii: ASCII value of the bigint type. The value must be in the range 0 to 255; a value out of range causes an error. If the input is of the string, double, or decimal type, it is implicitly converted into a value of the bigint type before this computation. If the input is of another type, an error is returned.

Returned value: string type. If the input is NULL, NULL is returned.

6.7.2.3. CONCAT
Command syntax:

string concat(string a, string b...)

Purpose: It is used to join input strings into a single string.

Description:

a, b...: string type. If an input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type. For all other input types, an error is returned.

Returned value: string type. If there is no input or if any input is NULL, NULL is returned.

Example:

concat('ab', 'c') = 'abc'


concat() = null
concat('a', null, 'b') = null

6.7.2.4. INSTR
Function declaration:

bigint instr(string str1, string str2[, bigint start_position[, bigint nth_appearance]])

Purpose: It is used to calculate the position of substring str2 in string str1.

Description:

str1: string type. It indicates the string to be searched. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
str2: string type. It indicates the substring to search for. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
start_position: bigint type. If it is of another type, an error is returned. It indicates the character of str1 at which the search starts. The default start position is the first character, marked as 1.
nth_appearance: bigint type. It must be greater than 0 and indicates which occurrence of str2 in str1 to locate; the position of that occurrence is returned. If it is of another type or if it is less than or equal to 0, an error is returned.

Returned value: bigint type.


Note
If str2 is not found in str1, 0 is returned.
If any input is NULL, NULL is returned.
If str2 is an empty string, the matching always succeeds. Therefore, 1 is returned for instr('abc', '').

Example:

instr('Tech on the net', 'e') = 2


instr('Tech on the net', 'e', 1, 1) = 2
instr('Tech on the net', 'e', 1, 2) = 11
instr('Tech on the net', 'e', 1, 3) = 14
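The position arithmetic can be checked with a minimal Python sketch. It assumes that successive appearances may overlap and that start_position is positive; negative start positions are not described above, so the sketch rejects them. NULL is represented by None and instr_model is a hypothetical name.

```python
from typing import Optional


def instr_model(str1: Optional[str], str2: Optional[str],
                start_position: int = 1, nth_appearance: int = 1) -> Optional[int]:
    """Model of INSTR: 1-based position of the nth_appearance-th str2 in str1, or 0."""
    if str1 is None or str2 is None:
        return None
    if start_position <= 0 or nth_appearance <= 0:
        raise ValueError("this sketch only covers positive start_position and nth_appearance")
    index = start_position - 2  # one before the first 0-based index to examine
    for _ in range(nth_appearance):
        index = str1.find(str2, index + 1)
        if index == -1:
            return 0  # str2 does not appear often enough
    return index + 1  # convert the 0-based index to a 1-based position


print(instr_model("Tech on the net", "e", 1, 3))  # 14
```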

6.7.2.5. IS_ENCODING
Function declaration:

boolean is_encoding(string str, string from_encoding, string to_encoding)

Purpose: It is used to determine whether an input string can be converted from a specified character set (from_encoding) to another character set (to_encoding). It can be used to determine whether the input contains garbled characters. from_encoding is usually set to utf-8, and to_encoding is usually set to gbk.

Description:

str: string type. If the input is NULL, NULL is returned. An empty string is considered to belong to any character set.
from_encoding, to_encoding: string type. They indicate the source and the destination character sets respectively. If either input is NULL, NULL is returned.

Returned value: boolean type. If the string can be converted successfully, true is returned. Otherwise, false is returned.

Example:

is_encoding('测试', 'utf-8', 'gbk') = true
-- 测试 is "test" written in simplified Chinese characters.
is_encoding('測試', 'utf-8', 'gbk') = true
-- 測試 is "test" written in traditional Chinese characters. GBK includes these two characters.
is_encoding('測試', 'utf-8', 'gb2312') = false
-- The GB2312 character set does not include these two traditional Chinese characters.
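A minimal Python approximation checks whether a string can be encoded in both character sets with Python's built-in codecs (utf-8, gbk, gb2312). This is a hypothetical sketch; MaxCompute's own codec tables may differ in edge cases. The examples use 测试 and 測試, the simplified and traditional Chinese spellings of "test".

```python
from typing import Optional


def is_encoding_model(s: Optional[str], from_encoding: Optional[str],
                      to_encoding: Optional[str]) -> Optional[bool]:
    """True if s can be represented in both character sets; None stands for NULL."""
    if s is None or from_encoding is None or to_encoding is None:
        return None
    try:
        s.encode(from_encoding)
        s.encode(to_encoding)
    except UnicodeEncodeError:
        return False  # at least one character has no code in one of the sets
    return True


print(is_encoding_model("測試", "utf-8", "gb2312"))  # False
```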

6.7.2.6. KEYVALUE
Function declaration:

KEYVALUE(STRING srcStr, STRING split1, STRING split2, STRING key)

KEYVALUE(STRING srcStr, STRING key) -- split1 = ";", split2 = ":"

Purpose: It is used to split the source string into key-value pairs by split1, split each key-value pair into a key and a value by split2, and return the value that corresponds to key.


Description:
srcStr: string type. It is the source string to be split.
key: string type. After the source string is split by split1 and split2, the value that corresponds to key is returned.
split1 and split2: strings used as separators. The source string is split by the two separators. If these two parameters are not specified, split1 is a semicolon (;) and split2 is a colon (:) by default. If a substring produced by split1 contains more than one split2 separator, the returned result is undefined.

Returned value: string type.

If split1 or split2 is NULL, NULL is returned.
If srcStr or key is NULL, or if no key matches, NULL is returned.
If multiple key-value pairs match, the value that corresponds to the first matched key is returned.

Example:

keyvalue('0:1\;1:2', 1) = '2'
-- The source string is "0:1\;1:2". Because split1 and split2 are not specified, split1 is a semicolon (;) and split2 is a colon (:) by default.
-- After splitting by split1, the key-value pairs are:
-- 0:1, 1:2
-- After splitting by split2, they become:
-- 0 1
-- 1 2
-- The value (2) that corresponds to key 1 is returned.
keyvalue("\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;", "\;", ":", "tf") = "21910"
-- The source string is "\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;".
-- After the source string is split by split1 ("\;"), the key-value pairs are:
-- decreaseStore:1, xcard:1, isB2C:1, tf:21910, cart:1, shipping:2, pf:0, market:shoes, instPayAmount:0
-- After splitting by split2 (":"), they become:
-- decreaseStore 1
-- xcard 1
-- isB2C 1
-- tf 21910
-- cart 1
-- shipping 2
-- pf 0
-- market shoes
-- instPayAmount 0
-- For the key "tf", the returned value is 21910.
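The two-level split can be sketched in Python as follows. This is a minimal, hypothetical model: the \; in the SQL examples is treated as a plain semicolon (an assumption about how the client interprets the escape), the argument order differs from the SQL signature so that the separators can default to ";" and ":", and a segment containing several split2 separators (undefined above) is split at its first split2.

```python
from typing import Optional


def keyvalue_model(src: Optional[str], key: Optional[str],
                   split1: Optional[str] = ";", split2: Optional[str] = ":") -> Optional[str]:
    """Model of KEYVALUE: split by split1 into pairs, split each pair by split2, look up key."""
    if any(arg is None for arg in (src, key, split1, split2)):
        return None
    for pair in src.split(split1):
        k, sep, v = pair.partition(split2)  # split at the first split2 only
        if sep and k == key:
            return v  # the first matching key wins
    return None  # no matching key


print(keyvalue_model("0:1;1:2", "1"))  # 2
```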

6.7.2.7. LENGTH
Function declaration:

bigint length(string str)


Purpose: It is used to return the length of a string in characters.

Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.

Returned value: bigint type. If the string is NULL, NULL is returned. If the string is not UTF-8 encoded, -1 is returned.

Example:

length('hi! 中国') = 6
-- 中国 means "China". Each Chinese character counts as one character.

6.7.2.8. LENGTHB
Function declaration:

bigint lengthb(string str)

Purpose: It is used to return the length of a string in bytes.

Description:

str: string type. If the input is of the bigint, double, decimal, or datetime type, it is implicitly converted into a value of the string type before this computation. If the input is of another type, an error is returned.
Returned value: bigint type. If the input is NULL, NULL is returned.

Example:

lengthb('hi! 中国') = 10
-- Each of the two Chinese characters occupies 3 bytes in UTF-8, so the result is 4 + 3 + 3 = 10.
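The difference between LENGTH and LENGTHB can be illustrated with a minimal Python sketch, assuming UTF-8 strings as described above. The string 'hi! 中国' ('中国' is Chinese for "China") has 6 characters but 10 bytes, because each Chinese character takes 3 bytes in UTF-8. The function names are hypothetical.

```python
from typing import Optional


def length_model(s: Optional[str]) -> Optional[int]:
    """Model of LENGTH: number of characters (Unicode code points)."""
    return None if s is None else len(s)


def lengthb_model(s: Optional[str]) -> Optional[int]:
    """Model of LENGTHB: number of bytes in the UTF-8 encoding."""
    return None if s is None else len(s.encode("utf-8"))


sample = "hi! 中国"  # 中国 means "China"
print(length_model(sample), lengthb_model(sample))  # 6 10
```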

6.7.2.9. MD5
Function declaration:

string md5(string value)

Purpose: It is used to calculate the MD5 value of the input string value.

Description:
value: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. If the input is of another type, an error is returned.

Returned value: string type. If the input is NULL, NULL is returned.
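MD5 has no example above, so here is a minimal Python sketch for comparison. It assumes that the digest is computed over the UTF-8 bytes of the string and returned as 32 lowercase hexadecimal characters; the source states neither detail, and md5_model is a hypothetical name.

```python
import hashlib
from typing import Optional


def md5_model(value: Optional[str]) -> Optional[str]:
    """Lowercase hex MD5 digest of the UTF-8 bytes of value; None stands for NULL."""
    if value is None:
        return None
    return hashlib.md5(value.encode("utf-8")).hexdigest()


print(md5_model("abc"))  # 900150983cd24fb0d6963f7d28e17f72
```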

6.7.2.10. PARSE_URL
Function declaration:

STRING PARSE_URL(STRING url, STRING part[,STRING key])


Purpose: It is used to parse a URL and extract the specified part.

Description:

url: string type. If url or part is NULL, NULL is returned. If url is invalid, an error is returned.
part: string type. It supports HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO, and is case-insensitive. If it is none of the preceding values, an error is returned.
key: string type. If part is QUERY, the value that corresponds to key in the query string is extracted. Otherwise, key is ignored.

Returned value: string type.

Example:

url = file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose
-- In the following calls, url stands for the URL string above.
parse_url(url, 'HOST') = "example.com"
parse_url(url, 'PATH') = "/over/there/index.dtb"
parse_url(url, 'QUERY') = "type=animal&name=narwhal"
parse_url(url, 'QUERY', 'name') = "narwhal"
parse_url(url, 'REF') = "nose"
parse_url(url, 'PROTOCOL') = "file"
parse_url(url, 'AUTHORITY') = "username:password@example.com:8042"
parse_url(url, 'FILE') = "/over/there/index.dtb?type=animal&name=narwhal"
parse_url(url, 'USERINFO') = "username:password"
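The mapping from part names to URL components can be approximated with Python's urllib.parse. This is a hypothetical sketch, not MaxCompute's parser: for example, it returns an empty string for a missing query or fragment, where a Java-based implementation may return NULL. The name parse_url_model is illustrative.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def parse_url_model(url: Optional[str], part: Optional[str],
                    key: Optional[str] = None) -> Optional[str]:
    """Approximate PARSE_URL with urllib.parse; None stands for NULL."""
    if url is None or part is None:
        return None
    u = urlsplit(url)
    part = part.upper()  # part is case-insensitive
    if part == "QUERY" and key is not None:
        values = parse_qs(u.query).get(key)
        return values[0] if values else None
    userinfo, at, _ = u.netloc.rpartition("@")
    parts = {
        "HOST": u.hostname,
        "PATH": u.path,
        "QUERY": u.query,
        "REF": u.fragment,
        "PROTOCOL": u.scheme,
        "AUTHORITY": u.netloc,
        "FILE": u.path + ("?" + u.query if u.query else ""),
        "USERINFO": userinfo if at else None,
    }
    if part not in parts:
        raise ValueError(f"unsupported part: {part}")
    return parts[part]


URL = "file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose"
print(parse_url_model(URL, "QUERY", "name"))  # narwhal
```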

6.7.2.11. REGEXP_EXTRACT
Command syntax:

string regexp_extract(string source, string pattern[, bigint occurrence])

Purpose: It is used to match the source string against the regular expression pattern and return the substring captured by the group that occurrence specifies.

Description:

source: string type. It indicates the string to be searched.
pattern: string type. If pattern is NULL or if pattern contains no group, an error is returned.
occurrence: bigint type. It is the number of the group to return and must be greater than or equal to 0. Otherwise, an error is returned. The default value is 1. If it is 0, the substring that matches the entire pattern is returned.

Returned value: string type. If any input is NULL, NULL is returned.

Example:


regexp_extract('foothebar', 'foo(.*?)(bar)', 1) = the
regexp_extract('foothebar', 'foo(.*?)(bar)', 2) = bar
regexp_extract('foothebar', 'foo(.*?)(bar)', 0) = foothebar
regexp_extract('8d99d8', '8d(\\d+)d8') = 99
-- If the regular expression is submitted on the MaxCompute client, two backslashes (\\) must be used as the escape character.
regexp_extract('foothebar', 'foothebar')
-- An error is returned because no group is specified in the pattern.
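The group-index behavior can be reproduced with Python's re module in a minimal sketch. Python's regex dialect is only an approximation of MaxCompute's; a Python raw string r'\d' plays the role of '\\d' on the MaxCompute client. What MaxCompute returns when nothing matches is not stated above, so returning None here is an assumption, and regexp_extract_model is a hypothetical name.

```python
import re
from typing import Optional


def regexp_extract_model(source: Optional[str], pattern: str,
                         occurrence: int = 1) -> Optional[str]:
    """Return group `occurrence` of the first match of pattern (0 = the whole match)."""
    if pattern is None:
        raise ValueError("pattern must not be NULL")
    compiled = re.compile(pattern)
    if compiled.groups == 0:
        raise ValueError("pattern must contain at least one group")
    if occurrence < 0:
        raise ValueError("occurrence must be greater than or equal to 0")
    if source is None:
        return None
    match = compiled.search(source)
    return match.group(occurrence) if match else None  # no-match result is an assumption


print(regexp_extract_model("foothebar", "foo(.*?)(bar)", 1))  # the
```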

6.7.2.12. REGEXP_INSTR
Function declaration:

bigint regexp_instr(string source, string pattern[, bigint start_position[, bigint nth_occurrence[, bigint return_option]]])

Purpose: It is used to return the start or end position of the substring that matches pattern for the nth_occurrence time in the source string, searching from start_position.

Description:

source: string type. It indicates the string to be searched.
pattern: a constant of the string type. If pattern is NULL, an error is returned.
start_position: a constant of the bigint type. It is the start position for the search. If it is not specified, the default value 1 is used. If it is of another type or less than or equal to 0, an error is returned.
nth_occurrence: a constant of the bigint type. If it is not specified, the default value 1 is used, which indicates the position of the first match of pattern. If it is of another type or less than or equal to 0, an error is returned.
return_option: a constant of the bigint type. Its value is either 0 or 1. If it is of another type or the value is not supported, an error is returned. 0 indicates that the start position of the matched substring is returned, and 1 indicates that the end position of the matched substring is returned.

Returned value: bigint type. It is the start or end position of the matched substring in source, depending on return_option. If any input is NULL, NULL is returned.

Example:

regexp_instr("i love www.taobao.com", "o[[:alpha:]]{1}", 3, 2) = 14
-- Counting from position 3, the first match of "o" followed by a letter is "ov" at position 4 and the second is "ob" at position 14.
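A minimal Python sketch of the start-position case (return_option 0). Python's re module does not support POSIX classes such as [[:alpha:]], so the pattern is written as [A-Za-z]. return_option 1 is not modelled, because the text above does not say whether the end position is the last matched character or the one after it; returning 0 when there is no such match is also an assumption. regexp_instr_model is a hypothetical name.

```python
import re
from typing import Optional


def regexp_instr_model(source: Optional[str], pattern: str,
                       start_position: int = 1, nth_occurrence: int = 1) -> Optional[int]:
    """1-based start of the nth_occurrence-th match of pattern, searching from start_position."""
    if pattern is None:
        raise ValueError("pattern must not be NULL")
    if start_position <= 0 or nth_occurrence <= 0:
        raise ValueError("start_position and nth_occurrence must be greater than 0")
    if source is None:
        return None
    matches = re.compile(pattern).finditer(source, start_position - 1)
    for count, match in enumerate(matches, start=1):
        if count == nth_occurrence:
            return match.start() + 1  # convert the 0-based index to a 1-based position
    return 0  # fewer matches than requested (assumption)


print(regexp_instr_model("i love www.taobao.com", "o[A-Za-z]{1}", 3, 2))  # 14
```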

6.7.2.13. REGEXP_SUBSTR
Function declaration:

string regexp_substr(string source, string pattern[, bigint start_position[, bigint nth_occurrence]])

Purpose: It is used to return the substring that matches pattern for the nth_occurrence time in the source string, searching from start_position.

Description:

source: string type. It indicates the string to be searched.


pattern: a constant of the string type. It indicates the pattern to be matched. If pattern is NULL, an error is returned.
start_position: a constant of the bigint type. It must be greater than 0. If it is of another type or less than or equal to 0, an error is returned. If it is not specified, the default value 1 is used, so the matching starts from the first character of source.
nth_occurrence: a constant of the bigint type. It must be greater than 0. If it is of another type or less than or equal to 0, an error is returned. If it is not specified, the default value 1 is used, which indicates that the first match is returned.

Returned value: string type. If any input is NULL, NULL is returned. If there is no match, NULL is returned.

Example:

regexp_substr("I love aliyun very much", "a[[:alpha:]]{5}") = "aliyun"
regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 1) = " have"
regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 2) = " 2"
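A minimal Python sketch of REGEXP_SUBSTR. Because Python's re module lacks POSIX classes, [[:alpha:]] is written as [A-Za-z], [[:blank:]] as [ \t], and [[:alnum:]] as [A-Za-z0-9]; this translation covers ASCII only. The sketch also shows why the second match is " 2" with a leading space: the pattern begins with a blank. regexp_substr_model is a hypothetical name.

```python
import re
from typing import Optional


def regexp_substr_model(source: Optional[str], pattern: str,
                        start_position: int = 1, nth_occurrence: int = 1) -> Optional[str]:
    """Return the nth_occurrence-th match of pattern, searching from start_position."""
    if pattern is None:
        raise ValueError("pattern must not be NULL")
    if start_position <= 0 or nth_occurrence <= 0:
        raise ValueError("start_position and nth_occurrence must be greater than 0")
    if source is None:
        return None
    matches = re.compile(pattern).finditer(source, start_position - 1)
    for count, match in enumerate(matches, start=1):
        if count == nth_occurrence:
            return match.group(0)
    return None  # no match -> NULL


text = "I have 2 apples and 100 bucks!"
print(repr(regexp_substr_model(text, r"[ \t][A-Za-z0-9]*", 1, 2)))  # ' 2'
```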

6.7.2.14. REGEXP_COUNT
Command syntax:

bigint regexp_count(string source, string pattern[, bigint start_position])

Purpose: It is used to return the number of times that pattern matches in the source string, starting from start_position.

Description:

source: string type. It indicates the string to be searched. For all other input types, an error is returned.
pattern: string type. It indicates the pattern to be matched. If pattern is NULL or of another type, an error is returned.
start_position: bigint type. It must be greater than 0. Otherwise, an error is returned. If start_position is not specified, the default value 1 is used, which means that the search starts from the first character of the source string.

Returned value: bigint type. If any input is NULL, NULL is returned. If there is no match, 0 is returned.

Example:

regexp_count('abababc', 'a.c') = 1
regexp_count('abcde', '[[:alpha:]]{2}', 3) = 1
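A minimal Python sketch of REGEXP_COUNT, assuming matches are counted without overlap, as Python's finditer does. [[:alpha:]] is rewritten as [A-Za-z] because Python's re module lacks POSIX classes; regexp_count_model is a hypothetical name.

```python
import re
from typing import Optional


def regexp_count_model(source: Optional[str], pattern: str,
                       start_position: int = 1) -> Optional[int]:
    """Count non-overlapping matches of pattern in source, starting at start_position."""
    if pattern is None:
        raise ValueError("pattern must not be NULL")
    if start_position <= 0:
        raise ValueError("start_position must be greater than 0")
    if source is None:
        return None
    return sum(1 for _ in re.compile(pattern).finditer(source, start_position - 1))


print(regexp_count_model("abcde", "[A-Za-z]{2}", 3))  # 1 ("cd"; "e" alone does not match)
```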

6.7.2.15. SPLIT_PART
Function declaration:

string split_part(string str, string delimiter, bigint start[, bigint end])

Purpose: It is used to split a string with the specified delimiter and return the substring from the specified start segment through the specified end segment (inclusive).


Description:

str: string type. It indicates the string to be split. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. If the input is of any other type, an error is returned.
delimiter: a constant of the string type. It indicates the delimiter used to split the string. It can be a single character or a string. Otherwise, an error is returned.
start: a constant of the bigint type. It must be greater than 0. If it is not a constant or is of a different type, an error is returned. It indicates the number (starting from 1) of the first segment to be returned. If end is not specified, only the segment specified by start is returned.
end: a constant of the bigint type. It must be greater than or equal to start; otherwise, an error is returned. It indicates the number of the last segment to be returned. If it is not a constant or is of a different type, an error is returned.

Returned value: string type. If any input is NULL, NULL is returned. If delimiter is an empty string, the original string is returned.

Note
If delimiter does not exist in str and start is set to 1, the entire str is returned.
If start is greater than the number of segments (for example, the string has 6 segments but start is greater than 6), an empty string is returned.
If end is greater than the number of segments, the string from segment start through the last segment is returned.

Example:

split_part('a,b,c,d', ',', 1) = 'a'


split_part('a,b,c,d', ',', 1, 2) = 'a,b'
split_part('a,b,c,d', ',', 10) = ''
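A minimal Python sketch of SPLIT_PART, following the examples: an out-of-range start yields an empty string, and an end beyond the last segment is clamped. It assumes that an empty delimiter returns str unchanged. Omitting end is modelled as end=None, so an explicit NULL end is not distinguished. split_part_model is a hypothetical name.

```python
from typing import Optional


def split_part_model(s: Optional[str], delimiter: Optional[str], start: Optional[int],
                     end: Optional[int] = None) -> Optional[str]:
    """Model of SPLIT_PART; end=None means the argument is omitted."""
    if s is None or delimiter is None or start is None:
        return None
    if start <= 0:
        raise ValueError("start must be greater than 0")
    if end is None:
        end = start                   # only the segment at start is returned
    if end < start:
        raise ValueError("end must be greater than or equal to start")
    if delimiter == "":
        return s                      # empty delimiter: original string (assumption)
    segments = s.split(delimiter)
    if start > len(segments):
        return ""                     # start is beyond the last segment
    return delimiter.join(segments[start - 1:end])  # slicing clamps end to the last segment


print(split_part_model("a,b,c,d", ",", 1, 2))  # a,b
```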

6.7.2.16. REGEXP_REPLACE
Function declaration:

string regexp_replace(string source, string pattern, string replace_string[, bigint occurrence])

Purpose: It is used to search the source string for substrings that match pattern, replace them with replace_string, and return the result.

Description:

source: string type. It indicates the string in which the replacement is performed.
pattern: a constant of the string type. It indicates the pattern to be matched. If pattern is NULL, an error is returned.
replace_string: string type. It is used to replace the matched substring.
occurrence: a constant of the bigint type. It must be greater than or equal to 0 and specifies which match of pattern is replaced with replace_string. If it is 0, all matched substrings are replaced. If it is of another type or less than 0, an error is returned. It is optional; the default value is 0.

Returned value: string type. If source, pattern, or occurrence is NULL, NULL is returned. If replace_string is NULL and pattern is matched, NULL is returned. If replace_string is NULL but pattern is not matched, the original string is returned.

Note If replace_string references a group that does not exist in pattern, the behavior is undefined.

Example:

regexp_replace("123.456.7890", "([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})", "(\\1)\\2-\\3", 0) = "(123)456-7890"
regexp_replace("abcd", "(.)", "\\1 ", 0) = "a b c d "
regexp_replace("abcd", "(.)", "\\1 ", 1) = "a bcd"
regexp_replace("abcd", "(.)", "\\2", 1) = "abcd"
-- Only one group is defined in pattern, so the referenced second group does not exist.
-- Avoid this usage. The result of referencing a nonexistent group is undefined.
regexp_replace("abcd", "(.*)(.)$", "\\2", 0) = "d"
regexp_replace("abcd", "a", "\\1", 0) = "bcd"
-- No group is defined in pattern, so '\1' references a nonexistent group.
-- Avoid this usage. The result of referencing a nonexistent group is undefined.
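The occurrence rule (0 replaces every match, n replaces only the nth match) can be sketched with Python's re module. Python writes group references in the replacement as \1 inside a raw string where the MaxCompute client needs \\1, and [[:digit:]] is rewritten as [0-9]. Python raises re.error for a reference to a nonexistent group, which is one possible outcome of the undefined behavior described above. Returning source unchanged when there are fewer matches than occurrence is an assumption; regexp_replace_model is a hypothetical name.

```python
import re
from typing import Optional


def regexp_replace_model(source: Optional[str], pattern: str, replace_string: Optional[str],
                         occurrence: int = 0) -> Optional[str]:
    """Replace the occurrence-th match of pattern (0 = every match) with replace_string."""
    if pattern is None:
        raise ValueError("pattern must not be NULL")
    if occurrence < 0:
        raise ValueError("occurrence must be greater than or equal to 0")
    if source is None:
        return None
    compiled = re.compile(pattern)
    matches = list(compiled.finditer(source))
    if not matches:
        return source               # nothing matched: original string
    if replace_string is None:
        return None                 # replace_string is NULL and pattern matched
    if occurrence == 0:
        return compiled.sub(replace_string, source)
    if occurrence > len(matches):
        return source               # fewer matches than requested (assumption)
    m = matches[occurrence - 1]
    return source[:m.start()] + m.expand(replace_string) + source[m.end():]


print(regexp_replace_model("abcd", "(.)", r"\1 ", 1))  # a bcd
```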

6.7.2.17. SUBSTR
Function declaration:

string substr(string str, bigint start_position[, bigint length])

Purpose: It is used to return the substring of str that starts at start_position and has the specified length.
Description:

str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
start_position: bigint type. The first character is at position 1. If start_position is negative, counting starts from the end of the string, and the last character is at position -1. If the input is of another type, an error is returned.
length: bigint type. It indicates the length of the substring and must be greater than 0. If it is of another type or less than or equal to 0, an error is returned.
Returned value: string type. If any input is NULL, NULL is returned.

Note If length is omitted, the substring from start_position to the end of str is returned.

Example:

substr("abc", 2) = "bc"
substr("abc", 2, 1) = "b"
substr("abc",-2,2) = "bc"
substr("abc",-3) = "abc"
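The positive and negative indexing rules translate directly into Python slicing, as this minimal sketch shows. A start_position of 0 is not described above, so the sketch rejects it; clamping a negative start_position that points before the first character is an assumption. Omitting length is modelled as length=None, and substr_model is a hypothetical name.

```python
from typing import Optional


def substr_model(s: Optional[str], start_position: Optional[int],
                 length: Optional[int] = None) -> Optional[str]:
    """Model of SUBSTR; length=None means the argument is omitted."""
    if s is None or start_position is None:
        return None
    if start_position == 0:
        raise ValueError("start_position 0 is not described in the source")
    if length is not None and length <= 0:
        raise ValueError("length must be greater than 0")
    if start_position > 0:
        begin = start_position - 1                # position 1 is the first character
    else:
        begin = max(len(s) + start_position, 0)   # -1 is the last character; clamping is an assumption
    end = len(s) if length is None else begin + length
    return s[begin:end]


print(substr_model("abc", -2, 2))  # bc
```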


6.7.2.18. TOLOWER
Function declaration:

string tolower(string source)

Purpose: It is used to convert source into a lowercase string and return the value.

Description:

source: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.

Returned value: string type. If the input is NULL, NULL is returned.

Example:

tolower("aBcd") = "abcd"
tolower("Haha Cd") = "haha cd"

6.7.2.19. TOUPPER
Function declaration:

string toupper(string source)

Purpose: It is used to convert source into an uppercase string and return the value.

Description:

source: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.

Example:

toupper("aBcd") = "ABCD"
toupper("HahaCd") = "HAHACD"

6.7.2.20. TO_CHAR
Function declaration:

string to_char(boolean value)


string to_char(bigint value)
string to_char(double value)
string to_char(decimal value)

Purpose: It is used to convert an input of the boolean, bigint, decimal, or double type into a value of the string type.


Description:
value: boolean, bigint, decimal, or double type. For all other input types, an error is returned. For information about the formatted output of datetime values, see TO_CHAR in Date processing functions.

Returned value: string type. If the input is NULL, NULL is returned.

Example:

to_char(123) = '123'
to_char(true) = 'TRUE'
to_char(1.23) = '1.23'
to_char(null) = null

6.7.2.21. TRIM
Function declaration:

string trim(string str)

Purpose: It is used to remove the spaces from both ends of str.

Description:

str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.

Returned value: string type. If the input is NULL, NULL is returned.

6.7.2.22. LTRIM
Function declaration:

string ltrim(string str)

Purpose: It is used to remove the leading (left) spaces from the input string str.

Description:

str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.

Returned value: string type. If the input is NULL, NULL is returned.

Example:

select ltrim(' abc ') from dual;


-- Returned result:
+-----+
| _c0 |
+-----+
| abc |
+-----+


6.7.2.23. RTRIM
Function declaration:

string rtrim(string str)

Purpose: It is used to remove the trailing (right) spaces from the input string str.

Description:

str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
Returned value: string type. If the input is NULL, NULL is returned.

Example:

select rtrim('a abc ') from dual;


-- Returned result:
+-------+
| _c0   |
+-------+
| a abc |
+-------+

6.7.2.24. REVERSE
Function declaration:

STRING REVERSE(string str)

Purpose: It is used to return the input string with its characters in reverse order.

Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.

Returned value: string type. If the input is NULL, NULL is returned.

Example:

select reverse('abcedfg') from dual;


-- Returned result:
+---------+
| _c0     |
+---------+
| gfdecba |
+---------+

6.7.2.25. SPACE
Function declaration:


STRING SPACE(bigint n)

Purpose: It is used to return a string of n consecutive space characters.

Description:

n: bigint type. The length of the returned string cannot exceed 2 MB. If the input is NULL, an error is returned.
Returned value: string type.

Example:

select length(space(10)) from dual;


-- 10 is returned.
select space(400000000000) from dual;
-- An error is returned as the length exceeds 2 MB.

6.7.2.26. REPEAT
Function declaration:

STRING REPEAT(string str, bigint n)

Purpose: It is used to return string str repeated n times.

Description:
str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.

n: bigint type. The length of the returned string cannot exceed 2 MB. If n is NULL, an error is returned.

Returned value: string type.

Example:

select repeat('abc',5) from dual;


-- abcabcabcabcabc is returned.

6.7.2.27. ASCII
Function declaration:

Bigint ASCII(string str)

Purpose: It is used to return the ASCII code of the first character of string str.

Description:

str: string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.

Returned value: bigint type.

Example:


select ascii('abcde') from dual;


-- 97 is returned.

6.7.2.28. URL_ENCODE
Function declaration:

STRING URL_ENCODE(STRING input[, STRING encoding])

Purpose: It is used to encode the input string in the application/x-www-form-urlencoded MIME format:

a–z and A–Z remain unchanged.
".", "-", "*", and "_" remain unchanged.
Spaces are converted into "+".
All other characters are converted into byte values according to the specified encoding. If encoding is not specified, UTF-8 is used by default. Each byte value is represented in the %xy format, where xy is the hexadecimal value of the byte.
Description:

input: string type.
encoding: specifies an encoding format. If it is not specified, UTF-8 is used by default.

Returned value: string type. If the input is NULL, NULL is returned.

Example :

url_encode('Example for url_encode:// (fdsf)') = "%E7%A4%BA%E4%BE%8Bfor+url_encode%3A%2F%2


F+%28fdsf%29"
url_encode('Example for url_encode :// dsf(fasfs)', 'GBK') = "Example+for+url_encode+%3A%2F
%2F+dsf%28fasfs%29"

6.7.2.29. URL_DECODE
Function declaration:

STRING URL_DECODE(STRING input[, STRING encoding])

Purpose: It is used to convert an input string from the application/x-www-form-urlencoded MIME format into a normal string. This is the inverse function of URL_ENCODE:

a–z and A–Z remain unchanged.
".", "-", "*", and "_" remain unchanged.
"+" is converted into a space.
Each %xy sequence is converted into a byte value. Consecutive byte values are decoded into the corresponding string based on the specified encoding.
Other characters remain unchanged.
The final returned value of the function is a UTF-8 string.

Description:


input : st ring t ype.


encoding: specifies an encoding format . If it is not specified, UT F-8 is used by default .
Ret urned value : st ring t ype. If t he input is NULL, NULL is ret urned.

Example :

url_decode('%E7%A4%BA%E4%BE%8Bfor+url_encode%3A%2F%2F+%28fdsf%29')= "Example for url_encode


:// (fdsf)"
url_decode('Exaple+for+url_encode+%3A%2F%2F+dsf%28fasfs%29', 'GBK') = "Exaple for url_encod
e :// dsf(fasfs)" ```

6.7.2.30. Additional string processing functions


MaxCompute 2.0 provides additional string processing functions. Before SQL statements that use the LPAD, RPAD, or TRANSLATE function, you must add the following SET statement:

set odps.sql.type.system.odps2=true;

Note You must submit and execute the SET statement together with the SQL statements that use the new functions.

The string processing functions described in subsequent topics are new in MaxCompute 2.0.

6.7.2.31. CONCAT_WS
Command syntax:

string concat_ws(string SEP, string a, string b...)

Purpose : It is used t o join input st rings st art ing from t he second wit h t he first st ring as t he separat or.
Descript ion:

SEP: delimit er of t he t ring t ype. If it is not specified, an error is ret urned.


a, b...: st ring t ype. If t he input is of t he bigint , decimal, double, or dat et ime t ype, it is implicit ly
convert ed int o a value of t he st ring t ype. For all ot her input t ypes, an error is ret urned.

Ret urned value : st ring t ype. If t here is no input or if any input is NULL, NULL is ret urned.
Example :

concat_ws(':','name','bob') = 'name:bob'
concat_ws(':','avg',null,'34')= 'null'
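The NULL rules above are easy to misread, so here is a small Python model of them. It is a sketch, not MaxCompute's code; using `str()` for the implicit number-to-string conversion is an assumption, and MaxCompute's exact number formatting may differ.

```python
from typing import Optional, Union

Scalar = Union[str, int, float]


def concat_ws(sep: Optional[str], *args: Optional[Scalar]) -> Optional[str]:
    """Model of CONCAT_WS: join args with sep; NULL if there is no arg or any input is NULL."""
    if sep is None or not args or any(a is None for a in args):
        return None
    # str() stands in for the implicit bigint/decimal/double -> string conversion.
    return sep.join(str(a) for a in args)


print(concat_ws(":", "name", "bob"))
print(concat_ws(":", "avg", None, "34"))
```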

6.7.2.32. LPAD
Function declaration:

string lpad(string a, int len, string b)

Purpose: It is used to pad the left side of string a with string b until the padded string is len characters long.


Description:

len: int type.
a, b: string type.
Returned value: string type. If len is less than the number of characters in a, a is truncated to its first len characters. If len is 0, NULL is returned.

Example:

lpad('abcdefgh',10,'12') = '12abcdefgh'
lpad('abcdefgh',5,'12') = 'abcde'
lpad('abcdefgh',0,'12')
-- NULL is returned.
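The following Python sketch models LPAD as documented above. The documented cases are reproduced exactly; how a partial copy of b is used when it does not fit evenly, and the results for negative len, NULL, or an empty pad string, are assumptions (Hive-style behavior) that this topic does not confirm.

```python
from typing import Optional


def lpad(a: Optional[str], length: Optional[int], b: Optional[str]) -> Optional[str]:
    """Model of LPAD as documented above (a sketch, not MaxCompute's code)."""
    if a is None or length is None or b is None:
        return None  # assumption: NULL inputs give NULL
    if length <= 0:
        return None  # documented for 0; assumed for negative values
    if length <= len(a):
        return a[:length]  # keep the first `length` characters of a
    if b == "":
        return None  # assumption: an empty pad string cannot fill the gap
    missing = length - len(a)
    # Assumption: b is repeated and the last copy is cut off if only part of it fits.
    pad = (b * (missing // len(b) + 1))[:missing]
    return pad + a


print(lpad("abcdefgh", 10, "12"))
```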

6.7.2.33. RPAD
Function declaration:

string rpad(string a, int len, string b)

Purpose: It is used to pad the right side of string a with string b until the padded string is len characters long.

Description:

len: int type.
a, b: string type.

Returned value: string type. If len is less than the number of characters in a, a is truncated to its first len characters. If len is 0, NULL is returned.

Example:

rpad('abcdefgh',10,'12') = 'abcdefgh12'
rpad('abcdefgh',5,'12') = 'abcde'
rpad('abcdefgh',0,'12')
-- NULL is returned.
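RPAD is the mirror image of LPAD: the padding goes after a instead of before it. The Python sketch below models the documented cases; the handling of a partial copy of b, negative len, NULL, and an empty pad string are assumptions, not confirmed MaxCompute behavior.

```python
from typing import Optional


def rpad(a: Optional[str], length: Optional[int], b: Optional[str]) -> Optional[str]:
    """Model of RPAD as documented above (a sketch, not MaxCompute's code)."""
    if a is None or length is None or b is None:
        return None  # assumption: NULL inputs give NULL
    if length <= 0:
        return None  # documented for 0; assumed for negative values
    if length <= len(a):
        return a[:length]  # keep the first `length` characters of a
    if b == "":
        return None  # assumption: an empty pad string cannot fill the gap
    missing = length - len(a)
    # Assumption: b is repeated and the last copy is cut off if only part of it fits.
    return a + (b * (missing // len(b) + 1))[:missing]


print(rpad("abcdefgh", 10, "12"))
```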

6.7.2.34. REPLACE
Function declaration:

string replace(string a, string OLD, string NEW)

Purpose: It is used to replace every part of string a that exactly matches string OLD with string NEW, and return the resulting string.

Description:

All parameters are of the string type.

Returned value: string type. If any input is NULL, NULL is returned.

Example:


replace('ababab','abab','12') = '12ab'
replace('ababab','cdf','123') = 'ababab'
replace('123abab456ab',null,'abab') = NULL

6.7.2.35. SOUNDEX
Function declaration:

string soundex(string a)

Purpose: It is used to convert an ordinary string into a Soundex string.

Description:
All parameters are of the string type.

Returned value: string type. If the input is NULL, NULL is returned.

Example:

soundex('hello') = 'H400'
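This topic does not spell out the encoding rules. The following Python sketch assumes the standard American Soundex algorithm: keep the first letter, map the remaining consonants to digits, collapse adjacent equal codes (vowels separate them, h and w do not), and pad or cut the result to four characters. It reproduces the documented example, but MaxCompute's handling of edge cases such as digits, non-ASCII letters, or empty strings is not confirmed.

```python
from typing import Optional

# Standard American Soundex digit groups (assumed, not taken from MaxCompute).
_CODES = {ch: digit
          for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                 "4": "l", "5": "mn", "6": "r"}.items()
          for ch in letters}


def soundex(a: Optional[str]) -> Optional[str]:
    if a is None:
        return None
    letters = [c for c in a.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""  # assumption: no letters gives an empty code
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        if c not in "hw":  # h and w do not separate equal codes; vowels do
            prev = digit
    return (code + "000")[:4]


print(soundex("hello"))
```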

6.7.2.36. SUBSTRING_INDEX
Function declaration:

string substring_index(string a, string SEP, int count)

Purpose: It is used to return the substring of a that is delimited by the count-th occurrence of the delimiter SEP. If count is positive, occurrences are counted from the left and everything to the left of that occurrence is returned. If count is negative, occurrences are counted from the right and everything to the right of that occurrence is returned.

Description:

a, SEP: string type.
count: int type.

Returned value: string type. If any input is NULL, NULL is returned.
Example:

substring_index('https://help.aliyun.com', '.', 2) = 'https://help.aliyun'
substring_index('https://help.aliyun.com', '.', -2) = 'aliyun.com'
substring_index('https://help.aliyun.com', null, 2) = NULL
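A compact way to see the counting rule is to split on SEP and keep the first or last count pieces. The Python sketch below does that. For cases this topic does not cover (count of 0, an empty separator, count larger than the number of delimiters) it assumes MySQL/Hive-style results.

```python
from typing import Optional


def substring_index(a: Optional[str], sep: Optional[str], count: Optional[int]) -> Optional[str]:
    """Model of SUBSTRING_INDEX as documented above (a sketch)."""
    if a is None or sep is None or count is None:
        return None
    if count == 0 or sep == "":
        return ""  # assumption: MySQL-style result for these edge cases
    parts = a.split(sep)
    if count > 0:
        return sep.join(parts[:count])  # left of the count-th delimiter, counted from the left
    return sep.join(parts[count:])      # right of the |count|-th delimiter, counted from the right


print(substring_index("https://help.aliyun.com", ".", 2))
print(substring_index("https://help.aliyun.com", ".", -2))
```

If count exceeds the number of delimiters, the slice simply keeps every piece, so the whole string is returned.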

6.7.2.37. TRANSLATE
Function declaration:

string translate(string|varchar str1, string|varchar str2, string|varchar str3)

Purpose: It is used to replace the characters of str1 that appear in str2 with the corresponding characters of str3.


Returned value: STRING type. If any input is NULL, NULL is returned.

Example:

translate('MaxComputer','puter','pute') = 'MaxCompute'
translate('aaa','b','c') = 'aaa'
translate('MaxComputer','puter',null) = NULL
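The examples above are consistent with both substring replacement and character-by-character mapping. The Python sketch below assumes the character-by-character semantics that TRANSLATE has in Hive and Oracle: each character of str2 maps to the character at the same position in str3, characters of str2 without a counterpart are deleted, and only the first mapping of a repeated character counts. Verify this against your MaxCompute version before relying on it.

```python
from typing import Dict, Optional


def translate(str1: Optional[str], str2: Optional[str], str3: Optional[str]) -> Optional[str]:
    """Character-wise TRANSLATE model (assumed semantics, not confirmed for MaxCompute)."""
    if str1 is None or str2 is None or str3 is None:
        return None
    mapping: Dict[int, Optional[str]] = {}
    for i, ch in enumerate(str2):
        # First occurrence wins; a character with no counterpart in str3 maps to None (deleted).
        mapping.setdefault(ord(ch), str3[i] if i < len(str3) else None)
    return str1.translate(mapping)


print(translate("MaxComputer", "puter", "pute"))
```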

6.7.2.38. JSON_TUPLE
Function declaration:

string json_tuple(string json,string key1,string key2,...)

Description: This function extracts the values of a set of input keys, such as key1 and key2, from a standard JSON string.

Parameters:

json: a value of the STRING type, which specifies a standard JSON string.
key: a value of the STRING type, which describes a JSON path. You can specify multiple keys at a time. A key cannot start with a dollar sign ($).
Return value: A value of the STRING type is returned.

Note
If the json parameter is empty or invalid, NULL is returned.
If a key parameter is empty or invalid, NULL is returned. A key that does not exist in the JSON string is considered invalid.
If the json parameter is valid and the key exists, the requested string is returned.
This function parses a JSON string the same way as the GET_JSON_OBJECT function does after set odps.sql.udf.getjsonobj.new=true; is added. To extract several values with GET_JSON_OBJECT, you must call it multiple times, and the JSON string is parsed once per call. JSON_TUPLE accepts multiple keys at a time and parses the JSON string only once, which improves parsing efficiency.
JSON_TUPLE is a user-defined table-valued function (UDTF). To select other columns together with its output, use JSON_TUPLE with LATERAL VIEW.

Example:

The school table contains the following data:


Table: school
+------------+------------+
| Id | json |
+------------+------------+
| 1 | {
    "School name": "湖畔大学",
    "Location": "杭州",
    "SchoolRank": "00",
    "Class1": {
      "Student": [{
        "studentId": 1,
        "scoreRankIn3Year": [1,2,[3,2,6]]
      }, {
        "studentId": 2,
        "scoreRankIn3Year": [2,3,[4,3,1]]
      }]}
  } |
+------------+------------+

Extract JSON objects.

SELECT json_tuple(school.json,"SchoolRank","Class1") AS (item0,item1) FROM school;
-- Equivalent to the following statement: SELECT get_json_object(school.json,"$.SchoolRank") item0,get_json_object(school.json,"$.Class1") item1 FROM school;
-- The following result is returned:
+-------+-------+
| item0 | item1 |
+-------+-------+
| 00 | {"Student":[{"studentId":1,"scoreRankIn3Year":[1,2,[3,2,6]]},{"studentId":2,"scoreRankIn3Year":[2,3,[4,3,1]]}]} |
+-------+-------+

Parse JSON data that contains Chinese characters.

SELECT json_tuple(school.json,"School name","Location") AS (item0,item1) FROM school;
-- The following result is returned:
+-------+-------+
| item0 | item1 |
+-------+-------+
| 湖畔大学 | 杭州 |
+-------+-------+

Parse nested JSON data.


SELECT sc.Id, q.item0, q.item1
FROM school sc LATERAL VIEW json_tuple(sc.json,"Class1.Student.[*].studentId","Class1.Student.[0].scoreRankIn3Year") q AS item0,item1;
-- The following result is returned:
+------------+-------+-------+
| id | item0 | item1 |
+------------+-------+-------+
| 1 | [1,2] | [1,2,[3,2,6]] |
+------------+-------+-------+

Parse JSON data that contains nested arrays.

SELECT sc.Id, q.item0, q.item1
FROM school sc LATERAL VIEW json_tuple(sc.json,"Class1.Student[0].scoreRankIn3Year[2]","Class1.Student[0].scoreRankIn3Year[2][1]") q AS item0,item1;
-- The following result is returned:
+------------+-------+-------+
| id | item0 | item1 |
+------------+-------+-------+
| 1 | [3,2,6] | 2 |
+------------+-------+-------+
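To make the parse-once point concrete, here is a Python sketch that decodes the JSON text a single time and then looks up every key against the decoded value. The path syntax it accepts (dotted keys plus [n] indexes, no [*] wildcard) is a simplified assumption modeled on the examples above, not the full GET_JSON_OBJECT path grammar. Non-string results are re-serialized compactly, which matches the output shown for Class1.

```python
import json
import re
from typing import Any, List, Optional, Tuple, Union

_PART = re.compile(r"([^\[\]]*)((?:\[\d+\])*)")
_MISSING = object()


def _steps(path: str) -> List[Union[str, int]]:
    """Split "Class1.Student[0].scoreRankIn3Year[2]" into keys and list indexes."""
    steps: List[Union[str, int]] = []
    for part in path.split("."):
        m = _PART.fullmatch(part)
        if m is None:
            raise KeyError(path)  # e.g. the [*] wildcard, which this sketch does not support
        if m.group(1):
            steps.append(m.group(1))
        steps.extend(int(i) for i in re.findall(r"\d+", m.group(2)))
    return steps


def _lookup(doc: Any, steps: List[Union[str, int]]) -> Any:
    node = doc
    for step in steps:
        if isinstance(step, int):
            if not isinstance(node, list) or step >= len(node):
                return _MISSING
        elif not isinstance(node, dict) or step not in node:
            return _MISSING
        node = node[step]
    return node


def _render(value: Any) -> Optional[str]:
    if value is _MISSING or value is None:
        return None
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


def json_tuple(json_str: Optional[str], *keys: str) -> Tuple[Optional[str], ...]:
    try:
        doc = json.loads(json_str) if json_str else _MISSING  # parsed once for all keys
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        doc = _MISSING
    results: List[Optional[str]] = []
    for key in keys:
        if doc is _MISSING or not key or key.startswith("$"):
            results.append(None)  # invalid JSON or invalid key gives NULL
            continue
        try:
            results.append(_render(_lookup(doc, _steps(key))))
        except KeyError:
            results.append(None)
    return tuple(results)


row = '{"SchoolRank": "00", "Class1": {"Student": [{"studentId": 1, "scoreRankIn3Year": [1, 2, [3, 2, 6]]}]}}'
print(json_tuple(row, "SchoolRank", "Class1.Student[0].scoreRankIn3Year[2]"))
```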

6.7.3. Date processing functions


6.7.3.1. DATEADD
Function declaration:

datetime dateadd(datetime date, bigint delta, string datepart)

Purpose: It is used to shift date by delta units of the time unit specified by datepart.

Description:

date: datetime type. If the input is of the string type, it is implicitly converted into a value of the datetime type before this computation. For all other types of inputs, an error is returned.
delta: bigint type. It specifies the amount of the modification. If the input is of the string or double type, it is implicitly converted into a value of the bigint type before this computation. If the input is of another type, an error is returned. If delta is greater than 0, delta units are added to the value. If delta is less than 0, the absolute value of delta is subtracted from the value.
datepart: a constant of the string type. This field follows the string-datetime conversion convention: yyyy indicates the year and mm indicates the month. For the type conversion rules, see Conversion between the string type and datetime type. The extended date formats year, month or mon, day, and hour are also supported. If the value is not a constant, is in an unsupported format, or is of another type, an error is returned.

Returned value: datetime type. If any input is NULL, NULL is returned.


Note
When delta is added or subtracted, carrying and borrowing are base-10 for year, base-12 for month, base-24 for hour, and base-60 for minute and second. If delta is measured in months, the following rule is applied: if the day value is still valid in the resulting month, it is kept. Otherwise, the day value is set to the last day of the resulting month.
The datepart field follows the string-datetime conversion convention: yyyy indicates the year and mm indicates the month. Unless otherwise specified, all built-in functions related to the datetime type follow this convention, and their datepart parameters also support the extended date formats year, month or mon, day, and hour.

Example:

If trans_date = 2017-02-28 00:00:00:

dateadd(trans_date, 1, 'dd') = 2017-03-01 00:00:00
-- Add one day. The result is beyond the last day of February, so the first day of the next month is returned.
dateadd(trans_date, -1, 'dd') = 2017-02-27 00:00:00
-- Subtract one day.
dateadd(trans_date, 20, 'mm') = 2018-10-28 00:00:00
-- Add 20 months. The month overflows, and 1 is added to the year.
trans_date = 2017-02-28 00:00:00, dateadd(trans_date, 1, 'mm') = 2017-03-28 00:00:00
trans_date = 2017-01-29 00:00:00, dateadd(trans_date, 1, 'mm') = 2017-02-28 00:00:00
-- February 2017 has only 28 days, so the last day of the month is returned.
trans_date = 2017-03-30 00:00:00, dateadd(trans_date, -1, 'mm') = 2017-02-28 00:00:00

The values of trans_date are only examples, and the datetime examples in this document use a simplified format. In MaxCompute SQL, a constant cannot be of the datetime type. The following syntax is incorrect:

select dateadd(2017-03-30 00:00:00, -1, 'mm') from tbl1;

If you must use a constant of the datetime type, use the following method:

select dateadd(cast("2017-03-30 00:00:00" as datetime), -1, 'mm') from tbl1;
-- The string constant is explicitly converted into the datetime type.
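The month rule in the note above (keep the day if it still exists, otherwise use the last day of the month) can be modeled in Python as follows. This is a sketch, not MaxCompute's code. Treating datepart case-insensitively and applying the same clamping rule to yyyy (for example, February 29 plus one year) are assumptions.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional

_ALIASES = {"year": "yyyy", "month": "mm", "mon": "mm", "day": "dd", "hour": "hh"}
_FIXED = {"dd": "days", "hh": "hours", "mi": "minutes", "ss": "seconds"}


def dateadd(date: Optional[datetime], delta: Optional[int], datepart: Optional[str]) -> Optional[datetime]:
    if date is None or delta is None or datepart is None:
        return None
    part = datepart.lower()  # assumption: dateparts are case-insensitive
    part = _ALIASES.get(part, part)
    if part in ("yyyy", "mm"):
        months = delta * 12 if part == "yyyy" else delta
        total = date.year * 12 + (date.month - 1) + months
        year, month0 = divmod(total, 12)
        last_day = calendar.monthrange(year, month0 + 1)[1]
        # Keep the day if it exists in the target month, else clamp to the last day.
        return date.replace(year=year, month=month0 + 1, day=min(date.day, last_day))
    if part in _FIXED:
        return date + timedelta(**{_FIXED[part]: delta})
    raise ValueError(f"unsupported datepart: {datepart!r}")


print(dateadd(datetime(2017, 1, 29), 1, "mm"))
```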

6.7.3.2. DATEDIFF
Function declaration:

bigint datediff(datetime date1, datetime date2, string datepart)

Purpose: It is used to calculate the difference between date1 and date2 in the unit specified by datepart.

Description:

date1 and date2: the minuend and subtrahend, respectively, of the datetime type. If an input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
datepart: a constant of the string type. It supports the extended date formats. If datepart is not in the specified format or is of another type, an error is returned.

Returned value: bigint type. If any input is NULL, NULL is returned. If date1 is earlier than date2, the returned value can be negative.

Note The parts of date1 and date2 that are smaller than the unit specified by datepart are truncated first, and then the difference is calculated.

Example:

If start = 2017-12-31 23:59:59 and end = 2018-01-01 00:00:00:

datediff(end, start, 'dd') = 1
datediff(end, start, 'mm') = 1
datediff(end, start, 'yyyy') = 1
datediff(end, start, 'hh') = 1
datediff(end, start, 'mi') = 1
datediff(end, start, 'ss') = 1
datediff('2017-05-31 13:00:00', '2017-05-31 12:30:00', 'ss') = 1800
datediff('2017-05-31 13:00:00', '2017-05-31 12:30:00', 'mi') = 30
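The truncate-then-subtract rule explains why a one-second gap counts as one day, month, and year in the examples above. The Python sketch below models that rule; it is not MaxCompute's code, and case-insensitive datepart handling is an assumption.

```python
from datetime import datetime
from typing import Optional

_ALIASES = {"year": "yyyy", "month": "mm", "mon": "mm", "day": "dd", "hour": "hh"}
# Fields cleared (truncated) for each datepart, and the size of that unit in seconds.
_TRUNCATE = {
    "dd": dict(hour=0, minute=0, second=0, microsecond=0),
    "hh": dict(minute=0, second=0, microsecond=0),
    "mi": dict(second=0, microsecond=0),
    "ss": dict(microsecond=0),
}
_SECONDS = {"dd": 86400, "hh": 3600, "mi": 60, "ss": 1}


def datediff(date1: Optional[datetime], date2: Optional[datetime], datepart: Optional[str]) -> Optional[int]:
    if date1 is None or date2 is None or datepart is None:
        return None
    part = datepart.lower()
    part = _ALIASES.get(part, part)
    if part == "yyyy":
        return date1.year - date2.year
    if part == "mm":
        return (date1.year - date2.year) * 12 + (date1.month - date2.month)
    if part not in _SECONDS:
        raise ValueError(f"unsupported datepart: {datepart!r}")
    # After truncation the difference is an exact multiple of the unit.
    diff = date1.replace(**_TRUNCATE[part]) - date2.replace(**_TRUNCATE[part])
    return int(diff.total_seconds()) // _SECONDS[part]


print(datediff(datetime(2018, 1, 1), datetime(2017, 12, 31, 23, 59, 59), "dd"))
```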

6.7.3.3. DATEPART
Function declaration:

bigint datepart(datetime date, string datepart)

Purpose: It is used to extract the value of the specified datepart from date.

Description:

date: datetime type. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
datepart: a constant of the string type. It supports the extended date formats. If datepart is not in the specified format or is of another type, an error is returned.
Returned value: bigint type. If any input is NULL, NULL is returned.

Example:

datepart('2017-06-08 01:10:00', 'yyyy') = 2017
datepart('2017-06-08 01:10:00', 'mm') = 6

6.7.3.4. DATETRUNC
Function declaration:

datetime datetrunc (datetime date,string datepart)

Purpose: It is used to truncate date to the unit specified by datepart. All parts smaller than datepart are reset to their lowest values (01 for month and day, 00 for hour, minute, and second).


Description:

date: datetime type. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.
datepart: a constant of the string type. It supports the extended date formats. If datepart is not in the specified format or is of another type, an error is returned.

Returned value: datetime type. If any input is NULL, NULL is returned.

Example:

datetrunc('2017-12-07 16:28:46', 'yyyy') = 2017-01-01 00:00:00
datetrunc('2017-12-07 16:28:46', 'month') = 2017-12-01 00:00:00
datetrunc('2017-12-07 16:28:46', 'DD') = 2017-12-07 00:00:00
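In Python terms, DATETRUNC amounts to a `datetime.replace` call that resets every field smaller than the requested unit. The sketch below assumes case-insensitive dateparts, which the 'DD' example suggests but this topic does not state.

```python
from datetime import datetime
from typing import Optional

_ALIASES = {"year": "yyyy", "month": "mm", "mon": "mm", "day": "dd", "hour": "hh"}
# For each datepart, the smaller fields that are reset.
_RESET = {
    "yyyy": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "mm": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "dd": dict(hour=0, minute=0, second=0, microsecond=0),
    "hh": dict(minute=0, second=0, microsecond=0),
    "mi": dict(second=0, microsecond=0),
    "ss": dict(microsecond=0),
}


def datetrunc(date: Optional[datetime], datepart: Optional[str]) -> Optional[datetime]:
    if date is None or datepart is None:
        return None
    part = datepart.lower()
    part = _ALIASES.get(part, part)
    if part not in _RESET:
        raise ValueError(f"unsupported datepart: {datepart!r}")
    return date.replace(**_RESET[part])


print(datetrunc(datetime(2017, 12, 7, 16, 28, 46), "month"))
```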

6.7.3.5. GETDATE
Function declaration:

datetime getdate()

Purpose: It is used to obtain the current system time. MaxCompute uses UTC+8 as its standard time.

Returned value: the current date and time of the datetime type.

Note In a MaxCompute SQL task, which is executed in a distributed manner, getdate always returns the same fixed value, which is a point in time during the execution of the task. The returned time is accurate to the second. In later versions, the time will be accurate to the millisecond.

6.7.3.6. ISDATE
Function declaration:

boolean isdate(string date, string format)

Purpose: It is used to determine whether a date string can be converted into a date value based on the corresponding format string. If the conversion can be performed, true is returned. Otherwise, false is returned.

Description:

date: a date of the string type. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other input types, an error is returned.
format: a constant of the string type. The extended date formats are not supported. If it is of another type or in an unsupported format, an error is returned. If a format element appears more than once in format, the date value is taken from the first occurrence and the other occurrences are treated as separators. For example, isdate("1234-yyyy", "yyyy-yyyy") returns true.

Returned value: boolean type. If any input is NULL, NULL is returned.

6.7.3.7. LASTDAY


Function declaration:

datetime lastday(datetime date)

Purpose: It is used to return the last day of the month to which date belongs. The value is accurate to the day, and the hour, minute, and second part is 00:00:00.

Description:

date: datetime type. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.

Returned value: datetime type. If the input is NULL, NULL is returned.

6.7.3.8. TO_DATE
Function declaration:

datetime to_date(string date, string format)

Purpose: It is used to convert the date string into a date value based on format.

Description:

date: string type. It specifies the string to be converted. If the input is of the bigint, decimal, double, or datetime type, it is implicitly converted into a value of the string type before this computation. For all other types of inputs or NULL, an error is returned.
format: a constant of the string type that specifies the date format. For all other types of inputs and non-constant values, an error is returned. The extended date formats are not supported. Other characters are ignored as invalid characters during parsing. format must contain yyyy. Otherwise, an error is returned. If a format element appears more than once in format, the date value is taken from the first occurrence and the other occurrences are treated as separators. For example, to_date('1234-2234', 'yyyy-yyyy') returns 1234-01-01 00:00:00.

Returned value: datetime type, in the yyyy-mm-dd hh:mi:ss format. If any input is NULL, NULL is returned.

Example:

to_date('Alibaba2017-12*03', 'Alibabayyyy-mm*dd') = 2017-12-03 00:00:00
to_date('20170718', 'yyyymmdd') = 2017-07-18 00:00:00
to_date('201707182030','yyyymmddhhmi') = 2017-07-18 20:30:00
to_date('2017718', 'yyyymmdd')
-- Invalid format. NULL is returned.
to_date('Alibaba2017-12*3', 'Alibabayyyy-mm*dd')
-- Invalid format. NULL is returned.
to_date('2017-24-01', 'yyyy')
-- Invalid format. NULL is returned.
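The NULL results above follow if each format element must match a fixed number of digits (four for yyyy, two for the others) and every other character of format must appear at the same position. The Python sketch below is built on those assumptions; it is not MaxCompute's parser. A repeated element is matched as digits and its value ignored, which reproduces the to_date('1234-2234', 'yyyy-yyyy') case.

```python
import re
from datetime import datetime
from typing import Optional

# Format elements and their assumed fixed widths in digits.
_WIDTHS = {"yyyy": 4, "mm": 2, "dd": 2, "hh": 2, "mi": 2, "ss": 2}


def to_date(date: Optional[str], fmt: Optional[str]) -> Optional[datetime]:
    if date is None or fmt is None:
        return None
    if "yyyy" not in fmt:
        raise ValueError("format must contain yyyy")
    pattern, order, i = "", [], 0
    while i < len(fmt):
        for element, width in _WIDTHS.items():
            if fmt.startswith(element, i):
                if element in order:
                    pattern += r"\d{%d}" % width  # repeated element: matched, value ignored
                else:
                    pattern += r"(\d{%d})" % width
                    order.append(element)
                i += len(element)
                break
        else:
            pattern += re.escape(fmt[i])  # any other character must match literally
            i += 1
    match = re.fullmatch(pattern, date)
    if match is None:
        return None
    v = dict(zip(order, map(int, match.groups())))
    try:
        return datetime(v["yyyy"], v.get("mm", 1), v.get("dd", 1),
                        v.get("hh", 0), v.get("mi", 0), v.get("ss", 0))
    except ValueError:
        return None  # for example, month 24


print(to_date("Alibaba2017-12*03", "Alibabayyyy-mm*dd"))
```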

6.7.3.9. TO_CHAR
Function declaration:

string to_char(datetime date, string format)


Purpose: It is used to convert a value of the datetime type into a string based on the specified format.

Description:

date: the value of the datetime type to be converted. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other types of inputs, an error is returned.
format: a constant of the string type. If it is not a constant or is of another type, an error is returned. In format, each date format element is replaced with the corresponding date value, and other characters are output as they are.

Returned value: string type. If any input is NULL, NULL is returned.
Example:

to_char('2017-12-03 00:00:00', 'Alibabayyyy-mm*dd') = 'Alibaba2017-12*03'
to_char('2017-07-18 00:00:00', 'yyyymmdd') = '20170718'
to_char('Alibaba 2017-12*3', 'Alibaba yyyy-mm*dd')
-- NULL is returned.
to_char('2017-24-01', 'yyyy')
-- NULL is returned.
to_char('2017718', 'yyyymmdd')
-- NULL is returned.

Note For more information about converting other types into the string type, see TO_CHAR in String functions.
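TO_CHAR is the reverse of TO_DATE: format elements are replaced with zero-padded date fields and everything else is copied. The Python sketch below assumes only the yyyy, mm, dd, hh (24-hour), mi, and ss elements and takes an already-converted datetime, so the implicit string conversion that makes the last three examples return NULL is not modeled.

```python
from datetime import datetime
from typing import Optional


def to_char(date: Optional[datetime], fmt: Optional[str]) -> Optional[str]:
    if date is None or fmt is None:
        return None
    values = {
        "yyyy": f"{date.year:04d}", "mm": f"{date.month:02d}", "dd": f"{date.day:02d}",
        "hh": f"{date.hour:02d}", "mi": f"{date.minute:02d}", "ss": f"{date.second:02d}",
    }
    out, i = [], 0
    while i < len(fmt):
        for element, text in values.items():
            if fmt.startswith(element, i):
                out.append(text)  # a format element becomes the date field
                i += len(element)
                break
        else:
            out.append(fmt[i])  # any other character is output as it is
            i += 1
    return "".join(out)


print(to_char(datetime(2017, 12, 3), "Alibabayyyy-mm*dd"))
```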

6.7.3.10. UNIX_TIMESTAMP
Function declaration:

bigint unix_timestamp(datetime date)

Purpose: It is used to convert a date into a Unix timestamp of the integer type.

Description:

date: datetime type. It specifies the date. If the input is a string, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.

Returned value: bigint type. It indicates the date value in the Unix format. If date is NULL, NULL is returned.

6.7.3.11. FROM_UNIXTIME
Function declaration:

datetime from_unixtime(bigint unixtime)

Purpose: It is used to convert a Unix date value of the BIGINT type into a value of the DATETIME type.

Description:

unixtime: BIGINT type. It is a date value in the Unix format. If the input is of the STRING, DECIMAL, or DOUBLE type, it is implicitly converted into a value of the BIGINT type before computation.


Returned value: DATETIME type. If unixtime is NULL, NULL is returned.

Note In the Hive-compatible mode (where set odps.sql.hive.compatible=true; has been executed), if the input is of the STRING type, a date value of the STRING type is returned.

Example:

from_unixtime(123456789) = 1973-11-30 05:33:09
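The example value is consistent with reading the result as UTC+8 wall-clock time: 123456789 seconds after the epoch is 1973-11-29 21:33:09 UTC, which is 1973-11-30 05:33:09 in UTC+8. The Python sketch below models both conversions under the assumption that the session time zone is UTC+8, the standard time named in the GETDATE topic. Other time zone settings would give different results.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the session time zone is UTC+8.
UTC_PLUS_8 = timezone(timedelta(hours=8))


def from_unixtime(unixtime: Optional[int]) -> Optional[datetime]:
    if unixtime is None:
        return None
    # Seconds since 1970-01-01 00:00:00 UTC, shown as UTC+8 wall-clock time.
    return datetime.fromtimestamp(unixtime, tz=UTC_PLUS_8).replace(tzinfo=None)


def unix_timestamp(date: Optional[datetime]) -> Optional[int]:
    if date is None:
        return None
    # Treat the naive datetime as UTC+8 wall-clock time.
    return int(date.replace(tzinfo=UTC_PLUS_8).timestamp())


print(from_unixtime(123456789))
```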

6.7.3.12. WEEKDAY
Function declaration:

bigint weekday(datetime date)

Purpose: It is used to return the day of the week for the specified date.

Description:

date: datetime type. If the input is of the string type, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.

Returned value: bigint type. If the input is NULL, NULL is returned. Monday is the first day of a week, and its returned value is 0. The days are numbered in ascending order starting from 0, so the returned value for Sunday is 6.
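Python's `date.weekday()` uses the same numbering (Monday is 0, Sunday is 6), so a model of WEEKDAY is a one-liner. This is a sketch for checking values locally, not MaxCompute's code.

```python
from datetime import datetime
from typing import Optional


def weekday(date: Optional[datetime]) -> Optional[int]:
    # Same numbering as WEEKDAY: Monday = 0 ... Sunday = 6.
    return None if date is None else date.weekday()


print(weekday(datetime(2017, 12, 29)))  # 2017-12-29 is a Friday
```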

6.7.3.13. WEEKOFYEAR
Function declaration:

bigint weekofyear(datetime date)

Purpose: It is used to return the calendar week of the year in which the specified date falls. Monday is used as the first day of the week.

Note If a week spans two years, it belongs to the year that contains four or more of its days. If most of the days fall in the first year, the week is the last week of the first year. If most of the days fall in the second year, the week is the first week of the second year.

Description:

date: a date of the datetime type. If the input is of the string type, it is implicitly converted into a value of the datetime type before this computation. For all other input types, an error is returned.

Returned value: bigint type. If the input is NULL, NULL is returned.

Example:


select weekofyear(to_date("20141229", "yyyymmdd")) from dual;
Returned value:
+------------+
| _c0 |
+------------+
| 1 |
+------------+
-- 20141229 is in 2014, but most days of its week are in 2015. Therefore, the returned value is 1, which indicates the first week of 2015.
select weekofyear(to_date("20141231", "yyyymmdd")) from dual;
-- 1 is returned.
select weekofyear(to_date("20151229", "yyyymmdd")) from dual;
-- 53 is returned.
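The rule in the note above is the ISO 8601 week rule: weeks start on Monday, and a week that spans two years belongs to the year that holds at least four of its days. Assuming WEEKOFYEAR follows ISO 8601 exactly, Python's `isocalendar()` gives the same numbers, as this sketch shows.

```python
from datetime import date
from typing import Optional


def weekofyear(d: Optional[date]) -> Optional[int]:
    # isocalendar() returns (ISO year, ISO week number, ISO weekday).
    return None if d is None else d.isocalendar()[1]


print(weekofyear(date(2014, 12, 29)))  # week 1 of 2015
```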

6.7.3.14. Additional date functions


MaxCompute 2.0 provides additional date functions. Before SQL statements that call these date functions, you must add the following SET statement:

set odps.sql.type.system.odps2=true;

Note You must submit and execute the SET statement together with the SQL statements that use the new functions.

Example:

set odps.sql.type.system.odps2=true;
select year('2017-01-01 12:30:00') from dual;
-- 2017 is returned.

The date functions described in the following topics are new in MaxCompute 2.0.

6.7.3.15. YEAR
Function declaration:

INT year(string date)

Purpose: It is used to return the year of the specified date.

Description:

date: a date of the string type. The date must include the yyyy-mm-dd part and contain no redundant strings. Otherwise, NULL is returned.

Returned value: int type.

Example:


year('2017-01-01 12:30:00') = 2017
year('2017-01-01') = 2017
year('17-01-01') = 17
year(2017-01-01) = null
year('2017/03/09') = null
year(null) = null

6.7.3.16. QUARTER
Command syntax:

int quarter(datetime/timestamp/string date)

Purpose: It is used to return the quarter of the input date, ranging from 1 to 4.

Description:

date: datetime, timestamp, or string type. The date must include the yyyy-mm-dd part and contain no redundant strings. Otherwise, NULL is returned.
Returned value: int type. If the input is NULL, NULL is returned.

Example:

quarter('2017-11-12 10:00:00') = 4
quarter('2017-11-12') = 4

6.7.3.17. MONTH
Function declaration:

INT month(string date)

Purpose: It is used to return the month of the input date.

Description:
date: a date of the string type. For all other input types, an error is returned.

Returned value: int type.

Example:

month('2017-09-01') = 9
month('20170901') = null

6.7.3.18. DAY
Function declaration:

INT day(string date)

Purpose: It is used to return the day of the input date.


Description:

date: a date of the string type. For all other input types, an error is returned.
Returned value: int type.

Example:

day('2017-09-01') = 1
day('20170901') = null

6.7.3.19. DAYOFMONTH
Function declaration:

INT dayofmonth(string date)

Purpose: It is used to return the day of the month for the input date.

Description:
date: a date of the string type. For all other input types, an error is returned.

Returned value: int type.

Example:

dayofmonth('2017-09-01') = 1
dayofmonth('20170901') = null

6.7.3.20. HOUR
Function declaration:

INT hour(string date)

Purpose: It is used to return the hour of the input date.

Description:

date: a date of the string type. For all other input types, an error is returned.

Returned value: int type.

Example:

hour('2017-09-01 12:00:00') = 12
hour('12:00:00') = 12
hour('20170901120000') = null

6.7.3.21. MINUTE
Function declaration:

INT minute(string date)


Purpose: It is used to return the minute of the input date.

Description:

date: a date of the string type. For all other input types, an error is returned.
Returned value: int type.

Example:

minute('2017-09-01 12:30:00') = 30
minute('12:30:00') = 30
minute('20170901120000') = null

6.7.3.22. SECOND
Function declaration:

INT second(string date)

Purpose: It is used to return the second of the input date.

Description:
date: a date of the string type. For all other input types, an error is returned.

Returned value: int type.

Example:

second('2017-09-01 12:30:45') = 45
second('12:30:45') = 45
second('20170901123045') = null

6.7.3.23. FROM_UTC_TIMESTAMP
Function declaration:

timestamp from_utc_timestamp({any primitive type}*, string timezone)

Purpose: It is used to convert a UTC timestamp into a timestamp in the specified time zone.
Description:

{any primitive type}*: the timestamp. The type can be TIMESTAMP, DATETIME, TINYINT, SMALLINT, INT, or BIGINT. A value of an integer type is interpreted as milliseconds, as the first example shows.
timezone: the destination time zone, such as PST.

Returned value: TIMESTAMP type.

Example:

select from_utc_timestamp(1501557840,'PST') = '1970-01-18 09:05:57.84'
select from_utc_timestamp('1970-01-30 16:00:00','PST') = '1970-01-30 08:00:00.0'
select from_utc_timestamp('1970-01-30','PST') = '1970-01-29 16:00:00.0'


6.7.3.24. CURRENT_TIMESTAMP
Function declaration:

timestamp current_timestamp()

Purpose: It is used to return the current timestamp as a value of the TIMESTAMP type. The value is not fixed.

Returned value: TIMESTAMP type.

Example:

select current_timestamp() from dual;
-- '2017-08-03 11:50:30.661' is returned.

6.7.3.25. ADD_MONTHS
Function declaration:

string add_months(string startdate, int nummonths)

Purpose: It is used to return the date that is nummonths months later than startdate.

Description:

startdate: a date of the string type. The date format must contain yyyy-mm-dd. Otherwise, NULL is returned.
nummonths: int type.

Returned value: a date of the string type, in the yyyy-mm-dd format.
Example:

add_months('2017-02-14',3) = '2017-05-14'
add_months('17-2-14',3) = '0017-05-14'
add_months('2017-02-14 21:30:00',3) = '2017-05-14'
add_months('20170214',3) = null

6.7.3.26. LAST_DAY
Function declaration:

string last_day(string date)

Purpose: It is used to return the last day of the month to which date belongs.

Description:

date: string type. The format is yyyy-mm-dd hh:mi:ss or yyyy-mm-dd.

Returned value: a date of the string type, in the yyyy-mm-dd format.

Example:


last_day('2017-03-04') = '2017-03-31'
last_day('2017-07-04 11:40:00') = '2017-07-31'
last_day('20170304') = null
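The following Python sketch models LAST_DAY with the two input layouts named in the description and `calendar.monthrange` to find the month length, including leap years. It is an approximation: Python's `strptime` also accepts single-digit months and days, which MaxCompute may reject.

```python
from calendar import monthrange
from datetime import datetime
from typing import Optional

# The two input layouts named in the description above.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def _parse(text: str) -> Optional[datetime]:
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def last_day(date: Optional[str]) -> Optional[str]:
    if date is None:
        return None
    parsed = _parse(date)
    if parsed is None:
        return None  # for example, '20170304' matches neither layout
    day = monthrange(parsed.year, parsed.month)[1]  # number of days in that month
    return f"{parsed.year:04d}-{parsed.month:02d}-{day:02d}"


print(last_day("2017-07-04 11:40:00"))
```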

6.7.3.27. NEXT_DAY
Function declaration:

string next_day(string startdate, string week)

Purpose: It is used to return the first date that is later than startdate and falls on the day of the week specified by week.

Description:

startdate: string type. The format is yyyy-mm-dd hh:mi:ss or yyyy-mm-dd.
week: string type. The name of a day, or the first 2 or 3 letters of the name, for example, Mo, TUE, or FRIDAY.

Returned value: a date of the string type, in the yyyy-mm-dd format.

Example:

next_day('2017-08-01','TU') = '2017-08-08'
next_day('2017-08-01 23:34:00','TU') = '2017-08-08'
next_day('20170801','tu') = NULL
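Because the result must be strictly later than startdate, the distance to the target weekday is always between 1 and 7 days: 2017-08-01 is a Tuesday, so the next Tuesday is 7 days later. The Python sketch below models this. Case-insensitive day names and returning None for an unrecognized week value are assumptions, and Python's `strptime` is more lenient about digit widths than MaxCompute may be.

```python
from datetime import datetime, timedelta
from typing import Optional

_DAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def _parse(text: str) -> Optional[datetime]:
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def next_day(startdate: Optional[str], week: Optional[str]) -> Optional[str]:
    if startdate is None or week is None:
        return None
    start = _parse(startdate)
    key = week.strip().lower()
    # Full day name, or its first 2 or 3 letters.
    matches = [i for i, name in enumerate(_DAYS)
               if key == name or (len(key) in (2, 3) and name.startswith(key))]
    if start is None or len(matches) != 1:
        return None
    # Days until the matching weekday, always 1..7 so the result is strictly later.
    ahead = (matches[0] - start.weekday() - 1) % 7 + 1
    return (start + timedelta(days=ahead)).strftime("%Y-%m-%d")


print(next_day("2017-08-01", "TU"))
```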

6.7.3.28. MONTHS_BETWEEN
Function declaration:

double months_between(datetime/timestamp/string date1, datetime/timestamp/string date2)

Purpose: It is used to return the number of months between date1 and date2.
Description:

date1: datetime, timestamp, or string type. The format is yyyy-mm-dd hh:mi:ss or yyyy-mm-dd.
date2: datetime, timestamp, or string type. The format is yyyy-mm-dd hh:mi:ss or yyyy-mm-dd.

Returned value: double type.

If date1 is later than date2, the returned value is positive. If date2 is later than date1, the returned value is negative.
If date1 and date2 both fall on the last day of a month, the returned value is an integer number of months. Otherwise, the result is 12 × (year difference) + (month difference) + (difference in days, including the time of day) / 31.

Example:

months_between('1997-02-28 10:30:00', '1996-10-30') = 3.9495967741935485
months_between('1996-10-30','1997-02-28 10:30:00') = -3.9495967741935485
months_between('1996-09-30','1996-12-31') = -3.0
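For the first example, the whole-month part is 12 * (1997 - 1996) + (2 - 10) = 4, and the day part is (28 days 10.5 hours - 30 days) / 31 days = -1.5625 / 31, about -0.0504, which gives about 3.9496. The Python sketch below reproduces this arithmetic. It follows the rules stated above; whether MaxCompute also returns an integer when both dates have the same day of the month (as Oracle and Hive do) is not covered here and is not modeled.

```python
from calendar import monthrange
from datetime import datetime
from typing import Optional


def _is_last_day(d: datetime) -> bool:
    return d.day == monthrange(d.year, d.month)[1]


def _seconds_into_month(d: datetime) -> int:
    return ((d.day * 24 + d.hour) * 60 + d.minute) * 60 + d.second


def months_between(date1: Optional[datetime], date2: Optional[datetime]) -> Optional[float]:
    if date1 is None or date2 is None:
        return None
    whole = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    if _is_last_day(date1) and _is_last_day(date2):
        return float(whole)  # both on the last day of a month: whole months only
    # Remaining difference in days (including time of day), divided by 31.
    fraction = (_seconds_into_month(date1) - _seconds_into_month(date2)) / (31 * 86400)
    return whole + fraction


print(months_between(datetime(1997, 2, 28, 10, 30), datetime(1996, 10, 30)))
```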


6.7.3.29. EXTRACT
Function declaration:

INT EXTRACT(<datepart> from <timestamp>)

Description: This function extracts the part specified by datepart from the time specified by timestamp.

Parameters:

datepart: a time unit, such as YEAR, MONTH, DAY, HOUR, or MINUTE.
timestamp: a value of the TIMESTAMP type.

Return value: A value of the INT type is returned.

Example:

SET odps.sql.type.system.odps2=true;
SELECT extract(YEAR FROM '2019-05-01 11:21:00') year
,extract(MONTH FROM '2019-05-01 11:21:00') month
,extract(DAY FROM '2019-05-01 11:21:00') day
,extract(HOUR FROM '2019-05-01 11:21:00') hour
,extract(MINUTE FROM '2019-05-01 11:21:00') minute;
-- The following result is returned:
+------+-------+------+------+--------+
| year | month | day | hour | minute |
+------+-------+------+------+--------+
| 2019 | 5 | 1 | 11 | 21 |
+------+-------+------+------+--------+

If a time value specified in the SQL statement is invalid or exceeds the valid range of its unit, the returned value is the remainder of the specified value divided by the number of values in that range.

Example:

SET odps.sql.type.system.odps2=true;
SELECT extract(HOUR FROM '2019-05-01 31:01:01') hour
,extract(MINUTE FROM '2019-05-01 23:61:01') minute;
-- The following result is returned:
+------+-------+
| hour | minute|
+------+-------+
| 7 | 1 |
+------+-------+
-- The maximum value of HOUR is 24. The specified value 31 returns 7, the remainder of 31/24.
-- The maximum value of MINUTE is 60. The specified value 61 returns 1, the remainder of 61/60.
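The wrap-around rule can be sketched in Python for the two fields shown in the example. The function name is hypothetical, only HOUR and MINUTE are modeled, and the date part of the string is ignored in this sketch.

```python
def extract_wrapped(unit: str, ts: str) -> int:
    """Model of EXTRACT(HOUR|MINUTE FROM 'yyyy-mm-dd hh:mi:ss') with wrap-around."""
    # Split off the time part and read hour, minute, and second as integers.
    hour, minute, _second = (int(part) for part in ts.split(" ")[1].split(":"))
    if unit == "HOUR":
        return hour % 24    # the maximum value of HOUR is 24
    if unit == "MINUTE":
        return minute % 60  # the maximum value of MINUTE is 60
    raise ValueError(f"unit not modeled in this sketch: {unit}")
```

In-range values pass through unchanged, because the remainder of a value smaller than the maximum is the value itself.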

6.7.4. Window functions


6.7.4.1. Overview
In MaxCompute SQL statements, you can use window functions to analyze and process data flexibly. Window functions can appear only in SELECT clauses. A window function cannot contain nested window functions or aggregate functions, and it cannot be used together with aggregate functions at the same level.

A MaxCompute SQL statement supports up to five window functions.

Syntax:

window_func() over (partition by col1[, col2...]
[order by col1 [asc|desc][, col2[asc|desc]...]] [windowing_clause])

Description:

PARTITION BY specifies the partition columns. Rows that have the same values in the partition columns are considered to be in the same window. A window can contain up to 100 million rows of data, and we recommend that a window contain no more than 5 million rows. If the limit is exceeded, an error is returned.
ORDER BY specifies the rule for sorting data in a window.
You can use ROWS in windowing_clause to specify the window frame. The following two forms are supported:
rows between x preceding|following and y preceding|following: the frame ranges from the xth row preceding or following the current row to the yth row preceding or following the current row.
rows x preceding|following: the frame ranges from the xth row preceding or following the current row to the current row.

Note
x and y must be integer constants in the range of 0 to 10,000. The value 0 indicates the current row.
You must specify ORDER BY before you use ROWS to specify a window frame.
Only the following window functions support window frames specified by ROWS: AVG, COUNT, MAX, MIN, STDDEV, and SUM.
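To make the two ROWS forms concrete, the following Python sketch computes SUM over a ROWS frame for one partition that is already sorted. It is not MaxCompute code: `rows_frame_sum` is a hypothetical name, frame bounds are encoded as signed offsets (negative for PRECEDING, positive for FOLLOWING), and returning `None` for an empty frame is an assumption that mirrors SQL NULL.

```python
from typing import List, Optional


def rows_frame_sum(values: List[float], start: int, end: int) -> List[Optional[float]]:
    """SUM over a ROWS frame for one partition that is already sorted by ORDER BY.

    start and end are offsets relative to the current row: negative values mean
    PRECEDING, positive values mean FOLLOWING, and 0 means the current row.
    """
    n = len(values)
    result: List[Optional[float]] = []
    for i in range(n):
        lo = max(0, i + start)   # clip the frame at the partition boundaries
        hi = min(n - 1, i + end)
        # An empty frame yields None here (assumed to correspond to NULL).
        result.append(sum(values[lo:hi + 1]) if lo <= hi else None)
    return result


# rows between 1 preceding and 1 following
moving = rows_frame_sum([1, 2, 3, 4], -1, 1)
# rows 2 preceding: from the 2nd preceding row to the current row
trailing = rows_frame_sum([1, 2, 3, 4], -2, 0)
```

Near the edges of a partition the frame is simply shorter, which is why the first value of `moving` sums only two rows.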

6.7.4.2. COUNT
Command syntax:

bigint count([distinct] expr) over(partition by col1[, col2…]
[order by col1 [asc|desc][, col2[asc|desc]…]] [windowing_clause])

Purpose: It is used to return the number of values in the expr column.

Description:

expr: any type. Rows in which the value is NULL are not counted. If the distinct keyword is specified, only distinct values are counted.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the count of expr values in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order, and each value is the count from the start row to the current row in the current window.

Returned value: bigint.

Note: If the distinct keyword is specified, ORDER BY cannot be used.

Example:

The test_src table contains the user_id column of the bigint type.

select user_id,count(user_id) over (partition by user_id) as count from test_src;


+---------+------------+
| user_id | count |
+---------+------------+
| 1 | 3 |
| 1 | 3 |
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
+---------+------------+
-- If ORDER BY is not specified, the number of user_id values in the current partition is returned.
select user_id,count(user_id) over (partition by user_id order by user_id) as count from test_src;
+---------+------------+
| user_id | count |
+---------+------------+
| 1 | 1 | -- Start row of the window.
| 1 | 2 | -- Two rows exist from the start row to the current row, so 2 is returned.
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
+---------+------------+
-- If ORDER BY is specified, the count from the start row to the current row in the current partition is returned.
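Both result tables can be reproduced with a Python model of a single partition. This is a sketch, not MaxCompute code: `window_count` is a hypothetical name, `None` stands for NULL, and the ORDER BY branch follows the documented example, in which rows with tied ORDER BY values still receive 1, 2, 3.

```python
from typing import Any, List, Optional


def window_count(values: List[Optional[Any]], ordered: bool) -> List[int]:
    """COUNT(expr) OVER (PARTITION BY ...) for the rows of one partition.

    None stands for NULL and is not counted. Without ORDER BY every row gets
    the partition total; with ORDER BY each row gets the count from the start
    row to the current row.
    """
    if not ordered:
        total = sum(1 for v in values if v is not None)
        return [total] * len(values)
    counts, running = [], 0
    for v in values:
        if v is not None:
            running += 1
        counts.append(running)
    return counts
```

For the partition user_id = 1, which has three rows, the unordered call yields 3 for every row and the ordered call yields 1, 2, 3.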

6.7.4.3. AVG
Function declaration:

avg([distinct] expr) over(partition by col1[, col2…]
[order by col1 [asc|desc] [, col2[asc|desc]…]] [windowing_clause])

Purpose: It is used to calculate the average value.

Description:

distinct: If the distinct keyword is specified, the average of distinct values is calculated.
expr: double type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before computation. If the input is of another type, an error is returned. Rows in which the input is NULL are not used in the computation. The input cannot be of the boolean type.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the average value of expr in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order, and each value is the average from the start row to the current row in the current window.

Returned value: double type.

Note: If the distinct keyword is specified, ORDER BY cannot be set.
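A minimal Python sketch of the AVG rules above for one partition, assuming `None` stands for NULL. The function name is hypothetical, and returning `None` when a window has no non-NULL values is an assumption borrowed from standard SQL rather than something this topic states.

```python
from typing import List, Optional


def window_avg(values: List[Optional[float]], ordered: bool = False,
               distinct: bool = False) -> List[Optional[float]]:
    """AVG([DISTINCT] expr) OVER (PARTITION BY ... [ORDER BY ...]) for one partition."""
    if distinct and ordered:
        raise ValueError("ORDER BY cannot be set together with DISTINCT")
    if not ordered:
        present = [v for v in values if v is not None]  # NULL rows are skipped
        if distinct:
            present = list(set(present))
        avg = sum(present) / len(present) if present else None
        return [avg] * len(values)
    # With ORDER BY: running average from the start row to the current row.
    out: List[Optional[float]] = []
    total, count = 0.0, 0
    for v in values:
        if v is not None:
            total += v
            count += 1
        out.append(total / count if count else None)
    return out
```

The guard at the top mirrors the note that DISTINCT and ORDER BY cannot be combined.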

6.7.4.4. MAX
Function declaration:

max([distinct] expr) over(partition by col1[, col2…]
[order by col1 [asc|desc][, col2[asc|desc]…]] [windowing_clause])

Purpose: It is used to return the maximum value.

Description:

expr: any type except BOOLEAN. Rows in which the value is NULL are not involved in the computation. If the distinct keyword is specified, the maximum of the distinct values is returned, so specifying it does not affect the result.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the maximum value in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order, and each value is the maximum from the start row to the current row in the current window.

Returned value: The type is the same as that of expr.

Note: If the distinct keyword is specified, ORDER BY cannot be set.

6.7.4.5. MIN
Function declaration:

min([distinct] expr) over(partition by col1[, col2…]
[order by col1 [asc|desc][, col2[asc|desc]…]] [windowing_clause])

Purpose: It is used to return the minimum value.

Description:
expr: any type except BOOLEAN. Rows in which the value is NULL are not involved in the computation. If the distinct keyword is specified, the minimum of the distinct values is returned, so specifying it does not affect the result.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the minimum value in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order, and each value is the minimum from the start row to the current row in the current window.

Returned value: The type is the same as that of expr.

Note: If the distinct keyword is specified, ORDER BY cannot be set.

6.7.4.6. MEDIAN
Function declaration:

double median(double number) over(partition by col1[, col2…])
decimal median(decimal number) over(partition by col1[, col2…])

Purpose: It is used to calculate the median.

Description:

number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before computation. For all other input types, an error is returned. If the input is NULL, NULL is returned.
partition by col1[, col2…]: specifies the partitions used in the computation.

Returned value: double type, or decimal type if the input is of the decimal type, as shown in the function declaration.
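As a sketch, the per-partition median can be modeled in Python with the standard `statistics.median` function. The function name is hypothetical and NULL handling is not modeled. Because MEDIAN takes no ORDER BY or windowing clause, every row of a partition receives the same value.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def median_by_partition(rows: List[Tuple[int, float]]) -> Dict[int, float]:
    """Median of the number column for each partition key."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for key, number in rows:
        groups[key].append(float(number))  # bigint input is treated as double
    return {key: statistics.median(nums) for key, nums in groups.items()}


# (deptno, sal) pairs taken from the emp sample data used in this chapter
emp_sal = [(10, 2450), (10, 5000), (10, 1300), (10, 5000), (10, 2450), (10, 1300),
           (20, 800), (20, 2975), (20, 3000), (20, 1100), (20, 3000)]
medians = median_by_partition(emp_sal)
```

With an even number of rows, as in department 10, the median is the mean of the two middle values.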

6.7.4.7. STDDEV
Function declaration:

double stddev([distinct] expr) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]] [windowing_clause])
decimal stddev([distinct] expr) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]] [windowing_clause])

Purpose: It is used to calculate the population standard deviation.

Description:

expr: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before computation. For all other input types, an error is returned. If the input value is NULL, NULL is returned. If the distinct keyword is specified, the population standard deviation of distinct values is calculated.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the population standard deviation of the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order, and each value is the population standard deviation from the start row to the current row in the current window.

Returned value: If the input is of the decimal type, a value of the decimal type is returned. Otherwise, a value of the double type is returned.

Note: If the distinct keyword is specified, ORDER BY cannot be set.
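The population standard deviation divides the sum of squared deviations by the number of values N. The Python sketch below spells that formula out for the values of one window. The function name is hypothetical, and NULL or empty windows are not modeled.

```python
import math
from typing import List


def population_stddev(values: List[float], distinct: bool = False) -> float:
    """sqrt(sum((x - mean)^2) / N) over the values of one window."""
    if distinct:
        values = list(set(values))  # STDDEV(DISTINCT expr) uses each value once
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared deviations sum to 32, and sqrt(32/8) is 2.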

6.7.4.8. STDDEV_SAMP


Function declaration:

double stddev_samp([distinct] expr) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]] [windowing_clause])
decimal stddev_samp([distinct] expr) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]] [windowing_clause])

Purpose: It is used to calculate the sample standard deviation.

Description:
expr: double or decimal type. If the input is of the string or bigint type, it is implicitly converted into a value of the double type before computation. For all other input types, an error is returned. If the input is NULL, NULL is returned. If the distinct keyword is specified, the sample standard deviation of distinct values is calculated.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the sample standard deviation of the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order, and each value is the sample standard deviation from the start row to the current row in the current window.

Returned value: If the input is of the decimal type, a value of the decimal type is returned. Otherwise, a value of the double type is returned.

Note: If the distinct keyword is specified, ORDER BY cannot be set.
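The only difference from STDDEV is the divisor: the sample standard deviation divides by N - 1 instead of N. A minimal Python sketch of that formula for one window follows. The function name is hypothetical, and what MaxCompute returns for a window with a single value is not stated in this topic, so the sketch raises an error instead of guessing.

```python
import math
from typing import List


def sample_stddev(values: List[float]) -> float:
    """sqrt(sum((x - mean)^2) / (N - 1)) over the values of one window."""
    n = len(values)
    if n < 2:
        raise ValueError("this sketch needs at least two values")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the squared deviations sum to 32, so the sample value is sqrt(32/7), slightly larger than the population value of 2.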

6.7.4.9. SUM
Function declaration:

sum([distinct] expr) over(partition by col1[, col2…]
[order by col1 [asc|desc][, col2[asc|desc]…]] [windowing_clause])

Purpose: It is used to calculate the sum.

Description:

expr: double, decimal, or bigint type. If the input is of the string type, it is implicitly converted into a value of the double type before computation. If the input is of another type, an error is returned. Rows in which the value is NULL are not included in the computation. If the distinct keyword is specified, the sum of distinct values is calculated.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: If ORDER BY is not specified, the sum of expr values in the current window is returned. If ORDER BY is specified, the returned results are sorted in the specified order, and each value is the cumulative sum from the start row to the current row in the current window.

Returned value: If the input is of the bigint type, a value of the bigint type is returned. If the input is of the double or string type, a value of the double type is returned.

Note: If the distinct keyword is specified, ORDER BY cannot be set.
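The DISTINCT variant without ORDER BY can be sketched in Python as follows. The function name is hypothetical, `None` stands for NULL, and returning `None` for an all-NULL partition is an assumption taken from standard SQL. Python's `sum` keeps integer input as `int` and float input as `float`, which loosely mirrors the bigint and double return types described above.

```python
from typing import List, Optional, Union

Number = Union[int, float]


def window_sum(values: List[Optional[Number]], distinct: bool = False) -> Optional[Number]:
    """SUM([DISTINCT] expr) OVER (PARTITION BY ...) without ORDER BY.

    Returns the single value that every row of the partition receives.
    """
    present = [v for v in values if v is not None]  # NULL rows are skipped
    if distinct:
        present = list(dict.fromkeys(present))      # keep each value once
    if not present:
        return None  # assumption: an all-NULL partition yields NULL
    return sum(present)
```

For the partition 1, 2, 2, NULL, the plain sum is 5 and the distinct sum is 3.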


6.7.4.10. DENSE_RANK
Function declaration:

bigint dense_rank() over(partition by col1[, col2…]
order by col1 [asc|desc][, col2[asc|desc]…])

Purpose: It is used to calculate consecutive rankings. Rows that have the same values in the ORDER BY columns receive the same rank, and the next distinct value receives the next consecutive rank.

Description:

partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the values that determine the ranking.

Returned value: bigint type.

Example:

The emp table contains the following data:

| empno | ename | job | mgr | hiredate| sal| comm | deptno |


7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Group all employees by department, sort the employees in each group by SAL in descending order, and obtain the rank of each employee within the group.


SELECT deptno
, ename
, sal
, DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
-- DEPTNO (department) is the partition used in the computation, and SAL (salary) is used as the basis for sorting the returned results.
FROM emp;
-- Returned result:
+------------+-------+------------+------------+
| deptno | ename | sal | nums |
+------------+-------+------------+------------+
| 10 | JACCKA | 5000.0 | 1 |
| 10 | KING | 5000.0 | 1 |
| 10 | CLARK | 2450.0 | 2 |
| 10 | WELAN | 2450.0 | 2 |
| 10 | TEBAGE | 1300.0 | 3 |
| 10 | MILLER | 1300.0 | 3 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | FORD | 3000.0 | 1 |
| 20 | JONES | 2975.0 | 2 |
| 20 | ADAMS | 1100.0 | 3 |
| 20 | SMITH | 800.0 | 4 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1600.0 | 2 |
| 30 | TURNER | 1500.0 | 3 |
| 30 | MARTIN | 1250.0 | 4 |
| 30 | WARD | 1250.0 | 4 |
| 30 | JAMES | 950.0 | 5 |
+------------+-------+------------+------------+
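The dense ranking in each department can be reproduced with a short Python model. It is a sketch, not MaxCompute code: `dense_rank` is a hypothetical name, and it assumes that the input list is one partition already sorted by the ORDER BY key (here, sal in descending order).

```python
from typing import List


def dense_rank(sorted_keys: List[float]) -> List[int]:
    """DENSE_RANK for one partition whose rows are already sorted by ORDER BY."""
    ranks: List[int] = []
    for i, key in enumerate(sorted_keys):
        if i == 0:
            ranks.append(1)
        elif key == sorted_keys[i - 1]:
            ranks.append(ranks[-1])      # tie: same rank as the previous row
        else:
            ranks.append(ranks[-1] + 1)  # new value: next consecutive rank
    return ranks


dept10 = dense_rank([5000, 5000, 2450, 2450, 1300, 1300])
dept30 = dense_rank([2850, 1600, 1500, 1250, 1250, 950])
```

The results match the nums column for departments 10 and 30 in the table above.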

6.7.4.11. RANK
Command syntax:

bigint rank() over(partition by col1[, col2…] order by col1 [asc|desc][, col2[asc|desc]…])

Purpose: It is used to return a ranking value. Rows that have the same values in the ORDER BY columns receive the same rank, and the row after a tie receives a rank equal to its position in the window, so the rankings can contain gaps.

Description:

partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the rule for deciding the ranking.

Returned value: bigint type.

Example:

Table emp contains the following data:


| empno | ename | job | mgr | hiredate| sal| comm | deptno |


7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Now group the employees by department, sort the employees in each group by salary in descending order, and assign each employee a number that represents their position in the group.

SELECT deptno
, ename
, sal
, RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
-- DEPTNO (department) is the partitioning column. The sal column is sorted to generate the ranking value of each employee.
FROM emp;
-- Returned result:
+------------+-------+------------+------------+
| deptno | ename | sal | nums |
+------------+-------+------------+------------+
| 10 | JACCKA | 5000.0 | 1 |
| 10 | KING | 5000.0 | 1 |
| 10 | CLARK | 2450.0 | 3 |
| 10 | WELAN | 2450.0 | 3 |
| 10 | TEBAGE | 1300.0 | 5 |
| 10 | MILLER | 1300.0 | 5 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | FORD | 3000.0 | 1 |
| 20 | JONES | 2975.0 | 3 |
| 20 | ADAMS | 1100.0 | 4 |
| 20 | SMITH | 800.0 | 5 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1600.0 | 2 |
| 30 | TURNER | 1500.0 | 3 |
| 30 | MARTIN | 1250.0 | 4 |
| 30 | WARD | 1250.0 | 4 |
| 30 | JAMES | 950.0 | 6 |
+------------+-------+------------+------------+
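The gaps after ties can be seen in a Python model of RANK for one pre-sorted partition. This is a sketch with a hypothetical function name, not MaxCompute code. The only change from the DENSE_RANK model is that a new value takes its 1-based position instead of the previous rank plus one.

```python
from typing import List


def rank(sorted_keys: List[float]) -> List[int]:
    """RANK for one partition whose rows are already sorted by ORDER BY."""
    ranks: List[int] = []
    for i, key in enumerate(sorted_keys):
        if i > 0 and key == sorted_keys[i - 1]:
            ranks.append(ranks[-1])  # tie: same rank as the previous row
        else:
            ranks.append(i + 1)      # new value: rank equals the 1-based position
    return ranks


dept10 = rank([5000, 5000, 2450, 2450, 1300, 1300])
dept20 = rank([3000, 3000, 2975, 1100, 800])
dept30 = rank([2850, 1600, 1500, 1250, 1250, 950])
```

These lists match the nums column for departments 10, 20, and 30 above, including the jump from 4 to 6 in department 30.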


6.7.4.12. LAG
Function declaration:

lag(expr, bigint offset, default) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]])

Purpose: It is used to retrieve the value from the row that is offset rows before the current row. For example, if the current row is rn, the value is retrieved from row rn - offset.

Description:

expr: any type.
offset: a constant of the bigint type that must be greater than 0. If the input is of the string or double type, it is implicitly converted into a value of the bigint type before computation.
default: a constant. It specifies the value that is returned when the row specified by offset is out of the valid range. The default value is NULL.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the sorting order of the returned results.

Returned value: The type is the same as that of expr.
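A minimal Python sketch of LAG for one partition that is already sorted. The function name is hypothetical. Defaulting offset to 1 is an assumption (this topic lists offset as a parameter, and the LEAD example calls lead(c_int_a) without one), while `None` as the default value corresponds to the documented NULL default.

```python
from typing import Any, List, Optional


def lag(values: List[Any], offset: int = 1, default: Optional[Any] = None) -> List[Any]:
    """LAG(expr, offset, default) for one partition already sorted by ORDER BY."""
    if offset <= 0:
        raise ValueError("offset must be greater than 0")
    # Row i reads row i - offset; rows with no such row get the default value.
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]
```

For the partition 10, 20, 30, an offset of 1 yields NULL, 10, 20, and an offset of 2 with default 0 yields 0, 0, 10.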

6.7.4.13. LEAD
Function declaration:

lead(expr, bigint offset, default) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]])

Purpose: It is used to retrieve the value from the row that is offset rows after the current row. For example, if the current row is rn, the value is retrieved from row rn + offset.

Description:

expr: any type.
offset: a constant of the bigint type that must be greater than 0. If the input is of the string or double type, it is implicitly converted into a value of the bigint type before computation.
default: a constant. It specifies the value that is returned when the row specified by offset is out of the valid range.
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the sorting order of the returned results.

Returned value: The type is the same as that of expr.

Example:

select c_double_a,c_string_b,c_int_a,lead(c_int_a,1) over(partition by c_double_a order by c_string_b) from dual;
select c_string_a,c_time_b,c_double_a,lead(c_double_a,1) over(partition by c_string_a order by c_time_b) from dual;
select c_string_in_fact_num,c_string_a,c_int_a,lead(c_int_a) over(partition by c_string_in_fact_num order by c_string_a) from dual;
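Because the columns of the dual table in these statements are not shown, the following Python sketch illustrates LEAD on a small hypothetical partition instead. The function name is hypothetical. An offset of 1 when omitted follows the third statement above, and `None` as the fallback default is an assumption, because this topic does not state LEAD's default value.

```python
from typing import Any, List, Optional


def lead(values: List[Any], offset: int = 1, default: Optional[Any] = None) -> List[Any]:
    """LEAD(expr, offset, default) for one partition already sorted by ORDER BY."""
    if offset <= 0:
        raise ValueError("offset must be greater than 0")
    n = len(values)
    # Row i reads row i + offset; rows near the end of the partition get the default.
    return [values[i + offset] if i + offset < n else default for i in range(n)]
```

For the partition 10, 20, 30, lead with offset 1 yields 20, 30, NULL.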


6.7.4.14. PERCENT_RANK
Function declaration:

percent_rank() over(partition by col1[, col2…] order by col1 [asc|desc][, col2[asc|desc]…])

Purpose: It is used to return the relative ranking of a row in a group of data.

Description:
partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the values that determine the ranking.

Returned value: double type in the range of 0 to 1. The relative ranking is calculated by using the following formula: (rank - 1)/(number of rows - 1).

Note: The number of rows in a window cannot exceed 10,000,000.
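The formula can be applied to the SAL values of department 10 with a Python sketch. It assumes that "rank" in the formula means the RANK value, with ties sharing the rank of their first row, and it returns 0.0 for a single-row window, which is a standard SQL convention that this topic does not state. The function name is hypothetical.

```python
from typing import List


def percent_rank(sorted_keys: List[float]) -> List[float]:
    """(rank - 1) / (rows - 1) for one partition already sorted by ORDER BY."""
    n = len(sorted_keys)
    if n == 1:
        return [0.0]  # assumption: avoids dividing by zero for single-row windows
    out: List[float] = []
    current_rank = 0
    for i, key in enumerate(sorted_keys):
        if i == 0 or key != sorted_keys[i - 1]:
            current_rank = i + 1  # RANK semantics: position of the first tied row
        out.append((current_rank - 1) / (n - 1))
    return out


dept10 = percent_rank([5000, 5000, 2450, 2450, 1300, 1300])
```

Department 10 has six rows with ranks 1, 1, 3, 3, 5, 5, so the relative rankings are 0, 0, 2/5, 2/5, 4/5, and 4/5.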

6.7.4.15. ROW_NUMBER
Function declaration:

row_number() over(partition by col1[, col2…] order by col1 [asc|desc][, col2[asc|desc]…])

Purpose: It is used to calculate the row number, which starts from 1.

Description:

partition by col1[, col2…]: specifies the partitions used in the computation.
order by col1 [asc|desc], col2[asc|desc]: specifies the sorting order of the returned results.

Returned value: bigint type.

Example:

Table emp contains the following data:


| empno | ename | job | mgr | hiredate| sal| comm | deptno |

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Group all employees by department, sort each group by SAL in descending order, and obtain the sequence number of each employee within the group.

SELECT deptno
, ename
, sal
, ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums
-- DEPTNO (department) is the partition used in the computation, and SAL (salary) is used as the basis for sorting the results.
FROM emp;
-- Returned result:
+------------+-------+------------+------------+
| deptno | ename | sal | nums |
+------------+-------+------------+------------+
| 10 | JACCKA | 5000.0 | 1 |
| 10 | KING | 5000.0 | 2 |
| 10 | CLARK | 2450.0 | 3 |
| 10 | WELAN | 2450.0 | 4 |
| 10 | TEBAGE | 1300.0 | 5 |
| 10 | MILLER | 1300.0 | 6 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | FORD | 3000.0 | 2 |
| 20 | JONES | 2975.0 | 3 |
| 20 | ADAMS | 1100.0 | 4 |
| 20 | SMITH | 800.0 | 5 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1600.0 | 2 |
| 30 | TURNER | 1500.0 | 3 |
| 30 | MARTIN | 1250.0 | 4 |
| 30 | WARD | 1250.0 | 5 |
| 30 | JAMES | 950.0 | 6 |
+------------+-------+------------+------------+
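The numbering can be modeled in Python on a few rows of the emp data. The model also shows a common follow-up: keeping only rows whose number is 1 leaves one top earner per department. This is a sketch, not MaxCompute code. The function name is hypothetical, and this topic does not say how MaxCompute orders tied salaries, so the model keeps tied rows in input order because Python's `sorted` is stable.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[int, str, float]  # (deptno, ename, sal)


def with_row_number(rows: List[Row]) -> List[Tuple[int, str, float, int]]:
    """ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC)."""
    # sorted() is stable, so tied salaries keep their input order here.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    numbered = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        for n, (deptno, ename, sal) in enumerate(group, start=1):
            numbered.append((deptno, ename, sal, n))
    return numbered


sample = [(10, "KING", 5000.0), (10, "JACCKA", 5000.0), (10, "CLARK", 2450.0),
          (20, "SMITH", 800.0), (20, "SCOTT", 3000.0)]
numbered = with_row_number(sample)
top_earners = [ename for _, ename, _, n in numbered if n == 1]
```

Unlike RANK, tied rows such as KING and JACCKA receive different numbers, which makes the n == 1 filter return exactly one row per partition.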


6.7.4.16. CLUSTER_SAMPLE
Command syntax:

boolean cluster_sample(bigint x[, bigint y]) over(partition by col1[, col2..])

Purpose: It is used to perform cluster sampling.

Description:

x: bigint type, x >= 1. If y is specified, x indicates that a window is divided into x parts. Otherwise, x indicates that x rows are extracted from a window, that is, the returned value is true for x rows. If x is NULL, NULL is returned.
y: a constant of the bigint type, 1 <= y <= x. It indicates that y parts are extracted from the x parts into which a window is divided, that is, the returned value is true for the rows in the extracted parts. If y is NULL, NULL is returned.
partition by col1[, col2…]: specifies the partitions used in the computation.

Returned value: boolean type.

Example:

The test_tbl table has two columns: key and value. The key column stores the group name of each value, which is groupa or groupb. The value column stores the values. The table data is as follows:

+------------+--------------------+
| key | value |
+------------+--------------------+
| groupa | -1.34764165478145 |
| groupa | 0.740212609046718 |
| groupa | 0.167537127858695 |
| groupa | 0.630314566185241 |
| groupa | 0.0112401388646925 |
| groupa | 0.199165745875297 |
| groupa | -0.320543343353587 |
| groupa | -0.273930924365012 |
| groupa | 0.386177958942063 |
| groupa | -1.09209976687047 |
| groupb | -1.10847690938643 |
| groupb | -0.725703978381499 |
| groupb | 1.05064697475759 |
| groupb | 0.135751224393789 |
| groupb | 2.13313102040396 |
| groupb | -1.11828960785008 |
| groupb | -0.849235511508911 |
| groupb | 1.27913806620453 |
| groupb | -0.330817716670401 |
| groupb | -0.300156896191195 |
| groupb | 2.4704244205196 |
| groupb | -1.28051882084434 |
+------------+--------------------+

Run the following SQL statement to take a sample of 10% of the values in each group:


select key, value from (select key, value, cluster_sample(10, 1) over(partition by key) as flag from test_tbl) sub where flag = true;
-- Returned result:
+--------+--------------------+
| key | value |
+--------+--------------------+
| groupa | -0.273930924365012 |
| groupb | -1.11828960785008 |
+--------+--------------------+

6.7.4.17. NTILE
Function declaration:

BIGINT ntile(BIGINT n) over(partition by col1[, col2…] [order by col1 [asc|desc] [, col2[asc|desc]…]] [windowing_clause])

Purpose: It is used to split the rows of each partition into n slices and return the slice number of the current row. If the rows cannot be distributed evenly, the extra rows are assigned to the first slices.

Description:

n: BIGINT type.
Returned value: BIGINT type.

Example:

Table emp has the following data:

| empno | ename | job | mgr | hiredate| sal| comm | deptno |


7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Group all employees by department, sort each group by salary in descending order, and then split each group into three slices to obtain the slice number of each employee.


-- Execute the following statement:
select deptno, ename, sal, NTILE(3) OVER(PARTITION BY deptno ORDER BY sal desc) AS nt3 from emp;
-- Returned result:
+------------+-------+------------+------------+
| deptno | ename | sal | nt3 |
+------------+-------+------------+------------+
| 10 | JACCKA | 5000.0 | 1 |
| 10 | KING | 5000.0 | 1 |
| 10 | WELAN | 2450.0 | 2 |
| 10 | CLARK | 2450.0 | 2 |
| 10 | TEBAGE | 1300.0 | 3 |
| 10 | MILLER | 1300.0 | 3 |
| 20 | SCOTT | 3000.0 | 1 |
| 20 | FORD | 3000.0 | 1 |
| 20 | JONES | 2975.0 | 2 |
| 20 | ADAMS | 1100.0 | 2 |
| 20 | SMITH | 800.0 | 3 |
| 30 | BLAKE | 2850.0 | 1 |
| 30 | ALLEN | 1600.0 | 1 |
| 30 | TURNER | 1500.0 | 2 |
| 30 | MARTIN | 1250.0 | 2 |
| 30 | WARD | 1250.0 | 3 |
| 30 | JAMES | 950.0 | 3 |
+------------+-------+------------+------------+
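How the extra rows are assigned can be checked with a Python sketch that only needs the number of rows in a partition. The function name is hypothetical. With 5 rows and 3 slices, divmod(5, 3) gives 1 row per slice plus 2 extra rows, so slices 1 and 2 hold two rows each and slice 3 holds one, which matches department 20 in the result above.

```python
from typing import List


def ntile(row_count: int, n: int) -> List[int]:
    """Slice numbers for row_count sorted rows split into n slices."""
    base, extra = divmod(row_count, n)
    out: List[int] = []
    for slice_no in range(1, n + 1):
        # The first `extra` slices each receive one additional row.
        size = base + (1 if slice_no <= extra else 0)
        out.extend([slice_no] * size)
    return out
```

Departments 10 and 30 each have six rows, so every slice receives exactly two rows.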

6.7.4.18. NTH_VALUE
Function declaration:

nth_value(expr, bigint n [, boolean skipNulls]) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]])

Purpose: It is used to return the nth value in the partition used in the computation.

Description:

expr: required. Any type.
n: specifies which value to return. It starts from 1 and is of the BIGINT type.
skipNulls: specifies whether to ignore rows whose values are NULL. This parameter is of the BOOLEAN type. The default value is false.

Returned value: the nth value in the partition used in the computation.

Note: If skipNulls is set to true, the nth non-NULL value is returned. If the nth non-NULL value does not exist, NULL is returned.

Example:


select a, nth_value(a + 1, 1) over (partition by a order by a) from values (3), (1), (2) as t(a);
-- If n is 1, NTH_VALUE is equivalent to FIRST_VALUE.
-- Returned results:
-- 1 2
-- 2 3
-- 3 4
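The effect of skipNulls can be sketched in Python for one sorted partition. This is not MaxCompute code: the function name is hypothetical, and the sketch assumes that, with ORDER BY, each row's frame runs from the start of the partition to the current row, as this chapter describes for FIRST_VALUE. This topic does not state the frame for NTH_VALUE explicitly.

```python
from typing import Any, List, Optional


def nth_value(values: List[Any], n: int, skip_nulls: bool = False) -> List[Optional[Any]]:
    """For each row, the nth value of the frame from the partition start to that row."""
    if n < 1:
        raise ValueError("n starts from 1")
    out: List[Optional[Any]] = []
    for i in range(len(values)):
        frame = values[: i + 1]  # assumed frame: partition start to current row
        if skip_nulls:
            frame = [v for v in frame if v is not None]
        # Fewer than n (non-NULL) values in the frame: NULL, modeled as None.
        out.append(frame[n - 1] if len(frame) >= n else None)
    return out
```

With the values 10, NULL, 30 and n = 2, the second value is NULL unless skipNulls is true, in which case the last row returns 30.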

6.7.4.19. CUME_DIST
Function declaration:

cume_dist() over(partition by col1[, col2…] [order by col1 [asc|desc][, col2[asc|desc]…]])

Purpose: It is used to return the cumulative distribution. The cumulative distribution is the ratio of the number of rows in the group that precede the current row or share its ORDER BY value to the total number of rows in the group. For ascending order, these are the rows whose values are less than or equal to the current value.

Description: None.

Note: The ORDER BY columns specify the values to be compared.

Returned value: the cumulative distribution of the current row, as defined in Purpose.

Example:

Table emp has the following data:

| empno | ename | job | mgr | hiredate| sal| comm | deptno |


7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Group all employees by department, and then obtain the cumulative distribution of salaries in each group.


SELECT deptno
, ename
, sal
, concat(round(cume_dist() OVER(PARTITION BY deptno ORDER BY sal desc)*100,2),'%') as cume_dist
FROM emp;

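The returned result is as follows:

+--------+--------+--------+-----------+
| deptno | ename  | sal    | cume_dist |
+--------+--------+--------+-----------+
| 10     | JACCKA | 5000.0 | 33.33%    |
| 10     | KING   | 5000.0 | 33.33%    |
| 10     | CLARK  | 2450.0 | 66.67%    |
| 10     | WELAN  | 2450.0 | 66.67%    |
| 10     | TEBAGE | 1300.0 | 100.0%    |
| 10     | MILLER | 1300.0 | 100.0%    |
| 20     | SCOTT  | 3000.0 | 40.0%     |
| 20     | FORD   | 3000.0 | 40.0%     |
| 20     | JONES  | 2975.0 | 60.0%     |
| 20     | ADAMS  | 1100.0 | 80.0%     |
| 20     | SMITH  | 800.0  | 100.0%    |
| 30     | BLAKE  | 2850.0 | 16.67%    |
| 30     | ALLEN  | 1600.0 | 33.33%    |
| 30     | TURNER | 1500.0 | 50.0%     |
| 30     | MARTIN | 1250.0 | 83.33%    |
| 30     | WARD   | 1250.0 | 83.33%    |
| 30     | JAMES  | 950.0  | 100.0%    |
+--------+--------+--------+-----------+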
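The percentages above can be checked by hand. The following Python sketch (not MaxCompute code; `cume_dist_desc` and `as_percent` are hypothetical helper names) models cume_dist() for the emp rows: because the query sorts by sal in descending order, the rows at or before the current row, ties included, are exactly the rows in the same department whose salary is greater than or equal to the current one, and that count is divided by the department size.

```python
from collections import defaultdict

# (deptno, ename, sal) taken from the emp table above.
EMP = [
    (20, "SMITH", 800), (30, "ALLEN", 1600), (30, "WARD", 1250),
    (20, "JONES", 2975), (30, "MARTIN", 1250), (30, "BLAKE", 2850),
    (10, "CLARK", 2450), (20, "SCOTT", 3000), (10, "KING", 5000),
    (30, "TURNER", 1500), (20, "ADAMS", 1100), (30, "JAMES", 950),
    (20, "FORD", 3000), (10, "MILLER", 1300), (10, "JACCKA", 5000),
    (10, "WELAN", 2450), (10, "TEBAGE", 1300),
]


def cume_dist_desc(rows):
    """Model cume_dist() OVER (PARTITION BY deptno ORDER BY sal DESC)."""
    partitions = defaultdict(list)
    for deptno, ename, sal in rows:
        partitions[deptno].append((ename, sal))
    result = {}
    for members in partitions.values():
        total = len(members)
        for ename, sal in members:
            # In DESC order, rows at or before the current row (ties included)
            # are exactly the rows with a salary >= the current salary.
            at_or_before = sum(1 for _, other in members if other >= sal)
            result[ename] = at_or_before / total
    return result


def as_percent(fraction):
    """Mirror concat(round(fraction * 100, 2), '%') from the query."""
    return f"{round(fraction * 100, 2)}%"


dist = cume_dist_desc(EMP)
print(as_percent(dist["JACCKA"]), as_percent(dist["ADAMS"]))  # 33.33% 80.0%
```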

6.7.4.20. FIRST_VALUE
Function declaration:

first_value(expr) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2 [asc|desc]…]])

Purpose: It is used to sort the rows in each partition and return the value of expr in the first row of the range that runs from the beginning of the partition to the current row.

Description:

expr: required. An expression of any type.

Returned value: the value of expr in the first row of the window used in the computation.

Example:

Table emp contains the following data:

| empno | ename | job | mgr | hiredate| sal| comm | deptno |


7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Group all employees by department, sort each group in descending order by salary, and then obtain the name of the first employee in each group.

SELECT deptno
, ename
, sal
-- Obtain the name of the first employee in each group after sorting by salary in descending order.
, FIRST_VALUE(ename) OVER(PARTITION BY deptno ORDER BY sal desc) AS first1
FROM emp;

The returned result is as follows:

+--------+--------+--------+--------+
| deptno | ename  | sal    | first1 |
+--------+--------+--------+--------+
| 10     | JACCKA | 5000.0 | JACCKA |
| 10     | KING   | 5000.0 | JACCKA |
| 10     | CLARK  | 2450.0 | JACCKA |
| 10     | WELAN  | 2450.0 | JACCKA |
| 10     | TEBAGE | 1300.0 | JACCKA |
| 10     | MILLER | 1300.0 | JACCKA |
| 20     | SCOTT  | 3000.0 | SCOTT  |
| 20     | FORD   | 3000.0 | SCOTT  |
| 20     | JONES  | 2975.0 | SCOTT  |
| 20     | ADAMS  | 1100.0 | SCOTT  |
| 20     | SMITH  | 800.0  | SCOTT  |
| 30     | BLAKE  | 2850.0 | BLAKE  |
| 30     | ALLEN  | 1600.0 | BLAKE  |
| 30     | TURNER | 1500.0 | BLAKE  |
| 30     | MARTIN | 1250.0 | BLAKE  |
| 30     | WARD   | 1250.0 | BLAKE  |
| 30     | JAMES  | 950.0  | BLAKE  |
+--------+--------+--------+--------+
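In departments 10 and 20, two employees share the top salary (JACCKA and KING, SCOTT and FORD), so ORDER BY sal desc alone does not decide which of them FIRST_VALUE returns; the result above shows one possible outcome. The following Python sketch (hypothetical helper name, not MaxCompute code) lists, for each department, every name that can legitimately appear in first1. Adding a tie-breaker such as empno to the ORDER BY clause makes the order of tied rows explicit.

```python
from collections import defaultdict

# (deptno, ename, sal) taken from the emp table above.
EMP = [
    (20, "SMITH", 800), (30, "ALLEN", 1600), (30, "WARD", 1250),
    (20, "JONES", 2975), (30, "MARTIN", 1250), (30, "BLAKE", 2850),
    (10, "CLARK", 2450), (20, "SCOTT", 3000), (10, "KING", 5000),
    (30, "TURNER", 1500), (20, "ADAMS", 1100), (30, "JAMES", 950),
    (20, "FORD", 3000), (10, "MILLER", 1300), (10, "JACCKA", 5000),
    (10, "WELAN", 2450), (10, "TEBAGE", 1300),
]


def first_value_candidates(rows):
    """For ORDER BY sal DESC, return every name that can be first in each deptno."""
    partitions = defaultdict(list)
    for deptno, ename, sal in rows:
        partitions[deptno].append((ename, sal))
    candidates = {}
    for deptno, members in partitions.items():
        top_sal = max(sal for _, sal in members)
        # Every row tied at the top salary is a valid "first" row.
        candidates[deptno] = {ename for ename, sal in members if sal == top_sal}
    return candidates


print(first_value_candidates(EMP))
```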

6.7.4.21. LAST_VALUE
Function declaration:

last_value(expr) over(partition by col1[, col2…] [order by col1 [asc|desc][, col2 [asc|desc]…]])

Purpose: It is used to sort the rows in each partition and return the value of expr in the last row of the range that runs from the beginning of the partition to the current row.

Description:

expr: required. An expression of any type.

Returned value: the value of expr in the last row of the window used in the computation.

Example:

Table emp contains the following data:

| empno | ename | job | mgr | hiredate| sal| comm | deptno |


7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

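Group all employees by department, and then obtain the name of the last employee in each group. Because no ORDER BY clause is specified, the window covers the entire partition, and which employee counts as the last one depends on the order in which the rows are read.

SELECT deptno
, ename
, sal
, LAST_VALUE(ename) OVER(PARTITION BY deptno) AS last1
FROM emp;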

The returned result is as follows:

+--------+--------+--------+-------+
| deptno | ename  | sal    | last1 |
+--------+--------+--------+-------+
| 10     | TEBAGE | 1300.0 | WELAN |
| 10     | CLARK  | 2450.0 | WELAN |
| 10     | KING   | 5000.0 | WELAN |
| 10     | MILLER | 1300.0 | WELAN |
| 10     | JACCKA | 5000.0 | WELAN |
| 10     | WELAN  | 2450.0 | WELAN |
| 20     | FORD   | 3000.0 | JONES |
| 20     | SCOTT  | 3000.0 | JONES |
| 20     | SMITH  | 800.0  | JONES |
| 20     | ADAMS  | 1100.0 | JONES |
| 20     | JONES  | 2975.0 | JONES |
| 30     | TURNER | 1500.0 | BLAKE |
| 30     | JAMES  | 950.0  | BLAKE |
| 30     | ALLEN  | 1600.0 | BLAKE |
| 30     | WARD   | 1250.0 | BLAKE |
| 30     | MARTIN | 1250.0 | BLAKE |
| 30     | BLAKE  | 2850.0 | BLAKE |
+--------+--------+--------+-------+

6.7.5. Aggregate functions


6.7.5.1. Overview
An aggregate function aggregates multiple input records into one output record, which means that the input is mapped many-to-one to the output. Aggregate functions can be used together with the GROUP BY clause.

6.7.5.2. COUNT
Command syntax:

bigint count([distinct|all] value)

Purpose: It is used to return the number of records.

Description:

distinct|all: specifies whether duplicate records are removed during counting. The default value is all, which indicates that all records are counted. If it is set to distinct, only records with distinct values are counted.
value: any type. Rows in which value is NULL are not counted. value can be *. count(*) returns the number of all rows.

Returned value: bigint type.

Example:

In the tbla table, the col1 column is of the bigint type.

+------+
| COL1 |
+------+
| 1 |
+------+
| 2 |
+------+
| NULL |
+------+
select count(*) from tbla;
-- 3 is returned.
select count(col1) from tbla;
-- The value is 2.
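The counting rules can be modeled in a few lines of Python, with None standing in for NULL. This is an illustrative sketch with hypothetical helper names, not MaxCompute code; the list with a duplicate value is made-up data that shows the effect of distinct.

```python
def count_star(values):
    """count(*): every row is counted, including rows whose value is NULL."""
    return len(values)


def count_all(values):
    """count(value) or count(all value): rows whose value is NULL are skipped."""
    return sum(1 for v in values if v is not None)


def count_distinct(values):
    """count(distinct value): distinct non-NULL values only."""
    return len({v for v in values if v is not None})


col1 = [1, 2, None]      # tbla.col1 from the example above
dup = [1, 1, 2, None]    # hypothetical column with a duplicate value
print(count_star(col1), count_all(col1))     # 3 2
print(count_all(dup), count_distinct(dup))   # 3 2
```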

Aggregate functions can be used with the GROUP BY clause. For example, the test_src table contains two columns: key (string type) and value (double type).

The test_src table contains the following data:

+-----+-------+
| key | value |
+-----+-------+
| a | 2.0 |
+-----+-------+
| a | 4.0 |
+-----+-------+
| b | 1.0 |
+-----+-------+
| b | 3.0 |
+-----+-------+
select key, count(value) as count from test_src group by key;
-- Run the preceding SQL statement. The output is:
+-----+-------+
| key | count |
+-----+-------+
| a | 2 |
+-----+-------+
| b | 2 |
+-----+-------+

Aggregate functions aggregate the values that share the same key. The aggregate functions that follow are used in the same way as COUNT, so their usage is not described in detail in this document.

6.7.5.3. AVG
Function declaration:

double avg(double value)
decimal avg(decimal value)

Purpose: It is used to calculate the average value.

Description:

value: double or decimal type. If the input is of the string or bigint type, it is implicitly converted to the double type before computation. For all other input types, an error is returned. Rows in which value is NULL are not used in the calculation. The input cannot be of the boolean type.

Returned value: If the input is of the decimal type, a value of the decimal type is returned. For all other valid input types, a value of the double type is returned.

Example:

In the tbla table, the value column is of the bigint type.

+-------+
| value |
+-------+
| 1 |
| 2 |
| NULL |
+-------+
select avg(value) as avg from tbla;
+------+
| avg |
+------+
| 1.5 |
+------+
-- The avg result of this column is as follows: (1 + 2) / 2 = 1.5.

6.7.5.4. MAX
Function declaration:

max(value)

Purpose: It is used to return the maximum value.

Description:

value: any data type. Rows in which value is NULL are not involved in the operation. Values of the boolean type are excluded from the computation.

Returned value: The type is the same as that of value.

Example:

In the tbla table, the col1 column is of the bigint type.

+------+
| col1 |
+------+
| 1 |
+------+
| 2 |
+------+
| NULL |
+------+
select max(col1) from tbla;
-- 2 is returned.


6.7.5.5. MIN
Function declaration:

MIN(value)

Purpose: It is used to return the minimum value.

Description:

value: a column of any data type. Rows in which value is NULL are not involved in the operation. The boolean type is not allowed in this operation.

Returned value: The type is the same as that of value.

Example:

In the tbla table, the value column is of the bigint type.

+------+
| value|
+------+
| 1 |
+------+
| 2 |
+------+
| NULL |
+------+
select min(value) from tbla;
-- 1 is returned.

6.7.5.6. MEDIAN
Function declaration:

double median(double number)
decimal median(decimal number)

Purpose: It is used to calculate the median.

Description:

number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted to the double type before computation. For all other input types, an error is returned. If the input is NULL, the function fails.

Returned value: double or decimal type.

6.7.5.7. STDDEV
Function declaration:

double stddev(double number)
decimal stddev(decimal number)

Purpose: It is used to calculate the population standard deviation.

Description:

number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted to the double type before computation. For all other input types, an error is returned. If the input value is NULL, the function fails.

Returned value: double or decimal type.

6.7.5.8. STDDEV_SAMP
Function declaration:

double stddev_samp(double number)
decimal stddev_samp(decimal number)

Purpose: It is used to calculate the sample standard deviation.

Description:

number: double or decimal type. If the input is of the string or bigint type, it is implicitly converted to the double type before computation. For all other input types, an error is returned. If the input is NULL, the function fails.

Returned value: double or decimal type.
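STDDEV divides the sum of squared deviations by the number of values n, whereas STDDEV_SAMP divides it by n - 1. Python's standard `statistics` module implements the same two definitions as `pstdev` and `stdev`, so the sketch below uses it to compare them on the values 8, 9, 10, 11 (the c1 data used in the VARIANCE examples later in this chapter). It illustrates the formulas only and is not MaxCompute code.

```python
import statistics

c1 = [8, 9, 10, 11]

# Like stddev(c1): sqrt(sum((x - mean) ** 2) / n).
population_sd = statistics.pstdev(c1)
# Like stddev_samp(c1): sqrt(sum((x - mean) ** 2) / (n - 1)).
sample_sd = statistics.stdev(c1)

print(round(population_sd, 6), round(sample_sd, 6))  # 1.118034 1.290994
```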

6.7.5.9. SUM
Function declaration:

sum(value)

Purpose: It is used to calculate the sum.

Description:

value: double, decimal, or bigint type. If the input is of the string type, it is implicitly converted to the double type before computation. Rows in which value is NULL are not used in the calculation. Values of the boolean type are excluded from the calculation.

Returned value: If the input is of the bigint type, a value of the bigint type is returned. If the input is of the double or string type, a value of the double type is returned.

Example:

In the tbla table, the value column is of the bigint type.

+------+
| value|
+------+
| 1 |
+------+
| 2 |
+------+
| NULL |
+------+
select sum(value) from tbla;
-- 3 is returned.
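AVG, MAX, MIN, and SUM all skip rows whose value is NULL, which is why the three-row tbla examples return 1.5, 2, 1, and 3. A minimal Python sketch of that rule, with None standing in for NULL and hypothetical helper names; it assumes at least one non-NULL value, because this section does not state what these functions return when every value is NULL.

```python
def non_null(values):
    """Drop NULL (None) rows, as AVG, MAX, MIN, and SUM do."""
    kept = [v for v in values if v is not None]
    if not kept:
        # All-NULL input is not covered by this section; fail loudly instead of guessing.
        raise ValueError("sketch requires at least one non-NULL value")
    return kept


def sql_avg(values):
    kept = non_null(values)
    return sum(kept) / len(kept)


def sql_max(values):
    return max(non_null(values))


def sql_min(values):
    return min(non_null(values))


def sql_sum(values):
    return sum(non_null(values))


tbla = [1, 2, None]  # the three-row column used in the examples above
print(sql_avg(tbla), sql_max(tbla), sql_min(tbla), sql_sum(tbla))  # 1.5 2 1 3
```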

6.7.5.10. WM_CONCAT
Function declaration:

string wm_concat(string separator, string str)

Purpose: It is used to concatenate the values of str, using the specified separator as the delimiter.

Description:

separator: the delimiter, which must be a constant of the string type. If it is of another type or is not a constant, an error is returned.
str: string type. If the input is of the bigint, double, or datetime type, it is implicitly converted to the string type before computation. For all other input types, an error is returned.

Returned value: string type.

Note: If test_src in the statement select wm_concat(',', name) from test_src; is an empty set, NULL is returned.

6.7.5.11. PERCENTILE
Function declaration:

DOUBLE percentile(BIGINT col, p)
array<double> percentile(BIGINT col, array(p1 [, p2]...))

Purpose: It is used to return the pth percentile of the specified column. p must be between 0 and 1.

Notice: You can calculate true percentiles only for integer values.

Description:

col: BIGINT type.
p: must be between 0 and 1.

Example:

Column c1 in table test contains the following data:

+------------+
| c1 |
+------------+
| 8 |
| 9 |
| 10 |
| 11 |
+------------+

Calculate the pth percentiles of column c1 in table test.

-- Execute the following statement:
select percentile(c1,0),percentile(c1,0.3),percentile(c1,0.5),percentile(c1,1) from test;
-- Returned result:
+------------+------------+------------+------------+
| _c0 | _c1 | _c2 | _c3 |
+------------+------------+------------+------------+
| 8.0 | 8.9 | 9.5 | 11.0 |
+------------+------------+------------+------------+
-- Execute the following statement:
select percentile(c1,array(0,0.3,0.5,1)) from test;
-- Returned result:
+-------------------+
| _c0               |
+-------------------+
| [8, 8.9, 9.5, 11] |
+-------------------+
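The returned values are consistent with linear interpolation at position p × (n - 1) in the sorted column: for p = 0.3 and n = 4 the position is 0.9, so the result is 8 + 0.9 × (9 - 8) = 8.9. The Python sketch below reproduces the example output under that assumption. The interpolation rule is inferred from the example, not quoted from MaxCompute documentation, and `percentile` here is a hypothetical helper.

```python
import math


def percentile(values, p):
    """Linearly interpolated percentile at position p * (n - 1) of the sorted values."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    data = sorted(values)
    pos = p * (len(data) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)  # stay in range when p == 1
    fraction = pos - lower
    return float(data[lower] + fraction * (data[upper] - data[lower]))


c1 = [8, 9, 10, 11]
print([round(percentile(c1, p), 10) for p in (0, 0.3, 0.5, 1)])  # [8.0, 8.9, 9.5, 11.0]
```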

6.7.5.12. Additional aggregate functions


MaxCompute 2.0 provides additional aggregate functions. To use them, you must add the following SET statement before the SQL statements that call these functions:

set odps.sql.type.system.odps2=true;

Note: You must submit and execute the SET statement together with the SQL statements that use the new functions.

The aggregate functions described in the subsequent topics are new in MaxCompute 2.0.

6.7.5.13. COLLECT_LIST
Command syntax:

ARRAY collect_list(col)

Purpose: It is used to aggregate the values in the col column into an array.

Description:

col: a table column of any data type.

Returned value: array type.

6.7.5.14. COLLECT_SET
Command syntax:

ARRAY collect_set(col)

Purpose: It is used to aggregate the values in the col column into an array, with duplicate values removed.

Description:
col: a table column of any data type.

Returned value: array type.

6.7.5.15. VARIANCE/VAR_POP
Function declaration:

DOUBLE variance(col)
DOUBLE var_pop(col)

Purpose: It is used to calculate the population variance of the specified numeric column.

Description:

col: a column of a numeric type. NULL is returned for other types.

Returned value: DOUBLE type.

Example:

Column c1 in table test contains the following data:

+------------+
| c1 |
+------------+
| 8 |
| 9 |
| 10 |
| 11 |
+------------+

Calculate the variance of column c1 in table test.

-- Execute the following statement:
select variance(c1) from test;
-- or
select var_pop(c1) from test;
-- Returned result:
+------------+
| _c0 |
+------------+
| 1.25 |
+------------+


6.7.5.16. VAR_SAMP
Function declaration:

DOUBLE var_samp(col)

Purpose: It is used to calculate the sample variance of the specified numeric column.

Description:

col: a column of a numeric type. NULL is returned for other types.

Returned value: DOUBLE type.

Example:

Column c1 in table test contains the following data:

+------------+
| c1 |
+------------+
| 8 |
| 9 |
| 10 |
| 11 |
+------------+

Calculate the sample variance of column c1 in table test.

-- Execute the following statement:
select var_samp(c1) from test;
-- Returned result:
+--------------------+
| _c0                |
+--------------------+
| 1.6666666666666667 |
+--------------------+
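Both variance results for c1 follow from the same sum of squared deviations from the mean 9.5, which is 2.25 + 0.25 + 0.25 + 2.25 = 5: VARIANCE and VAR_POP divide it by n = 4 (1.25), and VAR_SAMP divides it by n - 1 = 3 (1.6666666666666667). A minimal Python sketch of the two formulas (hypothetical helper names, not MaxCompute code):

```python
def var_pop(xs):
    """variance(col) / var_pop(col): sum of squared deviations divided by n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def var_samp(xs):
    """var_samp(col): the same sum of squared deviations divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


c1 = [8, 9, 10, 11]
print(var_pop(c1), var_samp(c1))  # 1.25 1.6666666666666667
```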

6.7.5.17. COVAR_POP
Function declaration:

DOUBLE covar_pop(col1, col2)

Purpose: It is used to calculate the population covariance of two specified numeric columns.

Description:

col1 and col2: columns of numeric types. NULL is returned for other types.

Example:

Columns c1 and c2 in table test contain the following data:

+------------+------------+
| c1 | c2 |
+------------+------------+
| 3 | 2 |
| 14 | 5 |
| 50 | 14 |
| 26 | 75 |
+------------+------------+

Calculate the population covariance of columns c1 and c2.

-- Execute the following statement:
select covar_pop(c1,c2) from test;
-- Returned result:
+--------------------+
| _c0                |
+--------------------+
| 123.49999999999997 |
+--------------------+

6.7.5.18. COVAR_SAMP
Function declaration:

DOUBLE covar_samp(col1, col2)

Purpose: It is used to calculate the sample covariance of two specified numeric columns.

Description:

col1 and col2: columns of numeric types. NULL is returned for other types.

Example:

Columns c1 and c2 in table test contain the following data:

+------------+------------+
| c1 | c2 |
+------------+------------+
| 3 | 2 |
| 14 | 5 |
| 50 | 14 |
| 26 | 75 |
+------------+------------+

Calculate the sample covariance of columns c1 and c2.

-- Execute the following statement:
select covar_samp(c1,c2) from test;
-- Returned result:
+--------------------+
| _c0                |
+--------------------+
| 164.66666666666663 |
+--------------------+

6.7.6. Other functions


6.7.6.1. ARRAY
Function declaration:

array(value1,value2, ...)

Purpose: It is used to create an array from the input values.

Description:

value: any type. All the values must be of the same type.

Returned value: ARRAY type.

Example:

select array(123,456,789) from dual;
-- Returned result:
[123, 456, 789]

6.7.6.2. ARRAY_CONTAINS
Function declaration:

array_contains(ARRAY<T> a, value v)

Purpose: It is used to check whether array a contains value v.

Description:

a: array type.
v: the value to search for. It must be of the same type as the elements of the array.

Returned value: boolean type.

Example:

select array_contains(array('a','b'), 'a') from dual;
-- True is returned.
select array_contains(array(456,789),123) from dual;
-- False is returned.

6.7.6.3. CAST
Command syntax:

cast(expr as <type>)

Purpose: It is used to convert an expression of one data type to another. For example, cast('1' as bigint) converts the string '1' to the bigint value 1. If the conversion fails, an error is returned.

Note:
cast(double as bigint) converts a value of the double type into a value of the bigint type.
cast(string as bigint) converts a value of the string type into a value of the bigint type. If the string consists of numerals in integer form, it is directly converted into a value of the bigint type. If the string consists of numerals in floating-point or exponential form, it is converted to the double type first and then to the bigint type.
For cast(string as datetime) or cast(datetime as string), the default datetime format is yyyy-mm-dd hh:mi:ss.

6.7.6.4. COALESCE
Command syntax:

coalesce(expr1, expr2, ...)

Purpose: It is used to return the first non-NULL value in the list. If all values in the list are NULL, NULL is returned.

Description:
expr: a value to be tested. All these values must be of the same type or be NULL. Otherwise, an error is returned.

Returned value: The type is the same as that of the input.

Note: At least one parameter must be provided. Otherwise, an error is returned.
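A minimal Python sketch of the COALESCE rule, with None standing in for NULL. The helper is hypothetical and does not model the same-type check that MaxCompute performs on the arguments.

```python
def coalesce(*exprs):
    """Return the first argument that is not None (NULL), or None if all are None."""
    if not exprs:
        # Mirrors the rule that at least one parameter must be provided.
        raise TypeError("coalesce requires at least one parameter")
    for value in exprs:
        if value is not None:
            return value
    return None


print(coalesce(None, 5, 7))   # 5
print(coalesce(None, None))   # None
```

Note that 0 and empty strings are ordinary values here; only NULL is skipped.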

6.7.6.5. DECODE
Function declaration:

decode(expression, search, result[, search, result]...[, default])

Purpose: It is used to implement if-then-else conditional branching.

Description:

expression: the expression to be compared.
search: the search value to be compared with expression.
result: the value returned when search matches expression.
default: optional. If no search value matches expression, default is returned. If default is not specified, NULL is returned.

Returned value: The result that corresponds to the matching search is returned. If there are no matches, default is returned. If default is not specified, NULL is returned.

Note:
At least three parameters must be specified.
All results must be of the same type or be NULL. Inconsistent data types cause an error.
All values of search and expression must be of the same type. Otherwise, an error is returned.
If multiple search values in decode match the expression, the result of the first match is returned.

Example:

select decode(customer_id,
1, 'Taobao',
2, 'Alipay',
3, 'Aliyun',
NULL, 'N/A',
'Others') as result from sale_detail;

The preceding DECODE function implements the following if-then-else logic:

if customer_id = 1 then result := 'Taobao';
elsif customer_id = 2 then result := 'Alipay';
elsif customer_id = 3 then result := 'Aliyun';
...
else
result := 'Others';
end if;

Notice: In MaxCompute SQL, NULL = NULL evaluates to NULL. However, the DECODE function treats two NULL values as equal. In the preceding example, when the value of customer_id is NULL, the DECODE function returns N/A.
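The matching rules above (first match wins, NULL matches NULL, an odd trailing argument is the default) can be sketched in Python, where `None == None` is True just as DECODE treats two NULL values as equal. `decode` and `customer_label` are hypothetical helpers; the type checks from the note are not modeled.

```python
def decode(expression, *args):
    """Model decode(expression, search, result[, search, result]...[, default])."""
    if len(args) < 2:
        raise TypeError("decode requires at least three parameters")
    # An odd number of arguments after expression means the last one is the default.
    if len(args) % 2 == 1:
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None
    for search, result in zip(pairs[0::2], pairs[1::2]):
        # None == None is True in Python, matching DECODE's NULL-equals-NULL rule.
        if search == expression:
            return result  # the first matching search wins
    return default


def customer_label(customer_id):
    return decode(customer_id, 1, "Taobao", 2, "Alipay", 3, "Aliyun", None, "N/A", "Others")


print([customer_label(c) for c in (1, 3, None, 42)])
# ['Taobao', 'Aliyun', 'N/A', 'Others']
```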

6.7.6.6. EXPLODE
Function declaration:

explode (var)

Purpose: It is a UDTF that converts one row of data into multiple rows. If var is of the array type, the array stored in the column is converted into multiple rows, one for each element. If var is of the map type, each key-value pair of the map stored in the column is converted into a row with two columns, one for the key and the other for the value.

Description:

var: array<T> type or map<K,V> type.

Returned value: the transposed rows.

Note:

Limits on the use of UDTFs:

A SELECT statement can contain only one UDTF, and no other columns can appear in that SELECT statement.

Example:

explode(array(null, 'a', 'b', 'c')) col

6.7.6.7. GET_IDCARD_AGE
Function declaration:

get_idcard_age(idcardno)

Purpose: It is used to return the current age based on the ID card number. The current age is the current year minus the birth year on the ID card.

Description:

idcardno: string type. A 15-digit or 18-digit ID card number. During the calculation, the validity of the ID card number is verified based on the province code and the last check code. If the verification fails, NULL is returned.

Returned value: bigint type. If the input is NULL, NULL is returned. If the current year minus the birth year is greater than 100, NULL is returned.

6.7.6.8. GET_IDCARD_BIRTHDAY
Function declaration:

get_idcard_birthday(idcardno)

Purpose: It is used to return the date of birth based on the ID card number.

Description:

idcardno: string type. A 15-digit or 18-digit ID card number. During computation, the validity of the ID card number is verified based on the province code and the last check code. If the verification fails, NULL is returned.

Returned value: datetime type. If the input is NULL, NULL is returned.

6.7.6.9. GET_IDCARD_SEX
Function declaration:

get_idcard_sex(idcardno)

Purpose: It is used to return the gender based on the ID card number. The returned value is M (male) or F (female).

Description:

idcardno: string type. A 15-digit or 18-digit ID card number. During computation, the validity of the ID card number is verified based on the province code and the last check code. If the verification fails, NULL is returned.

Returned value: string type. If the input is NULL, NULL is returned.
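The three ID card functions rely on the standard layout of mainland China resident ID numbers (GB 11643): characters 7 to 14 of an 18-digit number hold the birth date (YYYYMMDD), the 17th digit is odd for males and even for females, and the 18th character is an ISO 7064 MOD 11-2 check character. A 15-digit number stores the birth date as YYMMDD in characters 7 to 12 and the sex in the 15th digit. The Python sketch below applies those rules under stated assumptions: it does not reproduce MaxCompute's province code check, the 15-digit branch assumes a 19xx birth year, it returns a Python date rather than a datetime, and `current_year` is a hypothetical parameter added so the age is reproducible. The sample number 11010519491231002X is a widely cited example of the format.

```python
import datetime

WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"  # indexed by (weighted digit sum % 11)


def parse_idcard(idcardno):
    """Return (birth_date, sex) or None if the number is malformed or fails the check."""
    if idcardno is None:
        return None
    s = idcardno.upper()
    if len(s) == 18 and s[:17].isdigit():
        total = sum(int(d) * w for d, w in zip(s[:17], WEIGHTS))
        if CHECK_CHARS[total % 11] != s[17]:
            return None  # check character mismatch
        year, month, day = int(s[6:10]), int(s[10:12]), int(s[12:14])
        sex_digit = int(s[16])
    elif len(s) == 15 and s.isdigit():
        # Assumption: 15-digit numbers encode a 19xx birth year.
        year, month, day = 1900 + int(s[6:8]), int(s[8:10]), int(s[10:12])
        sex_digit = int(s[14])
    else:
        return None
    try:
        birth = datetime.date(year, month, day)
    except ValueError:
        return None  # not a real calendar date
    return birth, "M" if sex_digit % 2 == 1 else "F"


def get_idcard_birthday(idcardno):
    parsed = parse_idcard(idcardno)
    return parsed[0] if parsed else None


def get_idcard_sex(idcardno):
    parsed = parse_idcard(idcardno)
    return parsed[1] if parsed else None


def get_idcard_age(idcardno, current_year):
    parsed = parse_idcard(idcardno)
    if parsed is None:
        return None
    age = current_year - parsed[0].year
    return None if age > 100 else age  # ages above 100 yield NULL


sample_id = "11010519491231002X"
print(get_idcard_birthday(sample_id), get_idcard_sex(sample_id), get_idcard_age(sample_id, 2022))
# 1949-12-31 F 73
```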

6.7.6.10. GREATEST
Function declaration:

greatest(var1, var2, ...)

Purpose: It is used to return the maximum value among the input values.

Description:

var: bigint, double, datetime, or string type. If all values are NULL, NULL is returned.

Returned value:

The greatest value among the input parameters. If no implicit conversion is needed, the return type is the same as the input parameter type.
NULL is interpreted as the minimum value.
If the input parameters are of different types, values of the double, bigint, and string types are converted into values of the double type for comparison, and values of the string and datetime types are converted into values of the datetime type for comparison. Implicit conversion of other types is not allowed.

6.7.6.11. INDEX
Function declaration:

index(var1[var2])

Purpose: It is used to return the specified element in a given array, or the value of the specified key in a given map.

Description:

var1: array<T> type or map<K,V> type.
var2: If var1 is of the array<T> type, var2 must be of the bigint type and greater than or equal to 0. Array elements are numbered from 0. If var1 is of the map<K,V> type, var2 must be of the K type.

Returned value:

If var1 is of the array<T> type, a value of the T type is returned. If var2 is out of the range of the array elements, NULL is returned.
If var1 is of the map<K,V> type, a value of the V type is returned. If the map does not contain the key var2, NULL is returned.

Example:

If var1 is an array, run the following SQL statement:

select array('a','b','c')[2] from dual;
-- Returned result:
+-----+
| _c0 |
+-----+
| c |
+-----+

If var1 is of the map type, run the following SQL statement:

select str_to_map("test1=1,test2=2")["test1"] from dual;
-- Returned result:
+-----+
| _c0 |
+-----+
| 1 |
+-----+

Notice:
To use this function in an SQL statement, omit the function name index and write var1[var2] directly. Otherwise, a syntax error is returned.
If var1 is NULL, NULL is returned.
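The lookup rules can be modeled in Python, with None standing in for NULL, a list for an array, and a dict for a map. Indexing is 0-based, as the array('a','b','c')[2] example shows. `sql_index` is a hypothetical helper; raising an error for a negative index is an assumption based on the requirement that var2 be greater than or equal to 0.

```python
def sql_index(container, key):
    """Model var1[var2]: 0-based array access or map lookup; NULL when missing."""
    if container is None:
        return None  # NULL var1 -> NULL
    if isinstance(container, dict):
        return container.get(key)  # missing key -> NULL
    if key < 0:
        # Assumption: negative array indexes are rejected.
        raise ValueError("array index must be greater than or equal to 0")
    return container[key] if key < len(container) else None  # out of range -> NULL


print(sql_index(["a", "b", "c"], 2))                     # c
print(sql_index({"test1": "1", "test2": "2"}, "test1"))  # 1
print(sql_index(["a", "b", "c"], 5))                     # None
```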

6.7.6.12. MAX_PT
Function declaration:

max_pt(table_full_name)

Purpose: For a partitioned table, it returns the maximum value among the first-level partitions that contain data files, with partition values compared in alphabetical order.

Description:

table_full_name: string type. It specifies the table name, which must include the project name, for example, prj.src. You must have read permission on the table.
Returned value: the maximum value of the first-level partition.

Example:

Partitioned table tbl has the following partitions that contain data files: pt='20170901' and pt='20170902'. In the following statement, max_pt returns '20170902', so the MaxCompute SQL statement reads data from the '20170902' partition.

select * from tbl where pt=max_pt('myproject.tbl');

Note: If a new partition is added by using ALTER TABLE but contains no data files, this partition is not returned.
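A minimal Python model of the selection rule: keep only first-level partitions that contain data files, then take the alphabetically largest value. The partition-to-file-count mapping and the partition 20170903 (added by ALTER TABLE without data) are hypothetical, and returning None for a table with no such partition is an assumption, since that case is not covered here. Because the comparison is alphabetical, a value such as '9' sorts after '10', so fixed-width values like yyyymmdd behave as expected.

```python
def max_pt(partitions):
    """partitions maps a first-level partition value to its number of data files."""
    with_data = [value for value, file_count in partitions.items() if file_count > 0]
    # Partition values are strings, so max() compares them alphabetically.
    return max(with_data) if with_data else None


tbl = {"20170901": 3, "20170902": 1, "20170903": 0}  # hypothetical file counts
print(max_pt(tbl))  # 20170902
```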


6.7.6.13. ORDINAL
Function declaration:

ordinal(bigint nth, var1, var2, ...)

Purpose: It is used to sort the input values in ascending order and return the value at position nth.

Description:

nth: bigint type. It specifies the position of the value to return. If it is NULL, NULL is returned.
var: bigint, double, datetime, or string type.

Returned value:

The value at position nth. If no implicit conversion is needed, the return type is the same as the input parameter type.
If type conversion is performed, values of the double, bigint, and string types are converted into values of the double type, and values of the string and datetime types are converted into values of the datetime type. Implicit conversion of other types is not allowed.
NULL is treated as the minimum value.

Example:

ordinal(3, 1, 3, 2, 5, 2, 4, 6) = 2

6.7.6.14. LEAST
Function declaration:

least(var1, var2, ...)

Purpose: It is used to return the minimum value among the input values.

Description:

var: bigint, double, datetime, or string type. If all values are NULL, NULL is returned.

Returned value:

The least value among the input parameters. If no implicit conversion is needed, the return type is the same as the input parameter type.
If type conversion is performed, values of the double, bigint, and string types are converted into values of the double type, and values of the string and datetime types are converted into values of the datetime type. Implicit conversion of other types is not allowed.
NULL is interpreted as the minimum value.
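GREATEST, ORDINAL, and LEAST all compare values with NULL treated as the minimum. The Python sketch below takes that statement literally, so a single NULL argument makes LEAST return NULL while GREATEST ignores it unless every argument is NULL. The helpers are hypothetical, nth is 1-based as the ordinal example shows, and the implicit type conversions are not modeled (the arguments must already be mutually comparable).

```python
def _null_first(value):
    # Sort key that places NULL (None) below every non-NULL value.
    return (value is not None, value)


def greatest(*values):
    return max(values, key=_null_first)


def least(*values):
    return min(values, key=_null_first)


def ordinal(nth, *values):
    if nth is None:
        return None
    return sorted(values, key=_null_first)[nth - 1]  # nth is 1-based


print(ordinal(3, 1, 3, 2, 5, 2, 4, 6))  # 2
print(greatest(1, None, 3), least(1, None, 3), greatest(None, None))  # 3 None None
```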

6.7.6.15. SIZE
Function declaration:

size(map<K, V>)
size(array<T>)

Purpose: size(map) returns the number of key-value pairs in the given map, and size(array) returns the number of elements in the given array.

Description:

map: map type.
array: array type.

Returned value: int type.

Example:

select size(map('a',123,'b',456)) from dual;
-- 2 is returned.
select size(map('a',123,'b',456,'c',789)) from dual;
-- 3 is returned.
select size(array('a','b')) from dual;
-- 2 is returned.
select size(array(123,456,789)) from dual;
-- 3 is returned.

6.7.6.16. SPLIT
Function declaration:

split(str, pat)

Purpose: It is used to split a string by using the specified separator.

Description:

str: string type. The string to be split.
pat: string type. The separator, which can be a regular expression.

Returned value: array<string>. The returned array contains the elements extracted from the string based on the specified separator.

Example:

select split("a,b,c",",") from dual;
-- Returned result:
+-----------+
| _c0       |
+-----------+
| [a, b, c] |
+-----------+

6.7.6.17. STR_TO_MAP
Function declaration:

str_to_map(text [, delimiter1 [, delimiter2]])

Purpose: It is used to split text into key-value pairs by using delimiter1, and then to split each pair into a key and a value by using delimiter2.

Description:

text: string type. It indicates the string to be split.
delimiter1: string type. The delimiter between key-value pairs. If it is not specified, the default value ',' is used.
delimiter2: string type. The delimiter between a key and its value. If it is not specified, the default value '=' is used.

Returned value: map<string, string>. The elements are the key-value pairs obtained by splitting text with delimiter1 and delimiter2.

Example:

select str_to_map("test1=1,test2=2") from dual;
-- Returned result:
+----------------------+
| _c0                  |
+----------------------+
| {test1: 1, test2: 2} |
+----------------------+
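The two-level split can be sketched in Python as follows. `str_to_map` here is a hypothetical helper that treats both delimiters as plain strings (not regular expressions), and it maps a pair that lacks delimiter2 to an empty string, which may differ from MaxCompute's handling of malformed input.

```python
def str_to_map(text, delimiter1=",", delimiter2="="):
    """Split text into pairs on delimiter1, then each pair into key and value on delimiter2."""
    result = {}
    for pair in text.split(delimiter1):
        # partition() splits on the first delimiter2 only; a pair without
        # delimiter2 maps to "" in this sketch.
        key, _, value = pair.partition(delimiter2)
        result[key] = value
    return result


print(str_to_map("test1=1,test2=2"))     # {'test1': '1', 'test2': '2'}
print(str_to_map("a:1;b:2", ";", ":"))  # {'a': '1', 'b': '2'}
```

Keys and values both stay strings, matching the map<string, string> return type.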

6.7.6.18. UNIQUE_ID
Function declaration:

STRING UNIQUE_ID()

Purpose: It is used to return a random unique ID, for example, 29347a88-1e57-41ae-bb68-a9edbdd94212_1. This function runs more efficiently than UUID.

6.7.6.19. UUID
Function declaration:

string uuid()

Purpose: It returns a random ID, for example, 29347a88-1e57-41ae-bb68-a9edbdd94212.

6.7.6.20. SAMPLE
Funct ion declarat ion:

boolean sample(x, y, column_name)

Purpose: It samples all values read from the specified column based on the given settings, and filters out the rows that do not meet the sampling condition.

Description:

x, y: bigint type. The data is hashed into x portions, and the yth portion is taken. y can be omitted. If y is omitted, the first portion is taken, and column_name must also be omitted. x and y must be integer constants greater than 0. If they are of another type or are less than or equal to 0, an error is returned. If y is greater than x, an error is returned. If either x or y is NULL, NULL is returned.

column_name: the target column of sampling. column_name can be omitted. If column_name is omitted, random sampling is performed based on the values of x and y. The column can be of any type, and its value can be NULL. No implicit conversion is performed. If column_name is the constant NULL, an error is returned.

Returned value: boolean type.

Note To avoid data skew caused by NULL values, NULL values in column_name are hashed uniformly into the x portions. If column_name is not specified, the output is not necessarily uniform when the data volume is small. Therefore, we recommend that you specify column_name to obtain better output.

Example:

Table tbla contains a column named cola.

select * from tbla where sample(4, 1, cola) = true;
-- The values are hashed into four portions based on cola, and the first portion is used.

select * from tbla where sample(4, 2) = true;
-- The values in each row are randomly hashed into four portions, and the second portion is used.

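The following Java sketch models the portion selection that sample(x, y, column_name) performs: each value is hashed into one of x portions, and a row is kept only if its value lands in portion y. It is illustrative only. Objects.hashCode stands in for MaxCompute's hash function, which this document does not specify; the random placement of NULL values is an assumption based on the note above; and NULL arguments for x and y are not modeled. The class name SampleSketch is hypothetical.

```java
import java.util.Objects;
import java.util.Random;

class SampleSketch {
    // Models sample(x, y, column_name): portions are numbered 1..x, and the row is kept
    // only if its value falls into portion y.
    static boolean sample(long x, long y, Object value, Random random) {
        if (x <= 0 || y <= 0 || y > x) {
            throw new IllegalArgumentException("x and y must be greater than 0, and y must not exceed x");
        }
        long portion;
        if (value == null) {
            // Assumption: NULL values are spread uniformly over the x portions to avoid skew.
            portion = Math.floorMod(random.nextLong(), x) + 1;
        } else {
            // Objects.hashCode is a stand-in for MaxCompute's (undocumented) hash function.
            portion = Math.floorMod((long) Objects.hashCode(value), x) + 1;
        }
        return portion == y;
    }
}
```

A useful property follows from the model: for a fixed x, every non-NULL value belongs to exactly one portion, so running the query once for each y from 1 to x covers every row with a non-NULL value exactly once, provided the hash is deterministic.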
6.7.6.21. CASE WHEN expression


MaxCompute provides the following two CASE WHEN syntax formats:

case value
when (_condition1) then result1
when (_condition2) then result2
...
else resultn
end

case
when (_condition1) then result1
when (_condition2) then result2
when (_condition3) then result3
...
else resultn
end

CASE WHEN returns different values based on the evaluation result of each condition. Example:

select
case
when shop_name is null then 'default_region'
when shop_name like 'hang%' then 'zj_region'
end as region
From sale_detail;


Note
If the results contain only values of the bigint and double types, all results are converted into values of the double type.
If the results contain a value of the string type, all results are converted into values of the string type. If a result cannot be converted (for example, a value of the boolean type), an error is returned.
Conversion between other types is not allowed.

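The result-type rules in the note above can be expressed as a small decision procedure. The following Java sketch encodes them over a hypothetical SqlType enum. Which types besides bigint and double can be converted to string is an assumption here: the note names only boolean as non-convertible, and this sketch conservatively rejects datetime as well.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class CaseResultTypeSketch {
    enum SqlType { BIGINT, DOUBLE, STRING, BOOLEAN, DATETIME }

    // Returns the type that all CASE WHEN results are converted to, or throws if none applies.
    static SqlType unify(List<SqlType> resultTypes) {
        Set<SqlType> types = EnumSet.copyOf(resultTypes);
        if (types.size() == 1) {
            return types.iterator().next();        // all results already have the same type
        }
        if (types.equals(EnumSet.of(SqlType.BIGINT, SqlType.DOUBLE))) {
            return SqlType.DOUBLE;                 // only bigint and double: convert to double
        }
        // Assumption: only bigint and double results can be converted to string here.
        Set<SqlType> convertibleToString = EnumSet.of(SqlType.STRING, SqlType.BIGINT, SqlType.DOUBLE);
        if (types.contains(SqlType.STRING) && convertibleToString.containsAll(types)) {
            return SqlType.STRING;                 // a string result forces conversion to string
        }
        throw new IllegalArgumentException("incompatible CASE WHEN result types: " + types);
    }
}
```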
6.7.6.22. IF
Function declaration:

if(testCondition, valueTrue, valueFalseOrNull)

Purpose: It determines whether testCondition is true. If it is true, valueTrue is returned. Otherwise, valueFalseOrNull is returned.

Description:

testCondition: boolean type. The expression to be evaluated.

valueTrue: the value returned when testCondition is true.

valueFalseOrNull: the value returned when testCondition is not true. It can be set to NULL.

Returned value: The type is the same as that of valueTrue or valueFalseOrNull.

Example:

select if(1=2,100,200) from dual;


-- Returned result:
+------------+
| _c0 |
+------------+
| 200 |
+------------+
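Because if returns valueTrue only when testCondition is true, a NULL condition selects valueFalseOrNull, just as false does. The following Java sketch models this with a nullable Boolean; the class and method names are hypothetical.

```java
class IfSketch {
    // Models if(testCondition, valueTrue, valueFalseOrNull). Only a true condition selects
    // valueTrue; false and NULL (Java null) both select valueFalseOrNull.
    static <T> T sqlIf(Boolean testCondition, T valueTrue, T valueFalseOrNull) {
        return Boolean.TRUE.equals(testCondition) ? valueTrue : valueFalseOrNull;
    }
}
```

IfSketch.sqlIf(1 == 2, 100, 200) returns 200, matching the SQL example above.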

6.7.6.23. Additional functions


MaxCompute 2.0 provides additional functions.

The functions described in the following topics are new in this version.

6.7.6.24. MAP
Function declaration:

map(K key1, V value1, K key2, V value2, ...)

Purpose: It creates a map from the given K-V pairs.

Description:


key/value: All keys must be of the same type, which must be a basic type. All values must be of the same type, which can be any type.

Returned value: map type.

Example:

select map('a',123,'b',456) from dual;


-- Returned result:
{a:123, b:456}
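The following Java sketch models how map pairs up its arguments: each argument at an even position is a key, and the argument after it is its value. It is a plain-Java illustration with a hypothetical class name. It checks key types by Java class, assumes keys are non-NULL, and keeps the last value for a repeated key; the text above does not say how MaxCompute handles repeated keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapSketch {
    // Models map(key1, value1, key2, value2, ...): arguments alternate between keys and values.
    static Map<Object, Object> map(Object... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("map() expects an even number of arguments");
        }
        Map<Object, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            Object key = keysAndValues[i];
            // All keys must share one type; this sketch compares Java classes (keys assumed non-null).
            if (key.getClass() != keysAndValues[0].getClass()) {
                throw new IllegalArgumentException("all keys must have the same type");
            }
            result.put(key, keysAndValues[i + 1]);
        }
        return result;
    }
}
```

MapSketch.map("a", 123, "b", 456) prints as {a=123, b=456}, the Java counterpart of the {a:123, b:456} result shown above.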

6.7.6.25. MAP_KEYS
Function declaration:

map_keys(map<K, V>)

Purpose: It returns all keys in the map parameter as an array.

Description:

map: data of the map type.

Returned value: array type. If the input is NULL, NULL is returned.

Example:

select map_keys(map('a',123,'b',456)) from dual;


-- Returned result:
[a, b]

6.7.6.26. MAP_VALUES
Function declaration:

map_values(map<K, V>)

Purpose: It returns all values in the map parameter as an array.

Description:

map: data of the map type.

Returned value: array type. If the input is NULL, NULL is returned.

Example:

select map_values(map('a',123,'b',456)) from dual;


-- Returned result:
[123, 456]
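map_keys and map_values can be pictured as copying a map's key set and value collection into lists. In the following Java sketch (hypothetical class name), both lists follow the map's iteration order, so position i in each list refers to the same entry. This document does not state whether MaxCompute guarantees such an order, so do not rely on it without checking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapKeysValuesSketch {
    // map_keys: all keys as a list; NULL input returns NULL.
    static <K, V> List<K> mapKeys(Map<K, V> map) {
        return map == null ? null : new ArrayList<>(map.keySet());
    }

    // map_values: all values as a list; NULL input returns NULL.
    static <K, V> List<V> mapValues(Map<K, V> map) {
        return map == null ? null : new ArrayList<>(map.values());
    }
}
```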

6.7.6.27. SORT_ARRAY
Function declaration:


sort_array(ARRAY<T>)

Purpose: It sorts the given array in ascending order.

Description:

ARRAY: array type. The elements of the array can be of any type.

Returned value: array type.

Example:

select sort_array(array('a','c','f','b')),sort_array(array(4,5,7,2,5,8)),sort_array(array('You','Me','He')) from dual;
-- Returned result:
[a, b, c, f] [2, 4, 5, 5, 7, 8] [He, Me, You]

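The example results follow from ordinary ascending order, for strings 'He' < 'Me' < 'You'. The following Java sketch reproduces them with Collections.sort. It is illustrative only: the class name is hypothetical, Java compares strings by UTF-16 code units (which may differ from MaxCompute's ordering for non-ASCII data), and NULL elements are not handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortArraySketch {
    // Returns a new list in ascending natural order; the input list is left unchanged.
    static <T extends Comparable<? super T>> List<T> sortArray(List<T> array) {
        List<T> sorted = new ArrayList<>(array);
        Collections.sort(sorted);
        return sorted;
    }
}
```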
6.7.6.28. POSEXPLODE
Command syntax:

posexplode(ARRAY<T>)

Purpose: It explodes the given array. Each element becomes a row with two columns: the subscript of the element (starting from 0) and the element itself.

Description:

ARRAY: array type. The elements of the array can be of any type.

Returned value: table-generating function.

Example:

select posexplode(array('a','c','f','b')) from dual;


-- Returned result:
+------------+-----+
| pos | val |
+------------+-----+
| 0 | a |
| 1 | c |
| 2 | f |
| 3 | b |
+------------+-----+
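The following Java sketch shows the shape of the rows that posexplode generates: one (pos, val) row per element, with pos counting from 0. The class name and the Object[] row representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PosExplodeSketch {
    // One output row per element: {pos, val}, where pos starts from 0.
    static List<Object[]> posexplode(List<?> array) {
        List<Object[]> rows = new ArrayList<>();
        for (int pos = 0; pos < array.size(); pos++) {
            rows.add(new Object[] {pos, array.get(pos)});
        }
        return rows;
    }
}
```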

6.7.6.29. STRUCT
Function declaration:

struct(value1,value2, ...)

Purpose: It creates a struct from the given list of values.

Description:


value: any type.

Returned value: struct type. The fields of the created struct are named col1, col2, and so on.

Example:

select struct('a',123,'true',56.90) from dual;


-- Returned result:
{col1:a, col2:123, col3:true, col4:56.9}

6.7.6.30. NAMED_STRUCT
Function declaration:

named_struct(string name1, T1 value1, string name2, T2 value2, ...)

Purpose: It creates a struct from the given list of name-value pairs.

Description:

value: any type.

name: field name of the string type.

Returned value: struct type. The field names of the created struct are the values of name1, name2, and so on.

Example:

select named_struct('user_id',10001,'user_name','bob','married','F','weight',63.50) from dual;
-- Returned result:
{user_id:10001, user_name:bob, married:F, weight:63.5}
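A struct can be pictured as an ordered set of named fields. The following Java sketch uses a LinkedHashMap to show how struct assigns the names col1, col2, and so on, while named_struct takes the names from its arguments. It is only a model with hypothetical names; a MaxCompute STRUCT is a typed value, not a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StructSketch {
    // struct(v1, v2, ...): fields are named col1, col2, ... in argument order.
    static Map<String, Object> struct(Object... values) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            fields.put("col" + (i + 1), values[i]);
        }
        return fields;
    }

    // named_struct(name1, v1, name2, v2, ...): field names and values alternate.
    static Map<String, Object> namedStruct(Object... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("named_struct expects name-value pairs");
        }
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            fields.put((String) namesAndValues[i], namesAndValues[i + 1]);
        }
        return fields;
    }
}
```

StructSketch.struct("a", 123, "true", 56.90) prints as {col1=a, col2=123, col3=true, col4=56.9}, which mirrors the STRUCT example above.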

6.7.6.31. INLINE
Function declaration:

inline(ARRAY<STRUCT<f1:T1, f2:T2, ... >>)

Purpose: It expands an array of structs. Each array element becomes a row, and each field of the struct becomes a column in that row.

Description:

STRUCT: the fields of the structs in the array can be of any type.

Returned value: table-generating function.

Example:


select inline(array(named_struct('user_id',10001,'user_name','bob','married','F','weight',63.50))) from dual;
-- Returned result:
+------------+-----------+---------+------------+
| user_id | user_name | married | weight |
+------------+-----------+---------+------------+
| 10001 | bob | F | 63.5 |
+------------+-----------+---------+------------+

6.7.6.32. BETWEEN AND expression


Command syntax:

A [NOT] BETWEEN B AND C

If A, B, or C is NULL, the result is NULL. If A is greater than or equal to B and less than or equal to C, the result is true. Otherwise, the result is false. NOT BETWEEN returns the opposite result, and a NULL result stays NULL.

Example:

The emp table contains the following data:

| empno | ename | job | mgr | hiredate| sal| comm | deptno |


7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Run the following command to query the rows in which sal is greater than or equal to 1,000 and less than or equal to 1,500:

select * from emp where sal BETWEEN 1000 and 1500;


-- Returned result:
+-------+--------+----------+------+---------------------+--------+--------+--------+
| empno | ename  | job      | mgr  | hiredate            | sal    | comm   | deptno |
+-------+--------+----------+------+---------------------+--------+--------+--------+
| 7521  | WARD   | SALESMAN | 7698 | 1981-02-22 00:00:00 | 1250.0 | 500.0  | 30     |
| 7654  | MARTIN | SALESMAN | 7698 | 1981-09-28 00:00:00 | 1250.0 | 1400.0 | 30     |
| 7844  | TURNER | SALESMAN | 7698 | 1981-09-08 00:00:00 | 1500.0 | 0.0    | 30     |
| 7876  | ADAMS  | CLERK    | 7788 | 1987-05-23 00:00:00 | 1100.0 | NULL   | 20     |
| 7934  | MILLER | CLERK    | 7782 | 1982-01-23 00:00:00 | 1300.0 | NULL   | 10     |
| 7956  | TEBAGE | CLERK    | 7748 | 1982-12-30 00:00:00 | 1300.0 | NULL   | 10     |
+-------+--------+----------+------+---------------------+--------+--------+--------+

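The NULL rule for BETWEEN is easy to overlook in filters: a row whose sal is NULL yields NULL for both BETWEEN and NOT BETWEEN, so a WHERE clause, which keeps only rows for which the condition is true, drops it in both forms. The following Java sketch models the three-valued result with a nullable Boolean. The class name is hypothetical, and Double stands in for any comparable type.

```java
class BetweenSketch {
    // A BETWEEN B AND C: any NULL operand (Java null) gives NULL; otherwise B <= A <= C.
    static Boolean between(Double a, Double b, Double c) {
        if (a == null || b == null || c == null) {
            return null;
        }
        return a >= b && a <= c;
    }

    // A NOT BETWEEN B AND C: the opposite result, and NULL stays NULL.
    static Boolean notBetween(Double a, Double b, Double c) {
        Boolean r = between(a, b, c);
        return r == null ? null : !r;
    }
}
```

Applied to the sal values of the emp table above with the bounds 1000 and 1500, the sketch selects 6 rows, the same count as the query result.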
6.7.6.33. NVL
Function declaration:

nvl(T value, T default_value)

Purpose: It returns default_value if value is NULL, and returns value otherwise.

Example:

Table t_data has three columns, c1 (string), c2 (bigint), and c3 (datetime), and contains the following data:

+------+------+---------------------+
| c1   | c2   | c3                  |
+------+------+---------------------+
| NULL | 20   | 2017-11-13 05:00:00 |
| ddd  | 25   | NULL                |
| bbb  | NULL | 2017-11-12 08:00:00 |
| aaa  | 23   | 2017-11-11 00:00:00 |
+------+------+---------------------+

Use the NVL function to output the NULL values in c1 as 00000, the NULL values in c2 as 0, and the NULL values in c3 as "-".

-- Execute the following statement:


SELECT nvl(c1,'00000'),nvl(c2,0),nvl(c3,'-') from t_data;
-- Returned result:
+-------+-----+---------------------+
| _c0   | _c1 | _c2                 |
+-------+-----+---------------------+
| bbb   | 0   | 2017-11-12 08:00:00 |
| ddd   | 25  | -                   |
| 00000 | 20  | 2017-11-13 05:00:00 |
| aaa   | 23  | 2017-11-11 00:00:00 |
+-------+-----+---------------------+

6.7.6.34. TABLE_EXISTS


Function declaration:

boolean table_exists(string table_name)

Description: This function checks whether a specific table exists.

Parameters:

table_name: the table name of the STRING type. The value can include the project name, such as my_proj.my_table. If no project name is specified, the current project is used.

Return value: A value of the BOOLEAN type is returned. If the specified table exists, True is returned. Otherwise, False is returned.

Example:

-- Used in a SELECT statement.


SELECT IF(table_exists('abd'), col1, col2) FROM src;
-- Used in an IF-ELSE branch statement.
IF (table_exists('abd'))
-- statements
ELSE
-- statements

6.7.6.35. PARTITION_EXISTS
Function declaration:

boolean partition_exists(string table_name, string... partitions)

Description: This function checks whether a specific partition exists.

Parameters:

table_name: the table name of the STRING type. The value can include the project name, such as my_proj.my_table. If no project name is specified, the current project is used.
partitions: the partition values of the STRING type. Specify the values of the partition key columns in the order in which the partition key columns are defined. The number of values must be the same as the number of partition key columns.

Return value: A value of the BOOLEAN type is returned. If the specified partition exists, True is returned. Otherwise, False is returned.

Example:

CREATE TABLE foo (id BIGINT) PARTITIONED BY (ds STRING, hr STRING);


-- Create a partitioned table named foo.
ALTER TABLE foo ADD PARTITION (ds='20190101', hr='1');
-- Add a partition to foo.
SELECT partition_exists('foo', '20190101', '1');
-- Check whether the partition ds='20190101', hr='1' exists.

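partition_exists matches its partition values by position, not by name. The following Java sketch models a partitioned table such as foo with two partition key columns (ds, hr) and shows that the order of the values matters. The class and method names are hypothetical, and rejecting a call with the wrong number of values is an assumption: the text above only says that the counts must match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PartitionExistsSketch {
    private final int partitionKeyCount;            // e.g. 2 for PARTITIONED BY (ds STRING, hr STRING)
    private final Set<List<String>> partitions = new HashSet<>();

    PartitionExistsSketch(int partitionKeyCount) {
        this.partitionKeyCount = partitionKeyCount;
    }

    // Assumption: a wrong number of values is rejected rather than returning false.
    private List<String> spec(String... values) {
        if (values.length != partitionKeyCount) {
            throw new IllegalArgumentException(
                "expected " + partitionKeyCount + " partition values, got " + values.length);
        }
        return new ArrayList<>(Arrays.asList(values));
    }

    // Counterpart of ALTER TABLE ... ADD PARTITION: values in partition-key order.
    void addPartition(String... values) {
        partitions.add(spec(values));
    }

    // Counterpart of partition_exists(table_name, v1, v2, ...): values are matched by position.
    boolean partitionExists(String... values) {
        return partitions.contains(spec(values));
    }
}
```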
6.8. UDFs

6.8.1. Overview
UDF is short for user-defined function. MaxCompute provides a variety of built-in functions, and you can also create UDFs to meet specific computing requirements. UDFs are used in the same way as common built-in functions. This topic briefly describes how to use SQL UDFs. For more information about SQL UDFs, see the official documentation on UDFs.

The following table lists the extended UDFs in MaxCompute.

UDF category

UDF: User-defined scalar functions are commonly referred to as UDFs. There is a one-to-one mapping between input and output: each time a UDF reads a row of data, it writes one output value.

UDTF: User-defined table-valued functions are commonly referred to as UDTFs. Each time a UDTF is called, it can output multiple rows of data. UDTFs are the only category that can return multiple fields. A UDF returns only one value each time.

UDAF: User-defined aggregation functions are commonly referred to as UDAFs. A UDAF aggregates multiple input records into one output record, so there is a many-to-one mapping between input and output. A UDAF can be used together with the GROUP BY clause in SQL. For more information about the syntax, see Aggregate functions.

Note In a broad sense, UDFs refer to all user-defined functions: UDFs, UDAFs, and UDTFs. In a narrow sense, UDFs refer only to user-defined scalar functions. This document uses the term in both senses, so determine the exact meaning from the context.

6.8.2. Types of parameters and returned values


UDFs support the following MaxCompute SQL data types:

Basic data types: BIGINT, DOUBLE, BOOLEAN, DATETIME, DECIMAL, STRING, TINYINT, SMALLINT, INT, FLOAT, VARCHAR, BINARY, and TIMESTAMP.
Complex data types: ARRAY, MAP, and STRUCT.

Note In UDFs, you can define the writable attribute of parameters.

Some basic data types (TINYINT, SMALLINT, INT, FLOAT, VARCHAR, BINARY, and TIMESTAMP) are used in Java UDFs as follows:

UDAFs and UDTFs use the @Resolve annotation to obtain signatures. Example: @Resolve("smallint->varchar(10)").

UDFs obtain signatures by using reflection to analyze the evaluate() method. In this case, MaxCompute built-in types map one-to-one to Java types.
To use complex data types (ARRAY, MAP, and STRUCT) in Java UDFs, follow these rules:


UDTFs use the @Resolve annotation to specify signatures. Example: @Resolve("array<string>,struct<a1:bigint,b1:string>,string->map<string,bigint>,struct<b1:bigint>").

UDFs use the signature of the evaluate() method to map the input and output types. For more information, see the mappings between MaxCompute types and Java types. In the preceding example, ARRAY corresponds to java.util.List, MAP corresponds to java.util.Map, and STRUCT corresponds to com.aliyun.odps.data.Struct.

Notice
You can use type,* to accept any number of parameters. Example: @Resolve("string,*->array<string>"). Note that you must specify a subtype after array.
The field names and field types of com.aliyun.odps.data.Struct cannot be obtained through reflection. Therefore, if you want to use STRUCT in a UDF, you must add the @Resolve annotation to the UDF class. This annotation affects only the overloads whose parameters or returned values contain com.aliyun.odps.data.Struct.
A class supports only one @Resolve annotation. Therefore, a UDF can have only one overload whose parameters or returned value contain STRUCT.

The following table lists the mappings between MaxCompute and Java data types.

Data type mapping

MaxCompute type   Java type
TINYINT           java.lang.Byte
SMALLINT          java.lang.Short
INT               java.lang.Integer
BIGINT            java.lang.Long
FLOAT             java.lang.Float
DOUBLE            java.lang.Double
DECIMAL           java.math.BigDecimal
BOOLEAN           java.lang.Boolean
STRING            java.lang.String
VARCHAR           com.aliyun.odps.data.Varchar
BINARY            com.aliyun.odps.data.Binary
DATETIME          java.util.Date
TIMESTAMP         java.sql.Timestamp
ARRAY             java.util.List
MAP               java.util.Map
STRUCT            com.aliyun.odps.data.Struct

Note
The Java types of parameters and returned values must be object types, whose names start with an uppercase letter.
A NULL value in SQL is represented by a null reference in Java. Java primitive types are not allowed because they cannot represent a NULL value in SQL.
The ARRAY type in MaxCompute corresponds to java.util.List, not to a Java array.

The following table compares the API features of the two languages.

API feature comparison

Language   UDF   UDAF   UDTF   Read resource file   Read resource table   DATETIME type
Python     Yes   Yes    Yes    Yes                  Yes                   Yes
Java       Yes   Yes    Yes    Yes                  Yes                   Yes

6.8.3. UDFs
A UDF must inherit the com.aliyun.odps.udf.UDF class and implement the evaluate method. The evaluate method must be a non-static public method. The parameter types and return type of evaluate are used as the UDF signature in SQL. This means that you can implement multiple evaluate methods in one UDF. When the UDF is called, the framework selects the evaluate method whose signature matches the argument types of the call.

Example:

package org.alidata.odps.udf.examples;
import com.aliyun.odps.udf.UDF;

public final class Lower extends UDF {
    // Called once per row; a SQL NULL arrives as a Java null reference.
    public String evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }
}

Note You can implement void setup(ExecutionContext ctx) and void close() to provide UDF initialization and termination code, respectively.

UDFs are used in the same way as built-in functions in MaxCompute SQL. For more information, see Built-in functions.

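Because a Java UDF's SQL signature comes from its evaluate methods, it can help to see how a signature could be derived by reflection. The following sketch is a hypothetical illustration, not MaxCompute's resolver: it lists the public, non-static evaluate methods of a class and maps their Java types back to SQL type names by using a subset of the type mapping table in the preceding topic, rejecting primitive types as the note there requires. MyConcat is a made-up class that does not extend com.aliyun.odps.udf.UDF, so the sketch compiles without the MaxCompute SDK.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SignatureSketch {
    // Subset of the MaxCompute-to-Java type mapping table, read in reverse.
    static final Map<Class<?>, String> SQL_TYPES = new HashMap<>();
    static {
        SQL_TYPES.put(Long.class, "bigint");
        SQL_TYPES.put(Double.class, "double");
        SQL_TYPES.put(Boolean.class, "boolean");
        SQL_TYPES.put(String.class, "string");
        SQL_TYPES.put(Date.class, "datetime");
    }

    static String sqlType(Class<?> c) {
        if (c.isPrimitive()) {
            // Primitive types cannot represent SQL NULL, so they are rejected.
            throw new IllegalArgumentException("primitive type not allowed: " + c);
        }
        String t = SQL_TYPES.get(c);
        if (t == null) {
            throw new IllegalArgumentException("type not in this sketch's mapping: " + c.getName());
        }
        return t;
    }

    // Returns one "arg,arg->ret" string per public, non-static evaluate method, sorted.
    static List<String> signatures(Class<?> udfClass) {
        List<String> result = new ArrayList<>();
        for (Method m : udfClass.getDeclaredMethods()) {
            int mod = m.getModifiers();
            if (!m.getName().equals("evaluate") || !Modifier.isPublic(mod) || Modifier.isStatic(mod)) {
                continue;
            }
            List<String> args = new ArrayList<>();
            for (Class<?> p : m.getParameterTypes()) {
                args.add(sqlType(p));
            }
            result.add(String.join(",", args) + "->" + sqlType(m.getReturnType()));
        }
        Collections.sort(result);
        return result;
    }
}

// Hypothetical UDF-style class with two overloads. It does not extend com.aliyun.odps.udf.UDF
// so that this sketch compiles without the MaxCompute SDK.
class MyConcat {
    public String evaluate(String a, String b) {
        return (a == null || b == null) ? null : a + b;
    }

    public Long evaluate(Long a, Long b) {
        return (a == null || b == null) ? null : a + b;
    }
}
```

SignatureSketch.signatures(MyConcat.class) returns [bigint,bigint->bigint, string,string->string], one entry per overload.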

6.8.4. UDAFs
To implement a Java UDAF, you must inherit the com.aliyun.odps.udf.Aggregator class and implement the following APIs:

public abstract class Aggregator implements ContextFunction {

    @Override
    public void setup(ExecutionContext ctx) throws UDFException {
    }

    @Override
    public void close() throws UDFException {
    }

    /**
     * Creates an aggregation buffer.
     * @return Writable - the aggregation buffer
     */
    abstract public Writable newBuffer();

    /**
     * Accumulates one input row into the buffer.
     * @param buffer - the aggregation buffer
     * @param args - the arguments passed when the UDAF is called in SQL
     * @throws UDFException
     */
    abstract public void iterate(Writable buffer, Writable[] args) throws UDFException;

    /**
     * Generates the final result.
     * @param buffer - the aggregation buffer
     * @return the final result of the UDAF
     * @throws UDFException
     */
    abstract public Writable terminate(Writable buffer) throws UDFException;

    /**
     * Merges a partial aggregation result into the buffer.
     * @param buffer - the aggregation buffer
     * @param partial - a partial result produced by another iterate/merge step
     * @throws UDFException
     */
    abstract public void merge(Writable buffer, Writable partial) throws UDFException;
}


The most important APIs are iterate, merge, and terminate. The main logic of a UDAF relies on the implementation of these three APIs. In addition, you must implement a custom Writable buffer. The following figure briefly illustrates the implementation logic and computational flow of the avg (average value) UDAF in MaxCompute.

In the preceding figure, the input data is sliced by a certain size (for a description of slicing, see MapReduce). The size of each slice is chosen so that a worker can process it in an appropriate period of time. You need to manually configure the size of the slices.
The UDAF calculation process is divided into two phases:

Phase 1: Each worker counts the number of data rows and the sum of the data in its slice. The count and sum can be regarded as an intermediate result.
Phase 2: A worker summarizes the intermediate results produced for each slice in the previous phase. In the final output, r.sum / r.count is the average of all input data.

The following example shows how to calculate an average by using a UDAF:


import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import com.aliyun.odps.io.DoubleWritable;
import com.aliyun.odps.io.Writable;
import com.aliyun.odps.udf.Aggregator;
import com.aliyun.odps.udf.UDFException;
import com.aliyun.odps.udf.annotation.Resolve;

@Resolve({"double->double"})
public class AggrAvg extends Aggregator {

    // Intermediate result: running sum and row count.
    private static class AvgBuffer implements Writable {
        private double sum = 0;
        private long count = 0;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeDouble(sum);
            out.writeLong(count);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            sum = in.readDouble();
            count = in.readLong();
        }
    }

    private DoubleWritable ret = new DoubleWritable();

    @Override
    public Writable newBuffer() {
        return new AvgBuffer();
    }

    // Phase 1: accumulate each non-NULL input value into the buffer.
    @Override
    public void iterate(Writable buffer, Writable[] args) throws UDFException {
        DoubleWritable arg = (DoubleWritable) args[0];
        AvgBuffer buf = (AvgBuffer) buffer;
        if (arg != null) {
            buf.count += 1;
            buf.sum += arg.get();
        }
    }

    // Returns sum / count, or 0 if no non-NULL value was seen.
    @Override
    public Writable terminate(Writable buffer) throws UDFException {
        AvgBuffer buf = (AvgBuffer) buffer;
        if (buf.count == 0) {
            ret.set(0);
        } else {
            ret.set(buf.sum / buf.count);
        }
        return ret;
    }

    // Phase 2: combine a partial result from another worker into the buffer.
    @Override
    public void merge(Writable buffer, Writable partial) throws UDFException {
        AvgBuffer buf = (AvgBuffer) buffer;
        AvgBuffer p = (AvgBuffer) partial;
        buf.sum += p.sum;
        buf.count += p.count;
    }
}

Notice

The SQL syntax used by UDAFs is the same as that used by common built-in aggregate functions. For more information, see Aggregate functions.

UDAFs are run in the same way as UDFs. For more information, see Run UDFs.

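The two-phase flow described above can be simulated without the MaxCompute SDK. The following Java sketch (hypothetical class name) mirrors AggrAvg: iterateSlice plays the role of iterate over one slice, and average merges the per-slice partial results and then computes sum / count as terminate does. It runs in a single process, so it shows only the data flow, not how slices are distributed to workers.

```java
import java.util.ArrayList;
import java.util.List;

class AvgFlowSketch {
    // Intermediate result for one slice, mirroring AvgBuffer (sum and count).
    static final class Partial {
        double sum;
        long count;
    }

    // Phase 1 (iterate): one worker scans its slice; NULL values are skipped.
    static Partial iterateSlice(List<Double> slice) {
        Partial p = new Partial();
        for (Double v : slice) {
            if (v != null) {
                p.sum += v;
                p.count += 1;
            }
        }
        return p;
    }

    static double average(List<List<Double>> slices) {
        // Phase 1: one partial result per slice.
        List<Partial> partials = new ArrayList<>();
        for (List<Double> slice : slices) {
            partials.add(iterateSlice(slice));
        }
        // Phase 2 (merge): combine the partial results.
        Partial total = new Partial();
        for (Partial p : partials) {
            total.sum += p.sum;
            total.count += p.count;
        }
        // terminate: sum / count, or 0 when there was no non-NULL input (as AggrAvg does).
        return total.count == 0 ? 0.0 : total.sum / total.count;
    }
}
```

For example, the slices [1.0, 2.0, NULL], [3.0, 4.0], and [5.0] produce the partials (3.0, 2), (7.0, 2), and (5.0, 1), which merge to (15.0, 5) and an average of 3.0.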
6.8.5. UDTFs
6.8.5.1. Overview
Java UDTFs must inherit the com.aliyun.odps.udf.UDTF class. This class requires the implementation of four APIs. The following table lists the definitions of these APIs.

API definitions

public void setup(ExecutionContext ctx) throws UDFException
The initialization method, which runs user-defined initialization before the UDTF processes input data. setup is called once in each worker, before any input is processed.

public void process(Object[] args) throws UDFException
Called by the framework once for each SQL record. The arguments of process are the UDTF input parameters specified in the SQL statement, passed in as Object[]. Results are output by the forward method: call forward within process to determine the output data.

public void close() throws UDFException
The termination method of the UDTF. The framework calls it only once, after the last record is processed.

public void forward(Object ...o) throws UDFException
Call forward to output data. Each call outputs one record, whose values correspond to the columns specified in the AS clause of the UDTF call in the SQL statement.

UDTF example:

package org.alidata.odps.udtf.examples;

import com.aliyun.odps.udf.UDTF;
import com.aliyun.odps.udf.UDTFCollector;
import com.aliyun.odps.udf.annotation.Resolve;
import com.aliyun.odps.udf.UDFException;

// TODO define input and output types, e.g., "string,string->string,bigint".
@Resolve({"string,bigint->string,bigint"})
public class MyUDTF extends UDTF {
    @Override
    public void process(Object[] args) throws UDFException {
        String a = (String) args[0];
        Long b = (Long) args[1];
        // Emit one output row (word, b) for each whitespace-separated word in a.
        for (String t : a.split("\\s+")) {
            forward(t, b);
        }
    }
}


The preceding example shows how to create a UDTF in MaxCompute. If this UDTF is named user_udtf, you can run the following SQL statement to call it:

select user_udtf(col0, col1) as (c0, c1) from my_table;

The values of col0 and col1 in my_table are as follows:

+------+------+
| col0 | col1 |
+------+------+
| A B | 1 |
| C D | 2 |
+------+------+

The result of the SELECT statement is as follows:

+----+----+
| c0 | c1 |
+----+----+
| A | 1 |
| B | 1 |
| C | 2 |
| D | 2 |
+----+----+
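To see how process and forward interact, the following Java sketch replays MyUDTF's logic with a stand-in forward method that simply collects rows in a list. In MaxCompute the framework receives the forwarded rows; the class SplitUdtfSketch and its output field are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SplitUdtfSketch {
    // Rows passed to forward(); a stand-in for the MaxCompute framework's collector.
    final List<Object[]> output = new ArrayList<>();

    void forward(Object... row) {
        output.add(row);
    }

    // Same logic as MyUDTF.process: one (word, b) row per whitespace-separated word in a.
    void process(Object[] args) {
        String a = (String) args[0];
        Long b = (Long) args[1];
        for (String t : a.split("\\s+")) {
            forward(t, b);
        }
    }
}
```

Feeding it the two rows of my_table, ("A B", 1) and ("C D", 2), yields the four rows (A, 1), (B, 1), (C, 2), and (D, 2) shown in the result above.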

6.8.5.2. UDTF description


Common uses of UDTFs in SQL:

select user_udtf(col0, col1) as (c0, c1) from my_table;


select user_udtf(col0, col1) as (c0, c1) from (select * from my_table distribute by col1 sort by col1) t;

Notice

The following limits apply to the use of UDTFs.

No other expressions are allowed in a SELECT clause that calls a UDTF. For example, the following statement is not allowed:

select col0, user_udtf(col0, col1) as (c0, c1) from mytable;

UDTFs cannot be nested. For example, the following statement is not allowed:

select user_udtf(mp_udtf(col0,col1)) as (c0,c1) from mytable;

UDTF examples

You can use a UDTF to read MaxCompute resources. The following steps show an example of reading MaxCompute resources by using a UDTF:

1. Write the UDTF program. After compilation, export it as a JAR package (udtfexample1.jar).


package com.aliyun.odps.examples.udf;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Iterator;

import com.aliyun.odps.udf.ExecutionContext;
import com.aliyun.odps.udf.UDFException;
import com.aliyun.odps.udf.UDTF;
import com.aliyun.odps.udf.annotation.Resolve;

/**
 * project: example_project
 * table: wc_in2
 * partitions: p2=1,p1=2
 * columns: colc,colb
 */
@Resolve({ "string,string->string,bigint,string" })
public class UDTFResource extends UDTF {
    ExecutionContext ctx;
    long fileResourceLineCount;
    long tableResource1RecordCount;
    long tableResource2RecordCount;

    // Count the lines of the file resource and the records of both table resources once per worker.
    @Override
    public void setup(ExecutionContext ctx) throws UDFException {
        this.ctx = ctx;
        try {
            InputStream in = ctx.readResourceFileAsStream("file_resource.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line;
            fileResourceLineCount = 0;
            while ((line = br.readLine()) != null) {
                fileResourceLineCount++;
            }
            br.close();

            Iterator<Object[]> iterator = ctx.readResourceTable("table_resource1").iterator();
            tableResource1RecordCount = 0;
            while (iterator.hasNext()) {
                tableResource1RecordCount++;
                iterator.next();
            }

            iterator = ctx.readResourceTable("table_resource2").iterator();
            tableResource2RecordCount = 0;
            while (iterator.hasNext()) {
                tableResource2RecordCount++;
                iterator.next();
            }
        } catch (IOException e) {
            throw new UDFException(e);
        }
    }

    // Output the first argument, the length of the second argument, and the resource counts.
    @Override
    public void process(Object[] args) throws UDFException {
        String a = (String) args[0];
        long b = args[1] == null ? 0 : ((String) args[1]).length();
        forward(a, b, "fileResourceLineCount=" + fileResourceLineCount
                + "|tableResource1RecordCount=" + tableResource1RecordCount
                + "|tableResource2RecordCount=" + tableResource2RecordCount);
    }
}

2. Add resources to MaxCompute.


Add file file_resource.txt;


Add jar udtfexample1.jar;
Add table table_resource1 as table_resource1;
Add table table_resource2 as table_resource2;

3. Create the UDTF function (mp_udtf) in MaxCompute.

create function mp_udtf as 'com.aliyun.odps.examples.udf.UDTFResource' using 'udtfexample1.jar, file_resource.txt, table_resource1, table_resource2';

4. Create the resource tables table_resource1 and table_resource2 in MaxCompute, and insert the corresponding data.
5. Run this UDTF.

select mp_udtf("10","20") as (a, b, fileResourceLineCount) from table_resource1;


-- Command output:
+-------+------------+-------+
| a | b | fileResourceLineCount |
+-------+------------+-------+
| 10 | 2 | fileResourceLineCount=3|tableResource1RecordCount=0|tableResource2RecordCount=0 |
| 10 | 2 | fileResourceLineCount=3|tableResource1RecordCount=0|tableResource2RecordCount=0 |
+-------+------------+-------+

Note MapReduce programs can obtain resources in the same way. For more information, see MapReduce examples.

UDF examples: complex data types

The following code defines a UDF with three overloads. The first overload takes an array as a parameter, the second takes a map, and the third takes a struct. Because the third overload uses a struct as a parameter, the UDF class must be annotated with @Resolve to specify the concrete struct type.

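import java.util.List;
import java.util.Map;

import com.aliyun.odps.data.Struct;
import com.aliyun.odps.udf.UDF;
import com.aliyun.odps.udf.annotation.Resolve;

// @Resolve declares the concrete STRUCT type used by the struct overload.
@Resolve("struct<a:bigint>,string->string")
public class UdfArray extends UDF {
    // ARRAY<STRING>: returns the element at the zero-based index len.
    public String evaluate(List<String> vals, Long len) {
        return vals.get(len.intValue());
    }
    // MAP<STRING,STRING>: returns the value mapped to key.
    public String evaluate(Map<String,String> map, String key) {
        return map.get(key);
    }
    // STRUCT<a:BIGINT>: concatenates field a with key.
    public String evaluate(Struct struct, String key) {
        return struct.getFieldValue("a") + key;
    }
}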

You can pass complex data types to the UDF in SQL:


create function my_index as 'UdfArray' using 'myjar.jar';


select id, my_index(array('red', 'yellow', 'green'), colorOrdinal) as color_name from colors;

6.8.6. Python UDFs


6.8.6.1. Restricted environment
MaxCompute UDFs use Python 2.7 and execute user code in a sandbox. The following operations are restricted in the sandbox:

Reading and writing local files.
Starting subprocesses.
Starting threads.
Socket communication.
Calling other systems.
Because C extension modules are disabled, all user-uploaded code must be implemented in pure Python.

In addition, not all modules in the Python standard library are available. Modules that involve the preceding features are disabled. The availability of standard library modules is as follows:

1. All modules implemented in pure Python are available.

2. The following C extension modules are available:
array
audioop
binascii
_bisect
cmath
_codecs_cn
_codecs_hk
_codecs_iso2022
_codecs_jp
_codecs_kr
_codecs_tw
_collections
cStringIO
datetime
_functools
future_builtins
_hashlib
_heapq
itertools
_json
_locale
_lsprof
math
_md5
_multibytecodec
operator
_random
_sha256
_sha512
_sha
_struct
strop
time
unicodedata
_weakref
cPickle
3. Some modules have limited functionality. For example, the sandbox limits the amount of data that user code can write to standard output and standard error: sys.stdout and sys.stderr can write up to 20 KB, and any characters beyond that are ignored.

6.8.6.2. Third-party libraries


Common third-party libraries, such as NumPy, are installed in the runtime environment to supplement the standard library.

Warning The use of third-party libraries is also subject to restrictions. For example, local and remote I/O operations are prohibited, so the related APIs in third-party libraries are disabled.

6.8.6.3. Types of parameters and returned values


You can use the following decorator to specify the types of parameters and return values:

@odps.udf.annotate(signature)

Python UDFs support the following MaxCompute SQL data types: BIGINT, STRING, DOUBLE, BOOLEAN, and DATETIME. Before a SQL statement is executed, the parameter types and return value types of all functions must be known. Because Python is a dynamically typed language, you must add a decorator to the UDF class to specify the function signature.

The function signature is specified as a string with the following syntax:


arg_type_list '->' type_list


arg_type_list: type_list | '*' | ''
type_list: [type_list ','] type
type: 'bigint' | 'string' | 'double' | 'boolean' | 'datetime'

Note
The part to the left of the arrow specifies the parameter types. The part to the right of the arrow specifies the return value types.
The return value of a UDTF can contain multiple columns. The return value of a UDF or UDAF can contain only one column.
* represents a variable argument. If a variable argument is specified, the UDF, UDTF, or UDAF can match parameters of any type.

Examples of valid signatures:

'bigint,double->string'
-- Two parameters of the BIGINT and DOUBLE types; the return value is of the STRING type.
'bigint,boolean->string,datetime'
-- A UDTF with two parameters of the BIGINT and BOOLEAN types that returns two columns of the STRING and DATETIME types.
'*->string'
-- Variable argument: the input parameters can be of any type, and the return value is of the STRING type.
'->double'
-- No parameters; the return value is of the DOUBLE type.
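To make the grammar concrete, here is a minimal sketch of a hypothetical parse_signature helper that checks a signature string against the grammar above. It is intended for local checking only; MaxCompute's own parser is not described in this guide and may differ in details such as whitespace or case handling (the sketch strips whitespace and expects lowercase type names, as in the grammar).

```python
# Type names allowed by the signature grammar.
VALID_TYPES = ("bigint", "string", "double", "boolean", "datetime")


def _parse_type_list(text, allow_empty):
    """Parse a comma-separated type_list; an empty list is allowed only for arguments."""
    if text == "":
        if allow_empty:
            return []
        raise ValueError("return type list must not be empty")
    types = [t.strip() for t in text.split(",")]
    for t in types:
        if t not in VALID_TYPES:
            raise ValueError("unknown type: %r" % t)
    return types


def parse_signature(signature):
    """Return (arg_types, return_types); arg_types is None for the '*' form."""
    if signature.count("->") != 1:
        raise ValueError("signature must contain exactly one '->': %r" % signature)
    left, right = [part.strip() for part in signature.split("->")]
    arg_types = None if left == "*" else _parse_type_list(left, allow_empty=True)
    return_types = _parse_type_list(right, allow_empty=False)
    return arg_types, return_types
```

For example, parse_signature('->double') returns an empty argument list and ['double'], while 'int->string' raises ValueError because int is not one of the grammar's type names.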

If an invalid signature is found during query parsing, an error is returned and the query is not executed. During execution, UDF parameters are passed to user code with the types specified by the function signature, and the value returned by user code must also be of the type specified by the signature. Otherwise, an error is returned. The following table shows the mappings between MaxCompute SQL types and Python types.

Mapping

MaxCompute SQL type    Python type
BIGINT                 int
STRING                 str
DOUBLE                 float
BOOLEAN                bool
DATETIME               int


Note
A value of the DATETIME type is passed to user code as an int: the number of milliseconds that have elapsed since the epoch. You can use the datetime module in the Python standard library to process DATETIME values.
NULL corresponds to None in Python.
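As a sketch of that conversion, the following assumes the epoch is 1970-01-01 00:00:00 UTC and works with naive datetime objects in UTC; whether your DATETIME values should be interpreted in UTC or in a local time zone is an assumption you should verify. The helper names are hypothetical. Because the mapping table maps DATETIME to int, a UDF that returns DATETIME would return the millisecond value.

```python
import datetime

# Assumption: DATETIME milliseconds are counted from 1970-01-01 00:00:00 UTC.
EPOCH = datetime.datetime(1970, 1, 1)


def millis_to_datetime(millis):
    """Convert a DATETIME argument (int milliseconds) to a naive UTC datetime."""
    if millis is None:  # SQL NULL arrives as None
        return None
    return EPOCH + datetime.timedelta(milliseconds=millis)


def datetime_to_millis(dt):
    """Convert a naive UTC datetime back to int milliseconds for a DATETIME return value."""
    if dt is None:
        return None
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000
```

Using timedelta arithmetic instead of time-zone-dependent timestamp functions keeps the result independent of the machine's local time zone.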

In addition, odps.udf.int(value[, silent=True]) has been modified to add the silent parameter. If silent is True and the value cannot be converted to an int, None is returned instead of an error.

6.8.6.4. UDFs
Implementing a Python UDF is as simple as defining a new-style class and implementing the evaluate method.

Example:

from odps.udf import annotate


@annotate("bigint,bigint->bigint")
class myplus(object):
    def evaluate(self, arg0, arg1):
        # Return NULL (None) if either argument is NULL.
        if None in (arg0, arg1):
            return None
        return arg0 + arg1

Notice A Python UDF must have its signature specified through annotate.
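Because odps.udf is provided by the MaxCompute runtime, one way to unit-test a UDF locally is to replace annotate with a stand-in and call evaluate directly. The following sketch does that; the stand-in annotate and the call_udf helper are hypothetical and exist only for local testing. When you deploy to MaxCompute, keep the real `from odps.udf import annotate`.

```python
# Local stand-in for odps.udf.annotate (assumption: for running outside MaxCompute only).
def annotate(signature):
    def decorator(cls):
        cls.signature = signature  # keep the string so tests can inspect it
        return cls
    return decorator


@annotate("bigint,bigint->bigint")
class myplus(object):
    def evaluate(self, arg0, arg1):
        if None in (arg0, arg1):
            return None
        return arg0 + arg1


def call_udf(udf_class, rows):
    """Hypothetical helper: create one UDF instance and call evaluate once per row."""
    udf = udf_class()
    return [udf.evaluate(*row) for row in rows]
```

Creating a single instance and reusing it for every row mirrors how state set up in `__init__` is shared across evaluate calls.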

6.8.6.5. UDAFs
Description:

class odps.udf.BaseUDAF: inherit this class to implement a Python UDAF.
BaseUDAF.new_buffer(): implement this method to return the intermediate buffer of the aggregate function. The buffer must be a mutable object, such as a list or dict. The size of the buffer should not grow with the amount of data, and it should not exceed 2 MB after it is serialized with marshal.
BaseUDAF.iterate(buffer[, args, ...]): aggregates args into the intermediate buffer.
BaseUDAF.merge(buffer, pbuffer): merges two intermediate buffers, that is, aggregates pbuffer into buffer.
BaseUDAF.terminate(buffer): converts the intermediate buffer into a value of a MaxCompute SQL basic type.

The following example shows how to calculate an average by using a UDAF:


#coding:utf-8
from odps.udf import annotate
from odps.udf import BaseUDAF


@annotate('double->double')
class Average(BaseUDAF):
    def new_buffer(self):
        # [running sum, count of non-NULL values]
        return [0, 0]

    def iterate(self, buffer, number):
        if number is not None:
            buffer[0] += number
            buffer[1] += 1

    def merge(self, buffer, pbuffer):
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]

    def terminate(self, buffer):
        if buffer[1] == 0:
            return 0.0
        return float(buffer[0]) / buffer[1]
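To see how the four methods fit together, the following sketch simulates a distributed run locally: each partition gets its own buffer from new_buffer() and is filled with iterate(), the partial buffers are combined with merge(), and terminate() produces the result. The BaseUDAF stand-in and the run_udaf helper are assumptions for local testing; in MaxCompute, the framework decides how iterate and merge calls are scheduled. The sketch also checks the documented 2 MB limit on the marshaled buffer.

```python
import marshal

MAX_BUFFER_BYTES = 2 * 1024 * 1024  # documented limit on the marshaled buffer


class BaseUDAF(object):
    """Local stand-in for odps.udf.BaseUDAF, for testing outside MaxCompute."""


class Average(BaseUDAF):
    def new_buffer(self):
        return [0, 0]  # [running sum, count of non-NULL values]

    def iterate(self, buffer, number):
        if number is not None:
            buffer[0] += number
            buffer[1] += 1

    def merge(self, buffer, pbuffer):
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]

    def terminate(self, buffer):
        if buffer[1] == 0:
            return 0.0
        return float(buffer[0]) / buffer[1]


def run_udaf(udaf, partitions):
    """Hypothetical helper: one partial buffer per partition, then merge and terminate."""
    partials = []
    for rows in partitions:
        buffer = udaf.new_buffer()
        for value in rows:
            udaf.iterate(buffer, value)
        # The buffer must stay small and be serializable with marshal.
        if len(marshal.dumps(buffer)) > MAX_BUFFER_BYTES:
            raise ValueError("buffer exceeds 2 MB after marshal")
        partials.append(buffer)
    final = udaf.new_buffer()
    for pbuffer in partials:
        udaf.merge(final, pbuffer)
    return udaf.terminate(final)
```

For example, partitions [1.0, 2.0, None] and [3.0, 6.0] produce partial buffers [3.0, 2] and [9.0, 2], which merge into [12.0, 4] and terminate to 3.0. The NULL value is skipped, not counted.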

6.8.6.6. UDTFs
The BaseUDTF API is described as follows.

class odps.udf.BaseUDTF: the base class for a Python UDTF. Inherit this class and implement methods such as process and close.
BaseUDTF.__init__(): the initialization method. If you implement this method in an inherited class, you must first call the initialization method of the base class, for example super(Explode, self).__init__() in a class named Explode. __init__ is called only once during the UDTF life cycle, before the first record is processed. If the UDTF needs to save internal state, initialize all state in this method.
BaseUDTF.process([args, ...]): called by the MaxCompute SQL framework once for each record passed in from SQL. The parameters of process are the parameters passed to the UDTF in the SQL statement.
BaseUDTF.forward([args, ...]): the UDTF output method, which is called by user code. Each call to forward outputs one record. The parameters of forward are the UDTF output columns specified in the SQL statement.
BaseUDTF.close(): the UDTF termination method, which is called by the MaxCompute SQL framework only once, after the last record is processed.

Example:


#coding:utf-8
# explode.py
from odps.udf import annotate
from odps.udf import BaseUDTF


@annotate('string -> string')
class Explode(BaseUDTF):
    """Output a comma-separated string as multiple records."""

    def process(self, arg):
        props = arg.split(',')
        for p in props:
            self.forward(p)

Notice A Python UDTF can also be defined without annotate. In this case, the function can match any input parameters in SQL. Because the return value types cannot be deduced, all output columns are treated as the STRING type, so every value passed to forward must first be converted to a string.
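A UDTF can be exercised locally in the same spirit. In the following sketch, LocalBaseUDTF is a hypothetical stand-in for odps.udf.BaseUDTF whose forward() simply records each output row, and run_udtf plays the role of the framework by calling process() once per input record and close() once at the end. The Explode variant also shows the base-class initializer being called first in `__init__`.

```python
class LocalBaseUDTF(object):
    """Hypothetical stand-in for odps.udf.BaseUDTF: forward() records rows in memory."""

    def __init__(self):
        self.rows = []
        self.closed = False

    def forward(self, *args):
        self.rows.append(args)

    def close(self):
        self.closed = True


class Explode(LocalBaseUDTF):
    """Output a comma-separated string as multiple records."""

    def __init__(self):
        # Call the base-class initializer first, then set up internal state.
        super(Explode, self).__init__()
        self.records_seen = 0

    def process(self, arg):
        self.records_seen += 1
        for p in arg.split(','):
            self.forward(p)


def run_udtf(udtf, records):
    """Play the framework's role: process() once per input record, then close() once."""
    for record in records:
        udtf.process(*record)
    udtf.close()
    return udtf.rows
```

Each call to forward becomes one output row, so two input records ('a,b' and 'c') yield three output rows.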

6.8.6.7. Reference resources


You can reference file and table resources in a Python UDF through the odps.distcache module.

Syntax for referencing file resources:

odps.distcache.get_cache_file(resource_name)

Note
Description: returns the content of the specified resource. resource_name is a string that corresponds to the name of an existing resource in the current project. If the resource name is invalid or the resource does not exist, an error is returned.
Returned value: a file-like object. After the object is used, the caller must call its close method to release the opened resource file.

Example:


from odps.udf import annotate
from odps.distcache import get_cache_file


@annotate('bigint->string')
class DistCacheExample(object):
    def __init__(self):
        # Load "key value" pairs from the file resource once, when the UDF is created.
        cache_file = get_cache_file('test_distcache.txt')
        kv = {}
        for line in cache_file:
            line = line.strip()
            if not line:
                continue
            k, v = line.split()
            kv[int(k)] = v
        cache_file.close()
        self.kv = kv

    def evaluate(self, arg):
        return self.kv.get(arg)
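The same class can be tested outside MaxCompute by giving it a loader that returns an in-memory file-like object. In this sketch, FAKE_RESOURCES and fake_get_cache_file are hypothetical stand-ins for a project resource and for odps.distcache.get_cache_file. The try/finally ensures that close is called even if a line fails to parse. When deploying to MaxCompute, use the original example, which imports the real get_cache_file.

```python
import io

# Hypothetical in-memory resources standing in for files uploaded to the project.
FAKE_RESOURCES = {
    'test_distcache.txt': u"1 apple\n\n2 banana\n",
}


def fake_get_cache_file(resource_name):
    """Local stand-in for odps.distcache.get_cache_file: returns a file-like object."""
    if resource_name not in FAKE_RESOURCES:
        raise IOError("resource not found: %s" % resource_name)
    return io.StringIO(FAKE_RESOURCES[resource_name])


class DistCacheExample(object):
    def __init__(self, get_cache_file=fake_get_cache_file):
        # Same parsing as the example above; the loader is injectable for local tests.
        cache_file = get_cache_file('test_distcache.txt')
        try:
            kv = {}
            for line in cache_file:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                k, v = line.split()
                kv[int(k)] = v
        finally:
            cache_file.close()  # always release the resource, as the note requires
        self.kv = kv

    def evaluate(self, arg):
        return self.kv.get(arg)
```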

Syntax for referencing table resources:

odps.distcache.get_cache_table(resource_name)

Note
Description: returns the content of the specified resource table. resource_name is a string that corresponds to the name of an existing resource table in the current project. If the resource table name is invalid or the table does not exist, an error is returned.
Returned value: a generator. The caller iterates over it to obtain the table content; each iteration yields one record as a tuple.

Example:

from odps.udf import annotate
from odps.distcache import get_cache_table


@annotate('->string')
class DistCacheTableExample(object):
    def __init__(self):
        # Materialize the generator once; a generator can be traversed only once.
        self.records = list(get_cache_table('udf_test'))
        self.counter = 0
        self.ln = len(self.records)

    def evaluate(self):
        # Return the next record of the resource table on each call, then NULL.
        if self.counter > self.ln - 1:
            return None
        ret = self.records[self.counter]
        self.counter += 1
        return str(ret)
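A common alternative to returning records one by one is to load a small resource table into a dictionary in `__init__` and use it as a lookup table in evaluate. The following sketch assumes a two-column resource table (key, value); the table contents, FAKE_TABLES, fake_get_cache_table, and LookupFromTable are hypothetical, and the fake loader only lets the sketch run outside MaxCompute. In MaxCompute, the class would also need an @annotate signature (for example 'bigint->string') and the real get_cache_table.

```python
# Hypothetical resource table 'udf_test' with two columns (key, value).
FAKE_TABLES = {
    'udf_test': [(1, 'red'), (2, 'yellow'), (3, 'green')],
}


def fake_get_cache_table(resource_name):
    """Local stand-in for odps.distcache.get_cache_table: yields one tuple per record."""
    for record in FAKE_TABLES[resource_name]:
        yield record


class LookupFromTable(object):
    """Sketch: load a (key, value) resource table once, then look up a value per row."""

    def __init__(self, get_cache_table=fake_get_cache_table):
        # A generator can be traversed only once, so build the dict up front.
        self.lookup = dict((record[0], record[1]) for record in get_cache_table('udf_test'))

    def evaluate(self, key):
        return self.lookup.get(key)
```

Keys that are not in the table return None, which MaxCompute treats as NULL.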

6.9. UDTs

6.9.1. Overview
User-defined types (UDTs) were introduced in MaxCompute 2.0 with the new version of the SQL engine. UDTs allow you to reference classes or objects of third-party languages in SQL statements to obtain data or call methods.

UDTs are typically used in the following scenarios:

Scenario 1: MaxCompute does not have built-in functions for tasks that are easy to perform in other languages. For example, some tasks can be completed by calling a single built-in Java class, but implementing them with user-defined functions (UDFs) is cumbersome.
Scenario 2: You need to call a third-party library in SQL statements and want to use its features directly, instead of wrapping them in a UDF.
Scenario 3: SELECT TRANSFORM allows you to include source code directly in SQL statements, which makes the statements easier to read and maintain. However, the source code of some languages, such as Java, can be executed only after it is compiled. You want to reference objects and classes of these languages in SQL statements.

Notice
UDTs support only Java.
All operators use MaxCompute SQL semantics.
UDTs cannot be used as shuffle keys in JOIN, GROUP BY, DISTRIBUTE BY, SORT BY, ORDER BY, or CLUSTER BY clauses.
DDL statements do not support UDTs. You cannot create tables that contain UDT objects.
The final output cannot be of a UDT type.

6.9.2. Feature summary


UDTs allow you to reference classes or objects of third-party languages in SQL statements to obtain data or call methods.

The UDTs supported in MaxCompute are very different from those in other SQL engines.

UDTs in other SQL engines are similar to the STRUCT composite type in MaxCompute, whereas UDTs in MaxCompute are similar to types created by the CREATE TYPE statement: a UDT contains both fields and methods. Additionally, MaxCompute does not require Data Definition Language (DDL) statements to define type mappings. You can reference types directly in SQL statements.

Example:

set odps.sql.type.system.odps2=true;
SELECT Integer.MAX_VALUE;
-- A similar output is displayed:
+------------+
| max_value  |
+------------+
| 2147483647 |
+------------+


The expression in the preceding SELECT statement looks like a Java expression and is executed in the same manner as it would be in Java. This expression uses a UDT in MaxCompute.

You can use UDFs to implement all features provided by UDTs, but with more effort. To implement the same feature with a UDF, perform the following steps:

1. Define a UDF class.

package com.aliyun.odps.test;

public class IntegerMaxValue extends com.aliyun.odps.udf.UDF {
    public Integer evaluate() {
        return Integer.MAX_VALUE;
    }
}

2. Compile the UDF into a JAR package. Upload the JAR package and create a function.

add jar odps-test.jar;
create function integer_max_value as 'com.aliyun.odps.test.IntegerMaxValue' using 'odps-test.jar';

3. Call the function in a SQL statement.

select integer_max_value();

A UDT simplifies this procedure: you can use features provided by other languages directly in SQL statements.

6.9.3. Feature description


The example in the Feature summary topic demonstrates how to use user-defined types (UDTs) to access static fields of Java classes. UDTs can do much more. The following example shows a UDT execution procedure and its features.

-- Sample data
@table1 := select * from values ('100000000000000000000') as t(x);
@table2 := select * from values (100L) as t(y);
-- Code logic
@a := select new java.math.BigInteger(x) x from @table1;        -- Create an object by using the new method.
@b := select java.math.BigInteger.valueOf(y) y from @table2;     -- Call a static method.
select /*+mapjoin(b)*/ x.add(y).toString() from @a a join @b b;  -- Call an instance method.

The following result is returned:

100000000000000000100

Note This example also shows how to use subqueries with UDT columns, which cannot be done with user-defined functions (UDFs). The x column in variable @a is of the java.math.BigInteger class, not a built-in type. You can pass UDT data to another operator and then call its methods. UDT data can also be used in data shuffling.


Figure: UDT execution example

The preceding figure shows the three stages of the query: M1, R2, and J3. Only the new java.math.BigInteger(x) method is called in the M1 stage. The java.math.BigInteger.valueOf(y) and x.add(y).toString() methods are called in the J3 stage.
Because the query contains a JOIN, data must be reshuffled, as in MapReduce. As a result, data is processed in multiple stages, which may run in different processes or even on different physical machines. The UDT encapsulates these stages so that the query behaves as if it ran in a single JVM.

Description
UDTs support only Java.

UDTs also allow you to upload JAR packages and reference them directly. The following flags are provided for UDTs:


set odps.sql.session.resources: specifies the resources that you want to reference. Separate multiple resources with commas (,), for example, foo.sh,bar.txt.
Example:

set odps.sql.type.system.odps2=true;
set odps.sql.session.resources=odps-test.jar; -- Specify the JAR package that you want to reference. Upload the package to your project first.
select new com.aliyun.odps.test.IntegerMaxValue().evaluate();

Notice This flag is the same flag that is used to specify resources in SELECT TRANSFORM statements. Therefore, it affects both the JAR packages referenced by UDTs and the resource settings of SELECT TRANSFORM statements.

odps.sql.session.java.imports: specifies the default Java packages. Separate multiple packages with commas (,). This flag is similar to the import statement in Java. You can specify a full class path, such as java.math.BigInteger, or use *. Static imports are not supported.
Example:

set odps.sql.type.system.odps2=true;
set odps.sql.session.resources=odps-test.jar;
set odps.sql.session.java.imports=com.aliyun.odps.test.*; -- Specify the default Java package.
select new IntegerMaxValue().evaluate();

UDTs allow you to:

Create objects by using new.
Create arrays by using new, including initialization with an initializer list. Example: new Integer[] { 1, 2, 3 }.
Call methods, including static methods. This also lets you create objects by using factory methods.
Access fields, including static fields.

Notice
Identifiers in UDTs include package names, class names, method names, and field names. All identifiers are case-sensitive.
UDTs support SQL type conversions, such as cast(1 as java.lang.Object). UDTs do not support Java type casts, such as (Object)1.
Anonymous classes and lambda expressions are not supported.
Functions that do not return values cannot be called in UDTs.

Note UDTs are used in expressions, and functions that do not return values cannot be called in expressions.

All classes in the Java SDK (JDK) can be referenced by UDTs. The runtime environment is JDK 1.8. Later versions may not be supported.
All operators use MaxCompute SQL semantics. For example, the result of String.valueOf(1) + String.valueOf(2) is 3: the two strings are implicitly converted to DOUBLE values and summed. If Java string concatenation were used instead, the result would be 12.

The = operator can be confusing. In SQL statements, = is a comparison operator that compares one expression with another. To check whether two Java objects are equivalent, you must call the equals method in Java; the = operator cannot be used for this purpose.

Java data types are mapped to built-in data types, and this mapping applies to UDTs.
You can directly call methods of the Java type to which a built-in type is mapped. Example: '123'.length(), 1L.hashCode().

UDT values can be used in built-in functions and UDFs. For example, in chr(Long.valueOf('100')), Long.valueOf returns a value of the java.lang.Long type, which the CHR built-in function accepts as the built-in BIGINT type.
Data of a Java primitive type is automatically converted to its boxed type, and the preceding two rules then apply.

Notice For some new built-in data types, you must run set odps.sql.type.system.odps2=true; to enable them. Otherwise, an error occurs.

UDTs fully support Java generics. For example, based on the parameter type, the compiler can determine that java.util.Arrays.asList(new java.math.BigInteger('1')) returns a java.util.List<java.math.BigInteger>.

Notice As in Java, you must specify the type parameter in a constructor call; otherwise, java.lang.Object is used. For example, the result of new java.util.ArrayList(java.util.Arrays.asList('1', '2')) is of the java.util.ArrayList<Object> type, whereas the result of new java.util.ArrayList<String>(java.util.Arrays.asList('1', '2')) is of the java.util.ArrayList<String> type.

UDTs do not have a clear definition of object identity, that is, of whether two references point to the same object. This is caused by data reshuffling. As the JOIN example shows, objects may be transmitted between different processes or physical machines, and during transmission one object may end up referenced as two different objects. For example, an object may be shuffled to two machines and then reshuffled.

Therefore, when you use UDTs, you must use the equals method instead of the = operator to compare two objects.

Note Objects in the same row or column may be correlated in some way. However, correlation between objects in different rows or columns cannot be guaranteed.

UDTs cannot be used as shuffle keys in clauses such as JOIN, GROUP BY, DISTRIBUTE BY, SORT BY, ORDER BY, and CLUSTER BY.

UDTs can be used in intermediate stages of expressions, but cannot be used as outputs. For example, you cannot use group by new java.math.BigInteger('123'). However, you can use group by new java.math.BigInteger('123').hashCode(), because hashCode returns an int, which can be used as the built-in INT type.

The following type conversion rules are extended in UDTs:

UDT objects can be implicitly converted to objects of their base classes.


UDT objects can be explicitly cast to objects of their base classes or subclasses.
Type conversion between two objects that have no inheritance relationship follows the native conversion rules.

Notice The conversion may change data. For example, data of the java.lang.Long type can be explicitly cast to the java.lang.Integer type. This conversion uses the rules for converting the built-in BIGINT type to the INT type, which may change data or even cause a loss of precision.

UDT objects cannot be saved to tables. DDL statements do not support UDTs, and you cannot create tables that contain UDT objects unless the data is implicitly converted to a built-in type. In addition, the final output cannot be a UDT. However, because every Java class supports the toString() method, you can call toString() to convert UDT data to the java.lang.String type. You can use this method to check UDT data during debugging.

You can also add the set odps.sql.udt.display.tostring=true; flag so that MaxCompute converts all output UDT data to strings by calling java.util.Objects.toString(...), for debugging purposes.

Note This flag is typically used for debugging because it applies only to PRINT statements. It cannot be applied to INSERT statements.

BINARY is a built-in type that supports automatic serialization. You can save byte[] arrays directly, and saved byte[] arrays are read back as the BINARY type.

Some classes, such as protobuf messages, have their own serialization and deserialization methods. To save such UDTs, call their serialization methods to convert the data to BINARY, and call their deserialization methods to restore it.

You can use UDTs to implement the features of scalar functions. By combining UDTs with the COLLECT_LIST and EXPLODE built-in functions, you can also implement the features of aggregate and table-valued functions.
UDTs support resource access. You can call the com.aliyun.odps.udf.impl.UDTExecutionContext.get() static method to obtain the ExecutionContext object, and then use this object to access the current execution context and resources such as files and tables.

6.9.4. More examples


6.9.4.1. Example of using Java arrays
Example:


set odps.sql.type.system.odps2=true;
set odps.sql.udt.display.tostring=true;
select
    new Integer[10],                                                    -- Create an array that contains 10 elements.
    new Integer[] {c1, c2, c3},                                         -- Create an array that contains three elements by using an initializer list.
    new Integer[][] { new Integer[] {c1, c2}, new Integer[] {c3, c4} }, -- Create a multidimensional array.
    new Integer[] {c1, c2, c3} [2],                                     -- Access an array element by index.
    java.util.Arrays.asList(c1, c2, c3)                                 -- Another way to create a built-in array: it creates a List<Integer>, which can be used as array<int>.
from values (1,2,3,4) as t(c1, c2, c3, c4);

6.9.4.2. Example of using JSON


The UDT runtime includes a Gson dependency (version 2.2.4), so you can use Gson directly.

Example:

set odps.sql.type.system.odps2=true;
set odps.sql.session.java.imports=java.util.*,java,com.google.gson.*; -- To import multiple packages, separate the packages with commas (,).
@a := select new Gson() gson;                                         -- Create a Gson object.
select
    gson.toJson(new ArrayList<Integer>(Arrays.asList(1, 2, 3))),      -- Convert an object to a JSON string.
    cast(gson.fromJson('["a","b","c"]', List.class) as List<String>)  -- Deserialize the JSON string, then cast the result from List<Object> to List<String>.
from @a;

Compared with the built-in GET_JSON_OBJECT function, this method is simpler to use and more efficient, because it extracts content from the JSON string by deserializing the string into a supported data type in one step.

In addition to Gson, the MaxCompute runtime includes other dependencies: commons-logging (1.1.1), commons-lang (2.5), commons-io (2.4), and protobuf-java (2.4.1).

6.9.4.3. Example of using composite types


The built-in ARRAY and MAP types are mapped to java.util.List and java.util.Map, respectively.

Java objects of classes that implement the java.util.List or java.util.Map interface can be used in MaxCompute SQL operations on composite types.
ARRAY and MAP data in MaxCompute can directly call methods of the java.util.List or java.util.Map interface.

Example:


set odps.sql.type.system.odps2=true;
set odps.sql.session.java.imports=java.util.*;
select
    size(new ArrayList<Integer>()),        -- Call the built-in SIZE function to obtain the size of the ArrayList.
    array(1,2,3).size(),                   -- Call a List method on the built-in ARRAY type.
    sort_array(new ArrayList<Integer>()),  -- Sort the data in the ArrayList.
    al[1],                                 -- Java List does not support indexing, but the ARRAY type does.
    Objects.toString(a),                   -- Convert ARRAY type data to STRING type data.
    array(1,2,3).subList(1, 2)             -- Get a sublist.
from (select new ArrayList<Integer>(array(1,2,3)) as al, array(1,2,3) as a) t;

6.9.4.4. Example of aggregation


To implement aggregation with UDTs, first use the built-in COLLECT_SET or COLLECT_LIST function to convert the data to the List type, and then call UDT methods to aggregate the data.

The following example shows how to obtain the median of BigInteger data. You cannot directly call the built-in MEDIAN function because the data is of the java.math.BigInteger type.

set odps.sql.session.java.imports=java.math.*;
@test_data := select * from values (1),(2),(3),(5) as t(value);
@a := select collect_list(new BigInteger(value)) values from @test_data;  -- Aggregate the data to a list.
@b := select sort_array(values) as values, values.size() cnt from @a;     -- To obtain the median, first sort the data.
@c := select if(cnt % 2 == 1, new BigDecimal(values[cnt div 2]), new BigDecimal(values[cnt div 2 - 1].add(values[cnt div 2])).divide(new BigDecimal(2))) med from @b;
-- Final output.
select med.toString() from @c;
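To check the expected result of the query above, the same median logic can be written in a few lines of Python, whose int type has arbitrary precision like java.math.BigInteger. This is an illustrative cross-check, not how MaxCompute executes the query; the helper names and the choice to return None for an empty group are assumptions. For the sample data (1, 2, 3, 5), the count is even, so the median is (2 + 3) / 2 = 2.5.

```python
from decimal import Decimal, localcontext


def _half(total):
    """Exactly divide an integer by 2 as a Decimal, however many digits it has."""
    with localcontext() as ctx:
        # total / 2 needs at most one more significant digit than total itself.
        ctx.prec = max(28, len(str(abs(total))) + 2)
        return Decimal(total) / 2


def median_of_big_integers(values):
    """Mirror of @a/@b/@c: collect, sort, then take or average the middle value(s)."""
    ordered = sorted(values)             # @b: sort_array(values)
    cnt = len(ordered)                   # @b: values.size()
    if cnt == 0:
        return None                      # assumption: empty groups yield NULL
    if cnt % 2 == 1:                     # @c: odd count -> middle element
        return Decimal(ordered[cnt // 2])
    return _half(ordered[cnt // 2 - 1] + ordered[cnt // 2])  # @c: even count -> average
```

The explicit precision in _half matters: with the default 28-digit context, averaging two 31-digit integers would silently round, which is exactly the kind of error BigDecimal avoids in the SQL version.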

You cannot use the COLLECT_LIST function to implement partial aggregation, because it collects all data in a group. The built-in aggregators and UDAFs are more efficient, and we recommend that you use built-in aggregators. In addition, collecting all data in a group increases the risk of data skew.

However, if the logic of a UDAF is itself to collect all data, similar to the built-in WM_CONCAT function, using COLLECT_LIST is more efficient than using the UDAF.

6.9.4.5. Example of using table-valued functions


Table-valued functions accept and return multiple rows and columns. To input or output multiple rows and columns with UDTs, follow these steps:

1. To input multiple rows or columns, see the aggregation example.
2. To output multiple rows, use a UDT to produce a collection (List or Map), and then call the EXPLODE function to split the collection into multiple rows.
3. A UDT can contain multiple fields. To output multiple columns, retrieve the data from the fields by calling different getter methods.

The following example shows how to split a JSON string and output the result as multiple columns:


@a := select '[{"a":"1","b":"2"},{"a":"1","b":"2"}]' str;                                      -- Sample data
@b := select new com.google.gson.Gson().fromJson(str, java.util.List.class) l from @a;        -- Deserialize the JSON string.
@c := select cast(e as java.util.Map<Object,Object>) m from @b lateral view explode(l) t as e; -- Call the EXPLODE function to split the list into rows.
@d := select m.get('a') as a, m.get('b') as b from @c;                                         -- Output the result in multiple columns.
select a.toString() a, b.toString() b from @d;                                                 -- Final output. Columns a and b in @d are of the Object type.
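The expected output of the preceding query can be cross-checked with the following Python sketch, which uses the standard json module instead of Gson. It only illustrates the same steps (deserialize, explode the list, read two keys as columns); the helper name and the choice of returning None for a missing key, analogous to java.util.Map.get returning null, are assumptions.

```python
import json


def split_json_rows(text, keys=('a', 'b')):
    """Deserialize a JSON array of objects and return one tuple of column values per element.

    A missing key yields None, analogous to java.util.Map.get returning null.
    """
    rows = []
    for element in json.loads(text):                       # @b and @c: deserialize, then explode
        rows.append(tuple(element.get(k) for k in keys))  # @d: one column per key
    return rows
```

For the sample string, this yields two rows, each with a = '1' and b = '2'.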

6.9.5. Feature advantages


UDTs have the following features:

Easy to use. You do not need to define any functions.
All features supported by the JDK can be used directly, which makes SQL more flexible.
You can directly reference objects and classes of other languages in SQL statements.
You can directly reference libraries of other languages and reuse code that you have written in other languages.
You can use object-oriented features.

6.9.6. Performance advantages


UDTs and UDFs use similar execution procedures and provide similar performance. However, UDTs perform better in certain scenarios because of improvements to the compute engine:

Objects do not need to be serialized or deserialized within a single process. Serialization and deserialization are required only when objects are transmitted between processes. This means that UDTs incur no serialization or deserialization overhead when no data reshuffling is performed, for example, when the query contains no JOIN or aggregation.
UDTs suffer no performance loss from reflection, because the UDT runtime is based on code generation (Codegen) rather than reflection.
Multiple UDT operations can be combined into a single function call and executed together. In the following example, the chain of method calls is executed as a single UDT call. Although UDTs focus on small-granularity data processing, this avoids the additional overhead of calling multiple functions.

values[x].add(values[y]).divide(java.math.BigInteger.valueOf(2))

6.9.7. Security advantages


Like UDFs, UDTs are restricted by the Java sandbox model. To perform restricted operations, you must enable sandbox isolation or apply to join the sandbox whitelist.

6.10. UDJ
6.10.1. Overview
MaxCompute natively provides multiple JOIN methods, including INNER JOIN, RIGHT JOIN, OUTER JOIN, LEFT JOIN, FULL JOIN, SEMIJOIN, and ANTISEMIJOIN. You can use these native JOIN methods in most scenarios. However, they cannot meet every requirement that involves processing multiple tables.


In most cases, you can build your code framework by using UDFs. However, the current UDF, UDTF, and UDAF frameworks can handle only one table at a time. To perform user-defined operations on multiple tables, you have to combine native JOIN methods, UDFs, UDTFs, and complex SQL statements. In some cases that involve multiple tables, you must use a custom MapReduce framework instead of SQL to complete the task.

In either case, these operations require technical expertise and may cause the following problems:

Combining multiple JOIN methods in SQL statements can create a computational black box that is complex and difficult to execute with minimal overhead.
With MapReduce, optimal execution of the code becomes impossible. Most MapReduce code is written in Java, and it runs less efficiently than MaxCompute code that is generated by the LLVM code generator and executed on an optimized native runtime.

With the MaxCompute 2.0 compute engine, the user-defined join (UDJ) API has been added to the user-defined function (UDF) framework. This API allows you to handle multiple tables and simplifies operations in the underlying MapReduce distributed system.

6.10.2. UDJ usage


6.10.2.1. Examples
The following example describes how to use UDJ in MaxCompute.

This example uses the payment table and the user_client_log table.

The payment (user_id string, time datetime, pay_info string) table stores the payment information of
users. Each payment record includes the user ID, payment time, and payment details.
The user_client_log (user_id string, time datetime, content string) table stores user client logs,
including the user ID, operation time, and operation content.

Requirement: For each record in the user_client_log table, find the payment record of the same user whose
time is closest to the operation time, and then merge and output the content of both records.

To complete this task by using standard JOIN methods, you would need to join the two tables on
their common user_id field, and then find the payment record whose time most closely matches each
operation time. The SQL statement might be written as follows:

SELECT
p.user_id,
p.time,
merge(p.pay_info, u.content)
FROM
payment p RIGHT OUTER JOIN user_client_log u
ON p.user_id = u.user_id and abs(p.time - u.time) = min(abs(p.time - u.time))

However, to join two rows in this way, you must calculate the minimum difference between
p.time and u.time for the same user_id, and aggregate functions cannot be called in a join
condition. Therefore, this task cannot be completed by using a standard JOIN method.

Can UDJ solve this problem? Yes. The following topics describe how to use UDJ to meet the
preceding requirement.


6.10.2.2. Use Java to write the UDJ code

Prerequisites
UDJ is a new feature, so a new version of the SDK is required.

<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-udf</artifactId>
<version>0.30.0</version>
<scope>provided</scope>
</dependency>

The SDK contains a new abstract class named UDJ. All UDJ features are implemented by extending this class.

Sample code
The following sample code is for reference only.

package com.aliyun.odps.udf.example.udj;

import com.aliyun.odps.Column;
import com.aliyun.odps.OdpsType;
import com.aliyun.odps.Yieldable;
import com.aliyun.odps.data.ArrayRecord;
import com.aliyun.odps.data.Record;
import com.aliyun.odps.udf.DataAttributes;
import com.aliyun.odps.udf.ExecutionContext;
import com.aliyun.odps.udf.UDJ;
import com.aliyun.odps.udf.annotation.Resolve;

import java.util.ArrayList;
import java.util.Iterator;

/** For each record of the right table, find the nearest record of the left table and
 * merge the two records.
 */
@Resolve("->string,bigint,string")
public class PayUserLogMergeJoin extends UDJ {

    private Record outputRecord;

    /** Will be called prior to the data processing phase. Users can implement
     * this method to do initialization work.
     */
    @Override
    public void setup(ExecutionContext executionContext, DataAttributes dataAttributes) {
        // Output schema: user_id, time (epoch milliseconds), merged content.
        outputRecord = new ArrayRecord(new Column[]{
                new Column("user_id", OdpsType.STRING),
                new Column("time", OdpsType.BIGINT),
                new Column("content", OdpsType.STRING)
        });
    }

    /** Override this method to implement join logic.
     * @param key Current join key
     * @param left Group of records of the left table corresponding to the current key
     * @param right Group of records of the right table corresponding to the current key
     * @param output Used to output the result of UDJ
     */
    @Override
    public void join(Record key, Iterator<Record> left, Iterator<Record> right, Yieldable<Record> output) {
        outputRecord.setString(0, key.getString(0));
        if (! right.hasNext()) {
            // Empty right group, do nothing.
            return;
        } else if (! left.hasNext()) {
            // Empty left group. Output all records of the right group without merging.
            while (right.hasNext()) {
                Record logRecord = right.next();
                outputRecord.setBigint(1, logRecord.getDatetime(0).getTime());
                outputRecord.setString(2, logRecord.getString(1));
                output.yield(outputRecord);
            }
            return;
        }
        ArrayList<Record> pays = new ArrayList<>();
        // The left group of records will be iterated from the start to the end
        // for each record of the right group, but the iterator cannot be reset.
        // So we save every record of the left group to an ArrayList.
        left.forEachRemaining(pay -> pays.add(pay.clone()));
        while (right.hasNext()) {
            Record log = right.next();
            long logTime = log.getDatetime(0).getTime();
            long minDelta = Long.MAX_VALUE;
            Record nearestPay = null;
            // Iterate through all records of the left group, and find the pay record that has
            // the minimal difference in terms of time.
            for (Record pay: pays) {
                long delta = Math.abs(logTime - pay.getDatetime(0).getTime());
                if (delta < minDelta) {
                    minDelta = delta;
                    nearestPay = pay;
                }
            }
            // Merge the log record with the nearest pay record and output to the result.
            outputRecord.setBigint(1, log.getDatetime(0).getTime());
            outputRecord.setString(2, mergeLog(nearestPay.getString(1), log.getString(1)));
            output.yield(outputRecord);
        }
    }

    String mergeLog(String payInfo, String logContent) {
        return logContent + ", pay " + payInfo;
    }

    @Override
    public void close() {
    }
}


Notice In this example, NULL values in the records are not processed. To simplify data
processing, assume that the tables contain no NULL values.

Each time the join method of the UDJ is called, the records that match the same key in the two tables are
passed in. The UDJ therefore searches all payment records of the current user to locate the record whose time
is closest to each record in the user_client_log table.
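To make the expected semantics concrete, the following plain-Java sketch computes the same result without MaxCompute. It groups payments by user_id, which mirrors what happens before each join() call, and then, for every log record, scans that user's payments for the smallest time difference. TimedRow and NearestPaymentReference are hypothetical names. For simplicity, the datetime strings are read as UTC, and the whole payment group is held in memory, which is the same assumption the sample makes.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one row of payment or user_client_log.
final class TimedRow {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    final String userId;
    final long timeMillis; // epoch milliseconds, like Date.getTime() in the UDJ
    final String text;     // pay_info for payments, content for logs

    TimedRow(String userId, String datetime, String text) {
        this.userId = userId;
        this.timeMillis = LocalDateTime.parse(datetime, FORMAT).toEpochSecond(ZoneOffset.UTC) * 1000L;
        this.text = text;
    }
}

final class NearestPaymentReference {
    // Returns one merged string per log row, in input order.
    static List<String> nearestMerge(List<TimedRow> payments, List<TimedRow> logs) {
        // Group the left table by join key (user_id); join() then sees one such group per key.
        Map<String, List<TimedRow>> paysByUser = new HashMap<>();
        for (TimedRow pay : payments) {
            paysByUser.computeIfAbsent(pay.userId, k -> new ArrayList<>()).add(pay);
        }
        List<String> merged = new ArrayList<>();
        for (TimedRow log : logs) {
            List<TimedRow> group = paysByUser.getOrDefault(log.userId, Collections.emptyList());
            TimedRow nearest = null;
            long minDelta = Long.MAX_VALUE;
            for (TimedRow pay : group) {
                long delta = Math.abs(log.timeMillis - pay.timeMillis);
                if (delta < minDelta) { // strict "<": the first candidate found wins a tie
                    minDelta = delta;
                    nearest = pay;
                }
            }
            // Same merge format as mergeLog(); a log with no payments is kept as-is.
            merged.add(nearest == null ? log.text : log.text + ", pay " + nearest.text);
        }
        return merged;
    }
}
```

Feeding it the sample rows of user 2656199 and user 4356142 yields the same content values as the output shown later in this topic.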

Assume that each user has only a few payment records. In this case, you can load the payment data of a user
into memory. Typically, there is sufficient memory to store the payment data that a user generates
in a day. What if this assumption does not hold? This issue is discussed in Pre-sorting.

6.10.2.3. Create a UDJ function in MaxCompute


After you write the UDJ code in Java, you must register it with MaxCompute before you can use it in
MaxCompute SQL as a plug-in.

Assume that the code is packaged into the JAR file odps-udj-example.jar. Run the ADD JAR command
to upload the JAR file to MaxCompute.

add jar odps-udj-example.jar;

Execute the CREATE FUNCTION statement to create the UDJ function pay_user_log_merge_join from the JAR
file odps-udj-example.jar and the Java class com.aliyun.odps.udf.example.udj.PayUserLogMergeJoin.

create function pay_user_log_merge_join
as 'com.aliyun.odps.udf.example.udj.PayUserLogMergeJoin'
using 'odps-udj-example.jar';

6.10.2.4. Use UDJ in MaxCompute SQL


After you register the UDJ function, you can use it in MaxCompute SQL.

1. Create the sample source tables.

create table payment (user_id string, time datetime, pay_info string);
create table user_client_log (user_id string, time datetime, content string);

2. Create sample data.

Notice The data in this example is for reference only. You may need to create
different data in actual operations.


-- Create data in the payment table
INSERT OVERWRITE TABLE payment VALUES
('1335656', datetime '2018-02-13 19:54:00', 'PEqMSHyktn'),
('2656199', datetime '2018-02-13 12:21:00', 'pYvotuLDIT'),
('2656199', datetime '2018-02-13 20:50:00', 'PEqMSHyktn'),
('2656199', datetime '2018-02-13 22:30:00', 'gZhvdySOQb'),
('8881237', datetime '2018-02-13 08:30:00', 'pYvotuLDIT'),
('8881237', datetime '2018-02-13 10:32:00', 'KBuMzRpsko'),
('9890100', datetime '2018-02-13 16:01:00', 'gZhvdySOQb'),
('9890100', datetime '2018-02-13 16:26:00', 'MxONdLckwa')
;
-- Create data in the user_client_log table
INSERT OVERWRITE TABLE user_client_log VALUES
('1000235', datetime '2018-02-13 00:25:36', 'click FNOXAibRjkIaQPB'),
('1000235', datetime '2018-02-13 22:30:00', 'click GczrYaxvkiPultZ'),
('1335656', datetime '2018-02-13 18:30:00', 'click MxONdLckpAFUHRS'),
('1335656', datetime '2018-02-13 19:54:00', 'click mKRPGOciFDyzTgM'),
('2656199', datetime '2018-02-13 08:30:00', 'click CZwafHsbJOPNitL'),
('2656199', datetime '2018-02-13 09:14:00', 'click nYHJqIpjevkKToy'),
('2656199', datetime '2018-02-13 21:05:00', 'click gbAfPCwrGXvEjpI'),
('2656199', datetime '2018-02-13 21:08:00', 'click dhpZyWMuGjBOTJP'),
('2656199', datetime '2018-02-13 22:29:00', 'click bAsxnUdDhvfqaBr'),
('2656199', datetime '2018-02-13 22:30:00', 'click XIhZdLaOocQRmrY'),
('4356142', datetime '2018-02-13 18:30:00', 'click DYqShmGbIoWKier'),
('4356142', datetime '2018-02-13 19:54:00', 'click DYqShmGbIoWKier'),
('8881237', datetime '2018-02-13 00:30:00', 'click MpkvilgWSmhUuPn'),
('8881237', datetime '2018-02-13 06:14:00', 'click OkTYNUHMqZzlDyL'),
('8881237', datetime '2018-02-13 10:30:00', 'click OkTYNUHMqZzlDyL'),
('9890100', datetime '2018-02-13 16:01:00', 'click vOTQfBFjcgXisYU'),
('9890100', datetime '2018-02-13 16:20:00', 'click WxaLgOCcVEvhiFJ')
;

3. In MaxCompute SQL, use the UDJ function that you created:

SELECT r.user_id, from_unixtime(time/1000) as time, content FROM (
  SELECT user_id, time as time, pay_info FROM payment
) p JOIN (
  SELECT user_id, time as time, content FROM user_client_log
) u
ON p.user_id = u.user_id
USING pay_user_log_merge_join(p.time, p.pay_info, u.time, u.content)
r
AS (user_id, time, content)
;

Note The syntax of UDJ is similar to that of a standard JOIN statement. The only
difference is the added USING clause.

Description:

pay_user_log_merge_join is the name of the UDJ function in SQL.
(p.time, p.pay_info, u.time, u.content) are the columns from the two tables that are passed to the UDJ function.


r is the alias of the result returned by the UDJ function. You can reference this alias elsewhere in the SQL
statement, for example in r.user_id.
(user_id, time, content) are the columns returned by the UDJ function.

4. Execute the SQL statement. Output similar to the following is displayed:

+---------+---------------------+---------------------------------------+
| user_id | time                | content                               |
+---------+---------------------+---------------------------------------+
| 1000235 | 2018-02-13 00:25:36 | click FNOXAibRjkIaQPB                 |
| 1000235 | 2018-02-13 22:30:00 | click GczrYaxvkiPultZ                 |
| 1335656 | 2018-02-13 18:30:00 | click MxONdLckpAFUHRS, pay PEqMSHyktn |
| 1335656 | 2018-02-13 19:54:00 | click mKRPGOciFDyzTgM, pay PEqMSHyktn |
| 2656199 | 2018-02-13 08:30:00 | click CZwafHsbJOPNitL, pay pYvotuLDIT |
| 2656199 | 2018-02-13 09:14:00 | click nYHJqIpjevkKToy, pay pYvotuLDIT |
| 2656199 | 2018-02-13 21:05:00 | click gbAfPCwrGXvEjpI, pay PEqMSHyktn |
| 2656199 | 2018-02-13 21:08:00 | click dhpZyWMuGjBOTJP, pay PEqMSHyktn |
| 2656199 | 2018-02-13 22:29:00 | click bAsxnUdDhvfqaBr, pay gZhvdySOQb |
| 2656199 | 2018-02-13 22:30:00 | click XIhZdLaOocQRmrY, pay gZhvdySOQb |
| 4356142 | 2018-02-13 18:30:00 | click DYqShmGbIoWKier                 |
| 4356142 | 2018-02-13 19:54:00 | click DYqShmGbIoWKier                 |
| 8881237 | 2018-02-13 00:30:00 | click MpkvilgWSmhUuPn, pay pYvotuLDIT |
| 8881237 | 2018-02-13 06:14:00 | click OkTYNUHMqZzlDyL, pay pYvotuLDIT |
| 8881237 | 2018-02-13 10:30:00 | click OkTYNUHMqZzlDyL, pay KBuMzRpsko |
| 9890100 | 2018-02-13 16:01:00 | click vOTQfBFjcgXisYU, pay gZhvdySOQb |
| 9890100 | 2018-02-13 16:20:00 | click WxaLgOCcVEvhiFJ, pay MxONdLckwa |
+---------+---------------------+---------------------------------------+

As shown in the preceding output, UDJ completes the task that could not be performed by using native
JOIN methods.
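The query divides time by 1000 because the UDJ fills its BIGINT time column with Date.getTime(), which counts milliseconds since the epoch, while from_unixtime expects seconds. The following Java sketch mirrors that conversion for a single value. UdjTimeColumn is a hypothetical name. The zone parameter stands in for whatever time zone your project applies in from_unixtime, which this guide does not specify.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Sketch of the conversion behind from_unixtime(time/1000) in the query above.
final class UdjTimeColumn {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // epochMillis: the BIGINT the UDJ writes, i.e. Date.getTime() of the log's datetime.
    // zone: stand-in for the time zone applied by from_unixtime (an assumption).
    static String toDatetimeString(long epochMillis, ZoneId zone) {
        long epochSeconds = epochMillis / 1000; // the "/1000" step: milliseconds to seconds
        return LocalDateTime.ofInstant(Instant.ofEpochSecond(epochSeconds), zone).format(FORMAT);
    }
}
```

Java integer division drops any sub-second part. That part is always zero for the whole-second datetime values in this example.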

6.10.2.5. Pre-sorting
In the preceding example, an iterator is used to scan all payment records and locate the one that matches
each log record. To do this, all payment records that have the same user_id must be loaded into an
ArrayList. This method works only when the number of payment records is small. If a large number of
payment records are generated, memory limits require a different approach.

This topic describes how to address this issue by using the SORT BY clause. When the payment
data is too large to be stored in memory, the issue becomes easier to solve if the data in both
tables has already been sorted by time: you then only need to compare the first elements of the two
lists as you advance through them. UDJ code in Java:


// Modified join() for PayUserLogMergeJoin; the other methods stay unchanged.
// Both groups must arrive sorted by time (see the SORT BY clause in the statement below).
@Override
public void join(Record key, Iterator<Record> left, Iterator<Record> right, Yieldable<Record> output) {
    outputRecord.setString(0, key.getString(0));
    if (! right.hasNext()) {
        return;
    } else if (! left.hasNext()) {
        while (right.hasNext()) {
            Record logRecord = right.next();
            outputRecord.setBigint(1, logRecord.getDatetime(0).getTime());
            outputRecord.setString(2, logRecord.getString(1));
            output.yield(outputRecord);
        }
        return;
    }
    long prevDelta = Long.MAX_VALUE;
    Record logRecord = right.next();
    Record payRecord = left.next();
    Record lastPayRecord = payRecord.clone();
    while (true) {
        long delta = logRecord.getDatetime(0).getTime() - payRecord.getDatetime(0).getTime();
        if (left.hasNext() && delta > 0) {
            // The time delta between the two records is still decreasing, so we can
            // keep exploring the left group to find a smaller delta.
            lastPayRecord = payRecord.clone();
            prevDelta = delta;
            payRecord = left.next();
        } else {
            // Reached the point of minimal delta. Compare with the last pay record,
            // output the merged result, and prepare to process the next record of
            // the right group.
            Record nearestPay = Math.abs(delta) < prevDelta ? payRecord : lastPayRecord;
            outputRecord.setBigint(1, logRecord.getDatetime(0).getTime());
            String mergedString = mergeLog(nearestPay.getString(1), logRecord.getString(1));
            outputRecord.setString(2, mergedString);
            output.yield(outputRecord);
            if (right.hasNext()) {
                logRecord = right.next();
                prevDelta = Math.abs(
                        logRecord.getDatetime(0).getTime() - lastPayRecord.getDatetime(0).getTime()
                );
            } else {
                break;
            }
        }
    }
}

Notice After you modify the UDJ code, you must update the corresponding JAR
package.

When you use the UDJ function in MaxCompute SQL, modify the statement as follows:


SELECT r.user_id, from_unixtime(time/1000) as time, content FROM (
  SELECT user_id, time as time, pay_info FROM payment
) p JOIN (
  SELECT user_id, time as time, content FROM user_client_log
) u
ON p.user_id = u.user_id
USING pay_user_log_merge_join(p.time, p.pay_info, u.time, u.content)
r
AS (user_id, time, content)
SORT BY p.time, u.time
;

Compared with the original statement, you only need to add a SORT BY clause to the end of the UDJ clause so
that the data in both tables is sorted by time.

The execution result is the same as the result before the code was modified.

This method uses the SORT BY clause to pre-sort the data, so that a maximum of only three records needs
to be cached to achieve the same result.
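The idea behind pre-sorting can be shown without MaxCompute types. The following plain-Java sketch handles one join key. Both inputs are iterators over ascending timestamps, which is the role of SORT BY p.time, u.time, and only the current payment, the next payment, and the current log value are held at any time. SortedNearestMerge is a hypothetical name. It works on bare long timestamps rather than Record objects, and it collects results in a list where a real UDJ would call output.yield(). It also differs from the UDJ code above on exact ties: the sketch picks the later payment, while that code keeps the earlier one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, single-key version of the pre-sorted merge. Works on bare timestamps
// (for example, epoch milliseconds) instead of ODPS Record objects.
final class SortedNearestMerge {
    // Both iterators must yield times in ascending order, which is what
    // SORT BY p.time, u.time guarantees inside each join key.
    // Returns the nearest payment time for each log time, or null if there are no payments.
    static List<Long> nearestPayTimes(Iterator<Long> payTimes, Iterator<Long> logTimes) {
        List<Long> result = new ArrayList<>(); // a real UDJ would yield each result instead
        Long current = payTimes.hasNext() ? payTimes.next() : null;
        Long next = payTimes.hasNext() ? payTimes.next() : null;
        while (logTimes.hasNext()) {
            long logTime = logTimes.next();
            // Advance while the next payment is at least as close as the current one.
            // Log times only increase, so the payment cursor never has to move back.
            while (next != null && Math.abs(next - logTime) <= Math.abs(current - logTime)) {
                current = next;
                next = payTimes.hasNext() ? payTimes.next() : null;
            }
            result.add(current);
        }
        return result;
    }
}
```

For example, express the payment times of user 2656199 as minutes of the day (741, 1250, 1350) and that user's log times the same way (510, 554, 1265, 1268, 1349, 1350). The sketch returns 741, 741, 1250, 1250, 1350, 1350, which corresponds to the pay_info values in the output table above. Each payment is read only once.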

6.10.3. Performance advantages


Without UDJ, you must use MapReduce to handle complex cross-table computing tasks in a distributed
system.

The following example uses an online MapReduce job to test UDJ performance. This MapReduce job
uses a complex algorithm to join two tables. In this example, the logic of the MapReduce job is rewritten
in SQL by using UDJ, and the execution results are compared.

The following figure compares the performance of the two approaches at the same level of concurrency.


Performance comparison


As shown in the figure, UDJ makes it easier to describe complex logic that handles multiple tables, and it
greatly improves query performance.

Note Only the user code runs inside the UDJ. The rest of the query logic is executed by the
high-performance MaxCompute native runtime.

UDJ optimizes the MaxCompute runtime engine and the data exchange between interfaces. As a result, the join
logic of UDJ runs more efficiently than equivalent logic in the reduce stage of MapReduce.

6.11. Parameterized view


MaxCompute supports parameterized views. You can pass parameters to such views when you call them to query data.

In traditional MaxCompute views, complex SQL scripts are encapsulated at the underlying layer.
Callers can query a view as if it were a standard table, without needing to understand the underlying
implementation. Traditional views are widely used because they provide encapsulation and code
reuse.

However, you cannot specify parameters for traditional views. If a traditional view reads data
from an underlying table, you cannot filter the data in that table or pass other parameters to the
view. This limits code reuse.

The new SQL engine of MaxCompute V2.0 supports parameterized views, which allow you to pass in any
tables or other variables to customize a view.

Create a parameterized view


Example

--view with parameters
-- param @a -a table parameter
-- param @b -a string parameter
--returns a table with schema (key string,value string)
create view if not exists pv1(@a table (k string,v bigint), @b string)
as
select srcp.key,srcp.value from srcp join @a on srcp.key=a.k and srcp.p=@b;

The pv1 view has two parameters: a table parameter and a string parameter. Parameter values
can be tables or values of basic data types.
Parameter values can also be subqueries. Example:

select * from view_name((select 1 from src where a > 0), 1);

When you define a view, you can set the type of a parameter to ANY. Example:

create view view_name (@a ANY, @b TABLE (x ANY)) as ...

When you define a view, you can use an asterisk (*) to represent a variable number of columns. Example:

create view view_name(@a bigint, @b TABLE(x bigint, * ANY)) as select * from @b where x = @a;


Note In the table passed to the TABLE parameter, the first column is of the
BIGINT type. You can use SELECT * to obtain the variable-length part that follows it.

Call the created view


Execute the following statements to call the pv1 view:

@a := select * from src where value >0;
--call view with table variable and scalar
@b := select * from pv1(@a,'20170101');
@another_day := '20170102';
--call view with table name and scalar variable
@c := select * from pv1(src2, @another_day);
@d := select * from @c union all select * from @b;
with
t as(select * from src3)
select * from @c
union all
select * from @d
union all
select * from pv1(t,@another_day);

Note
You can use different parameters to call the pv1 view. The value of a table parameter can
be a physical table, a view, a table variable, or a table alias defined in a common table expression (CTE).
Other parameters can be variables or constants.

Additional instructions
A parameterized view can contain multiple SQL statements, similar to a script.

--view with parameters
-- param @a -a table parameter
-- param @b -a string parameter
--returns a table with schema (key string,value string)
create view if not exists pv2(@a table (k string,v bigint), @b string) as
BEGIN
@srcp := select * from srcp where p=@b;
@pv2 := select srcp.key,srcp.value from @srcp join @a on srcp.key=a.k;
end;


Note
Content between BEGIN and END is the script of this view.
The @pv2 := … statement is similar to the RETURN statement in other programming
languages. It assigns a value to an implicit table variable that has the
same name as the view.
Only DML statements can be used in scripts. INSERT and CREATE TABLE AS statements
cannot be included in scripts.
PRINT statements cannot be included in scripts.

The rules for matching actual and formal view parameters are the same as in ordinary
programming languages: an actual parameter matches a formal parameter if it can be implicitly converted
to the formal type. For example, a BIGINT value matches a parameter of the DOUBLE type. For table
variables, if the schema of Table A can be inserted into Table B, Table A can be used to match a
table parameter that has the same schema as Table B.

In some situations, you can declare the return type to make the code easier to read. Example:

create view if not exists pv3(@a table (k string,v bigint), @b string)
returns @ret table (x string,y string)
AS
begin
@srcp := select * from srcp where p=@b;
@ret := select srcp.key,srcp.value from @srcp join @a on srcp.key=a.k;
end;

Note RETURNS @ret TABLE (x string, y string) defines the following information:
The return type is TABLE (x string, y string), which is the type returned to the caller.
You can use this clause to customize the schema of the returned table.
The return variable is @ret, which is assigned a value in the view script.
You can regard a view that does not contain the BEGIN and END keywords or that does not
return variables as a simplified view.

6.12. Geographic functions


6.12.1. Usage notes
Before you use geographic functions, note the following points:

All functions are published in the geospatial project of the DataWorks marketplace and are
prefixed with ST_. You can click a function to view and use it without applying for permissions.
To use a function, add the project prefix geospatial. to the beginning of the function name, and
commit the SQL statements that contain the function together with the following flags:

set odps.sql.hive.compatible=true;
set odps.sql.udf.java.retain.legacy=false;
set odps.isolation.session.enable=true;


6.12.2. Constructors
6.12.2.1. ST_AsBinary
Function declaration:

ST_AsBinary(ST_Geometry)

Description: This function returns the well-known binary (WKB) representation of the input geometry.

Example:

SELECT ST_AsBinary(ST_Point(1, 2)) FROM onerow;
-- WKB representation of POINT (1 2)
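The WKB value returned above is opaque binary. The following Java sketch shows the OGC WKB layout for a two-dimensional point: one byte-order byte, a 4-byte geometry type code (1 for Point), and the x and y coordinates as 8-byte doubles, for 21 bytes in total. It illustrates the format only and is not how ST_AsBinary is implemented. WkbPoint is a hypothetical name. The sketch always writes little-endian output, and this guide does not state which byte order ST_AsBinary emits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration of the OGC WKB layout for a 2D point; not how ST_AsBinary is implemented.
final class WkbPoint {
    static final int WKB_POINT = 1; // OGC geometry type code for Point

    // Layout: 1 byte-order byte, 4-byte type code, 8-byte x, 8-byte y (21 bytes total).
    static byte[] encode(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 1); // 1 = little-endian (NDR); 0 would mean big-endian (XDR)
        buf.putInt(WKB_POINT);
        buf.putDouble(x);
        buf.putDouble(y);
        return buf.array();
    }

    // Reads a 2D WKB point in either byte order and returns {x, y}.
    static double[] decode(byte[] wkb) {
        ByteOrder order = wkb[0] == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(wkb).order(order);
        buf.get(); // skip the byte-order byte
        int type = buf.getInt();
        if (type != WKB_POINT) {
            throw new IllegalArgumentException("not a 2D WKB point, type code " + type);
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new double[] {x, y};
    }
}
```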

6.12.2.2. ST_AsGeoJson
Function declaration:

ST_AsGeoJson(geometry)

Description: This function returns the GeoJSON representation of the input geometry.

Example:

SELECT ST_AsGeoJson(ST_Point(1.0, 2.0)) from onerow;
-- {"type":"Point", "coordinates":[1.0, 2.0]}

6.12.2.3. ST_AsJson
Function declaration:

ST_AsJSON(ST_Geometry)

Description: This function returns the ESRI JSON representation of the input geometry.

Example:

SELECT ST_AsJSON(ST_Point(1.0, 2.0)) from onerow;
-- {"x":1.0,"y":2.0}
SELECT ST_AsJSON(ST_SetSRID(ST_Point(1, 1), 4326)) from onerow;
-- {"x":1.0,"y":1.0,"spatialReference":{"wkid":4326}}
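The two JSON outputs differ in shape. ST_AsGeoJson produces a GeoJSON object with a type and a coordinates array. ST_AsJSON produces ESRI JSON with x and y members and an optional spatialReference that carries the SRID. The following Java sketch only reproduces these two shapes for a point, exactly as they appear in the examples. PointJson is a hypothetical class. It is not the functions' implementation and handles no other geometry types.

```java
// Hypothetical helper that reproduces the point outputs shown in the examples above.
final class PointJson {
    // ESRI JSON, as returned by ST_AsJSON; wkid is the SRID, or null if none was set.
    static String esriJson(double x, double y, Integer wkid) {
        String body = "{\"x\":" + x + ",\"y\":" + y;
        if (wkid == null) {
            return body + "}";
        }
        return body + ",\"spatialReference\":{\"wkid\":" + wkid + "}}";
    }

    // GeoJSON, as returned by ST_AsGeoJson.
    static String geoJson(double x, double y) {
        return "{\"type\":\"Point\", \"coordinates\":[" + x + ", " + y + "]}";
    }
}
```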

6.12.2.4. ST_AsShape
Function declaration:

ST_AsShape(ST_Geometry)

Description: This function returns the ESRI shape representation of the input geometry.


Example:

SELECT ST_AsShape(ST_Point(1, 2)) FROM onerow;
-- Esri shape representation of POINT (1 2)

6.12.2.5. ST_AsText
Function declaration:

ST_AsText(ST_Geometry)

Description: This function returns the well-known text (WKT) representation of the input geometry.

Example:

SELECT ST_AsText(ST_Point(1, 2)) FROM onerow;
-- POINT (1 2)

6.12.2.6. ST_GeomCollection
Function declaration:

ST_GeomCollection(wkt)

Description: This function constructs a multi-part geometry from the input well-known text (WKT)
representation defined by the Open Geospatial Consortium (OGC).

Notice The ST_GeomCollection function in MaxCompute supports only multi-part
geometries, not geometry collections.

Example:

SELECT ST_GeomCollection('multipoint ((1 0), (2 3))') FROM src LIMIT 1;
-- Construct a multipoint geometry.
ST_GeomCollection('POINT(1 1), LINESTRING(2 0,3 0)')
-- Not supported.

6.12.2.7. ST_GeomFromGeoJson
Function declaration:

ST_GeomFromGeoJson(json)

Description: This function constructs a geometry from the input GeoJSON representation.

Example:


SELECT ST_GeomFromGeoJson('{"type":"Point", "coordinates":[1.2, 2.4]}') FROM src LIMIT 1;
-- Construct a point.
SELECT ST_GeomFromGeoJson('{"type":"LineString", "coordinates":[[1,2], [3,4]]}') FROM src LIMIT 1;
-- Construct a linestring.

6.12.2.8. ST_GeomFromJSON
Function declaration:

ST_GeomFromJSON(json)

Description: This function constructs a geometry from the input ESRI JSON representation.

Example:

SELECT ST_GeomFromJSON('{"x":0.0,"y":0.0}') FROM src LIMIT 1;
-- Construct a point.

6.12.2.9. ST_GeomFromShape
Function declaration:

ST_GeomFromShape(shape)

Description: This function constructs a geometry from the input ESRI shape representation.

Example:

SELECT ST_GeomFromShape(ST_AsShape(ST_Point(1, 2)));
-- Construct a point.

6.12.2.10. ST_GeomFromText
Function declaration:

ST_GeomFromText(wkt)

Description: This function constructs a geometry from the input well-known text (WKT)
representation defined by the Open Geospatial Consortium (OGC).

Example:

SELECT ST_GeomFromText('linestring (1 0, 2 3)') FROM src LIMIT 1;
-- Construct a linestring.
SELECT ST_GeomFromText('multipoint ((1 0), (2 3))') FROM src LIMIT 1;
-- Construct a multipoint geometry.

6.12.2.11. ST_GeomFromWKB


Function declaration:

ST_GeomFromWKB(wkb)

Description: This function constructs a geometry from the input well-known binary (WKB)
representation defined by the Open Geospatial Consortium (OGC).

Example:

SELECT ST_GeomFromWKB(ST_AsBinary(ST_GeomFromText('linestring (1 0, 2 3)'))) FROM src LIMIT 1;
-- Construct a linestring.
SELECT ST_GeomFromWKB(ST_AsBinary(ST_GeomFromText('multipoint ((1 0), (2 3))'))) FROM src LIMIT 1;
-- Construct a multipoint geometry.

6.12.2.12. ST_GeometryType
Function declaration:

ST_GeometryType(geometry)

Description: This function returns the type name of the input geometry.

Example:

SELECT ST_GeometryType(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- ST_Point
SELECT ST_GeometryType(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- ST_LineString
SELECT ST_GeometryType(ST_Polygon(2,0, 2,3, 3,0)) FROM src LIMIT 1;
-- ST_Polygon

6.12.2.13. ST_LineString
Function declaration:

ST_LineString(x, y, [x, y]*)
ST_LineString('linestring( ... )')
ST_LineString(array(x+), array(y+))
ST_LineString(array(ST_Point(x,y)+))

Description: This function constructs a two-dimensional line.

Example:

SELECT ST_LineString(1, 1, 2, 2, 3, 3) from src LIMIT 1;
SELECT ST_LineString('linestring(1 1, 2 2, 3 3)') from src LIMIT 1;
SELECT ST_LineString(array(1,2,3), array (1,2,3)) from src LIMIT 1;
SELECT ST_LineString(array(ST_Point(1, 1), ST_Point(2,2), ST_Point(3,3))) from src LIMIT 1;
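The flat argument form ST_LineString(x, y, [x, y]*) and the WKT form describe the same line: consecutive values pair up into (x y) vertices. The following Java sketch shows that pairing by turning a flat coordinate list into WKT text. LineStringWkt and fromFlatCoordinates are hypothetical names. The uppercase "LINESTRING (...)" spacing mirrors the POINT (1 2) output shown for ST_AsText and is not a claim about how MaxCompute formats lines.

```java
import java.math.BigDecimal;
import java.util.StringJoiner;

// Hypothetical helper showing how the flat ST_LineString(x, y, [x, y]*) arguments
// pair up into the vertices of the equivalent WKT text.
final class LineStringWkt {
    static String fromFlatCoordinates(double... xy) {
        if (xy.length < 4 || xy.length % 2 != 0) {
            throw new IllegalArgumentException("need an even number of values and at least two points");
        }
        StringJoiner vertices = new StringJoiner(", ", "LINESTRING (", ")");
        for (int i = 0; i < xy.length; i += 2) {
            vertices.add(format(xy[i]) + " " + format(xy[i + 1])); // one "x y" vertex
        }
        return vertices.toString();
    }

    // Prints 1.0 as "1" and 2.5 as "2.5", like the POINT (1 2) output of ST_AsText.
    private static String format(double v) {
        return BigDecimal.valueOf(v).stripTrailingZeros().toPlainString();
    }
}
```

With this helper, fromFlatCoordinates(1, 1, 2, 2, 3, 3) gives the same text as the second example's WKT argument, written in uppercase.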


6.12.2.14. ST_LineFromWKB
Function declaration:

ST_LineFromWKB(wkb)

Description: This function constructs a two-dimensional line from the input well-known binary (WKB)
representation defined by the Open Geospatial Consortium (OGC).

Example:

SELECT ST_LineFromWKB(ST_AsBinary(ST_GeomFromText('linestring (1 0, 2 3)'))) FROM src LIMIT 1;
-- Construct a two-dimensional line.

6.12.2.15. ST_MultiLineString
Function declaration:

ST_MultiLineString(array(x1, y1, x2, y2, ... ), array(x1, y1, x2, y2, ... ), ... )
ST_MultiLineString('multilinestring( ... )')

Description: This function constructs a two-dimensional multilinestring.

Example:

SELECT ST_MultiLineString(array(1, 1, 2, 2), array(10, 10, 20, 20)) from src LIMIT 1;
SELECT ST_MultiLineString('multilinestring ((1 1, 2 2), (10 10, 20 20))', 0) from src LIMIT 1;

6.12.2.16. ST_MLineFromWKB
Function declaration:

ST_MLineFromWKB(wkb)

Description: This function constructs a two-dimensional multilinestring from the input well-known
binary (WKB) representation defined by the Open Geospatial Consortium (OGC).

Example:

SELECT ST_MLineFromWKB(ST_AsBinary(ST_GeomFromText('multilinestring ((1 0, 2 3), (5 7, 7 5))'))) FROM src LIMIT 1;
-- Construct a two-dimensional multilinestring.

6.12.2.17. ST_MultiPoint
Function declaration:


ST_MultiPoint(x1, y1, x2, y2, x3, y3)
ST_MultiPoint('multipoint( ... )')

Description: This function constructs a two-dimensional multipoint geometry.

Example:

SELECT ST_MultiPoint(1, 1, 2, 2, 3, 3) from src LIMIT 1;
-- Construct a three-point geometry.
SELECT ST_MultiPoint('MULTIPOINT ((10 40), (40 30))') from src LIMIT 1;
-- Construct a two-point geometry.

6.12.2.18. ST_MPointFromWKB
Function declaration:

ST_MPointFromWKB(wkb)

Description: This function constructs a two-dimensional multipoint geometry from the input well-known
binary (WKB) representation defined by the Open Geospatial Consortium (OGC).

Example:

SELECT ST_MPointFromWKB(ST_AsBinary(ST_GeomFromText('multipoint ((1 0), (2 3))'))) FROM src LIMIT 1;
-- Construct a two-dimensional multipoint geometry.

6.12.2.19. ST_MultiPolygon
Function declaration:

ST_MultiPolygon(array(x1, y1, x2, y2, ... ), array(x1, y1, x2, y2, ... ), ... )
ST_MultiPolygon('multipolygon ( ... )')

Description: This function constructs a two-dimensional multipolygon.

Example:

SELECT ST_MultiPolygon(array(1, 1, 1, 2, 2, 2, 2, 1), array(3, 3, 3, 4, 4, 4, 4, 3)) from src LIMIT 1;
SELECT ST_MultiPolygon('multipolygon (((0 0, 0 1, 1 0, 0 0)), ((2 2, 2 3, 3 2, 2 2)))') from src LIMIT 1;

6.12.2.20. ST_MPolyFromWKB
Function declaration:

ST_MPolyFromWKB(wkb)


Description: This function constructs a two-dimensional multipolygon from the input well-known
binary (WKB) representation defined by the Open Geospatial Consortium (OGC).

Example:

SELECT ST_MPolyFromWKB(ST_AsBinary(ST_GeomFromText('multipolygon (((0 0, 1 0, 0 1, 0 0)), ((2 2, 1 2, 2 1, 2 2)))'))) FROM src LIMIT 1;
-- Construct a two-dimensional multipolygon.

6.12.2.21. ST_Point
Function declaration:

ST_Point(x, y)
ST_Point('point (x y)')

Description: This function constructs a two-dimensional point.

Example:

SELECT ST_Point(longitude, latitude) from src LIMIT 1;
SELECT ST_Point('point (0 0)') from src LIMIT 1;

6.12.2.22. ST_PointFromWKB
Function declaration:

ST_PointFromWKB(wkb)

Description: This function constructs a two-dimensional point from the input well-known binary (WKB)
representation defined by the Open Geospatial Consortium (OGC).

Example:

SELECT ST_PointFromWKB(ST_AsBinary(ST_GeomFromText('point (1 0)'))) FROM src LIMIT 1;
-- Construct a two-dimensional point.

6.12.2.23. ST_PointZ
Function declaration:

ST_PointZ(x, y, z)

Description: This function constructs a three-dimensional point.

Example:

SELECT ST_PointZ(longitude, latitude, elevation) from src LIMIT 1;

6.12.2.24. ST_Polygon


Function declaration:

ST_Polygon(x, y, [x, y]*)
ST_Polygon('polygon( ... )')

Description: This function constructs a two-dimensional polygon.

Example:

SELECT ST_Polygon(1, 1, 1, 4, 4, 4, 4, 1) from src LIMIT 1;
-- Construct a square.
SELECT ST_Polygon('polygon ((1 1, 4 1, 1 4))') from src LIMIT 1;
-- Construct a triangle.

6.12.2.25. ST_PolyFromWKB
Function declaration:

ST_PolyFromWKB(wkb)

Description: This function constructs a two-dimensional polygon from the input well-known binary (WKB) representation, as defined by the Open Geospatial Consortium (OGC).

Example:

SELECT ST_PolyFromWKB(ST_AsBinary(ST_GeomFromText('polygon ((0 0, 10 0, 0 10, 0 0))'))) FROM src LIMIT 1;
-- Construct a two-dimensional polygon.

6.12.2.26. ST_SetSRID
Function declaration:

ST_SetSRID(<ST_Geometry>, SRID)

Description: This function sets the spatial reference system identifier (SRID) of the input geometry.

Example:

SELECT ST_SetSRID(ST_Point(1.5, 2.5), 4326) FROM src LIMIT 1;
-- Construct a point and set its SRID to 4326.

6.12.3. Accessors
6.12.3.1. ST_Area
Function declaration:

ST_Area(ST_Polygon)

Description: This function returns the area of the input polygon or multipolygon.

Example:

SELECT ST_Area(ST_Polygon(1,1, 1,4, 4,4, 4,1)) FROM src LIMIT 1;
-- 9.0
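For a single polygon ring without holes, planar area is commonly computed with the shoelace formula. The Python function below is a standalone sketch of that formula, not MaxCompute code (the helper name is hypothetical); it assumes the ring is given as a list of (x, y) vertices and reproduces the 9.0 result of the example above.

```python
def shoelace_area(ring):
    """Planar area of a simple polygon ring given as [(x, y), ...].

    The ring may be open or closed: a repeated closing vertex only adds a
    zero term to the sum.
    """
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0  # abs() makes the vertex order irrelevant


# Same square as ST_Polygon(1,1, 1,4, 4,4, 4,1) in the example above.
print(shoelace_area([(1, 1), (1, 4), (4, 4), (4, 1)]))  # 9.0
```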

6.12.3.2. ST_Centroid
Function declaration:

ST_Centroid(polygon)

Description: This function returns the center point of the minimum bounding rectangle of the input polygon.

Example:

SELECT ST_Centroid(ST_GeomFromText('polygon ((0 0, 3 6, 6 0, 0 0))')) FROM src LIMIT 1;
-- POINT(3 3)
SELECT ST_Centroid(ST_GeomFromText('polygon ((0 0, 0 8, 8 0, 0 0))')) FROM src LIMIT 1;
-- POINT(4 4)
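As described above, ST_Centroid returns the center of the minimum bounding rectangle, which generally differs from the area-weighted centroid of a polygon. The standalone Python sketch below (hypothetical helpers, not MaxCompute code) computes both for the first example triangle: the rectangle center matches POINT(3 3), while the area-weighted centroid of the same triangle is (3, 2).

```python
def mbr_center(points):
    """Center of the axis-aligned minimum bounding rectangle of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)


def area_centroid(ring):
    """Area-weighted centroid of a simple polygon ring (open or closed)."""
    n = len(ring)
    a2 = cx = cy = 0.0  # a2 accumulates twice the signed area
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Centroid = sum / (6 * area) = sum / (3 * a2).
    return (cx / (3.0 * a2), cy / (3.0 * a2))


triangle = [(0, 0), (3, 6), (6, 0), (0, 0)]
print(mbr_center(triangle))     # (3.0, 3.0)
print(area_centroid(triangle))  # (3.0, 2.0)
```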

6.12.3.3. ST_CoordDim
Function declaration:

ST_CoordDim(geometry)

Description: This function returns the coordinate dimension of the input geometry.

Example:

SELECT ST_CoordDim(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 2
SELECT ST_CoordDim(ST_PointZ(1.5, 2.5, 3)) FROM src LIMIT 1;
-- 3
SELECT ST_CoordDim(ST_Point(1.5, 2.5, 3., 4.)) FROM src LIMIT 1;
-- 4

6.12.3.4. ST_Dimension
Function declaration:

ST_Dimension(geometry)

Description: This function returns the spatial dimension of the input geometry.

Example:

SELECT ST_Dimension(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 0
SELECT ST_Dimension(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- 1
SELECT ST_Dimension(ST_Polygon(2,0, 2,3, 3,0)) FROM src LIMIT 1;
-- 2

6.12.3.5. ST_Distance
Function declaration:

ST_Distance(ST_Geometry1, ST_Geometry2)

Description: This function returns the shortest distance between geometry1 and geometry2, measured in the planar units of their coordinates.

Example:

SELECT ST_Distance(ST_Point(0.0,0.0), ST_Point(3.0,4.0)) FROM src LIMIT 1;
-- 5.0

6.12.3.6. ST_GeodesicLengthWGS84
Function declaration:

ST_GeodesicLengthWGS84(line)

Description: This function returns the geodesic length, in meters, of the input line on the World Geodetic System 1984 (WGS84) spheroid. The geometry must use the WGS84 spatial reference (SRID 4326). Otherwise, this function returns NULL.

Example:

SELECT ST_GeodesicLengthWGS84(ST_SetSRID(ST_Linestring(0.0,0.0, 0.3,0.4), 4326)) FROM src LIMIT 1;
-- 55km
SELECT ST_GeodesicLengthWGS84(ST_GeomFromText('MultiLineString((0.0 80.0, 0.3 80.4))', 4326)) FROM src LIMIT 1;
-- 45km

6.12.3.7. ST_GeometryN
Function declaration:

ST_GeometryN(ST_GeometryCollection, n)

Description: This function returns the nth geometry in the input geometry collection. n starts from 1.

Example:

SELECT ST_GeometryN(ST_GeomFromText('multipoint ((10 40), (40 30), (20 20), (30 10))'), 3) FROM src LIMIT 1;
-- ST_Point(20 20)
SELECT ST_GeometryN(ST_GeomFromText('multilinestring ((2 4, 10 10), (20 20, 7 8))'), 2) FROM src LIMIT 1;
-- ST_Linestring(20 20, 7 8)

6.12.3.8. ST_Is3D
Function declaration:

ST_Is3D(geometry)

Description: If the input geometry has Z coordinates, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_Is3D(ST_Polygon(1,1, 1,4, 4,4, 4,1)) FROM src LIMIT 1;
-- false
SELECT ST_Is3D(ST_LineString(0.,0., 3.,4., 0.,4., 0.,0.)) FROM src LIMIT 1;
-- false
SELECT ST_Is3D(ST_Point(3., 4.)) FROM src LIMIT 1;
-- false
SELECT ST_Is3D(ST_PointZ(3., 4., 2)) FROM src LIMIT 1;
-- true

6.12.3.9. ST_IsClosed
Function declaration:

ST_IsClosed(ST_[Multi]LineString)

Description: If the input linestring or linestrings are closed, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_IsClosed(ST_LineString(0.,0., 3.,4., 0.,4., 0.,0.)) FROM src LIMIT 1;
-- true
SELECT ST_IsClosed(ST_LineString(0.,0., 3.,4.)) FROM src LIMIT 1;
-- false

6.12.3.10. ST_IsEmpty
Function declaration:

ST_IsEmpty(geometry)

Description: If the input geometry is empty, this function returns true.

Example:

SELECT ST_IsEmpty(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- false
SELECT ST_IsEmpty(ST_GeomFromText('point empty')) FROM src LIMIT 1;
-- true

6.12.3.11. ST_IsMeasured
Function declaration:

ST_IsMeasured(geometry)

Description: If the input geometry has M coordinates (measures), this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_IsMeasured(ST_Polygon(1,1, 1,4, 4,4, 4,1)) FROM src LIMIT 1;
-- false
SELECT ST_IsMeasured(ST_LineString(0.,0., 3.,4., 0.,4., 0.,0.)) FROM src LIMIT 1;
-- false
SELECT ST_IsMeasured(ST_Point(3., 4.)) FROM src LIMIT 1;
-- false
SELECT ST_IsMeasured(ST_PointM(3., 4., 2)) FROM src LIMIT 1;
-- true

6.12.3.12. ST_IsSimple
Function declaration:

ST_IsSimple(geometry)

Description: If the input geometry is simple, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_IsSimple(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- true
SELECT ST_IsSimple(ST_LineString(0.,0., 1.,1., 0.,1., 1.,0.)) FROM src LIMIT 1;
-- false

6.12.3.13. ST_IsRing
Function declaration:

ST_IsRing(ST_LineString)

Description: If the input linestring is a ring, that is, if it is both closed and simple, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_IsRing(ST_LineString(0.,0., 3.,4., 0.,4., 0.,0.)) FROM src LIMIT 1;
-- true
SELECT ST_IsRing(ST_LineString(0.,0., 1.,1., 1.,2., 2.,1., 1.,1., 0.,0.)) FROM src LIMIT 1;
-- false
SELECT ST_IsRing(ST_LineString(0.,0., 3.,4.)) FROM src LIMIT 1;
-- false

6.12.3.14. ST_Length
Function declaration:

ST_Length(line)

Description: This function returns the length of the input line.

Example:

SELECT ST_Length(ST_LineString(0.0,0.0, 3.0,4.0)) FROM src LIMIT 1;
-- 5.0

6.12.3.15. ST_M
Function declaration:

ST_M(geometry)

Description: This function returns the M coordinate of the input geometry.

Example:

SELECT ST_M(ST_PointM(3., 4., 2)) FROM src LIMIT 1;
-- 2

6.12.3.16. ST_MaxM
Function declaration:

ST_MaxM(geometry)

Description: This function returns the maximum M coordinate of the input geometry.

Example:

SELECT ST_MaxM(ST_PointM(1.5, 2.5, 2)) FROM src LIMIT 1;
-- 2
SELECT ST_MaxM(ST_LineString('linestring m (1.5 2.5 2, 3.0 2.2 1)')) FROM src LIMIT 1;
-- 2

6.12.3.17. ST_MinM
Function declaration:

ST_MinM(geometry)

Description: This function returns the minimum M coordinate of the input geometry.

Example:

SELECT ST_MinM(ST_PointM(1.5, 2.5, 2)) FROM src LIMIT 1;
-- 2
SELECT ST_MinM(ST_LineString('linestring m (1.5 2.5 2, 3.0 2.2 1)')) FROM src LIMIT 1;
-- 1

6.12.3.18. ST_X
Function declaration:

ST_X(point)

Description: This function returns the X coordinate of the input point.

Example:

SELECT ST_X(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 1.5

6.12.3.19. ST_Y
Function declaration:

ST_Y(point)

Description: This function returns the Y coordinate of the input point.

Example:

SELECT ST_Y(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 2.5

6.12.3.20. ST_Z
Function declaration:

ST_Z(point)

Description: This function returns the Z coordinate of the input point.

Example:

SELECT ST_Z(ST_PointZ(1.5, 2.5, 2)) FROM src LIMIT 1;
-- 2

6.12.3.21. ST_MaxX
Function declaration:

ST_MaxX(geometry)

Description: This function returns the maximum X coordinate of the input geometry.

Example:

SELECT ST_MaxX(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 1.5
SELECT ST_MaxX(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- 3.0

6.12.3.22. ST_MaxY
Function declaration:

ST_MaxY(geometry)

Description: This function returns the maximum Y coordinate of the input geometry.

Example:

SELECT ST_MaxY(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 2.5
SELECT ST_MaxY(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- 2.5

6.12.3.23. ST_MaxZ
Function declaration:

ST_MaxZ(geometry)

Description: This function returns the maximum Z coordinate of the input geometry.

Example:

SELECT ST_MaxZ(ST_PointZ(1.5, 2.5, 2)) FROM src LIMIT 1;
-- 2
SELECT ST_MaxZ(ST_LineString('linestring z (1.5 2.5 2, 3.0 2.2 1)')) FROM src LIMIT 1;
-- 2

6.12.3.24. ST_MinX
Function declaration:

ST_MinX(geometry)

Description: This function returns the minimum X coordinate of the input geometry.

Example:

SELECT ST_MinX(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 1.5
SELECT ST_MinX(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- 1.5

6.12.3.25. ST_MinY
Function declaration:

ST_MinY(geometry)

Description: This function returns the minimum Y coordinate of the input geometry.

Example:

SELECT ST_MinY(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 2.5
SELECT ST_MinY(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- 2.2

6.12.3.26. ST_MinZ
Function declaration:

ST_MinZ(geometry)

Description: This function returns the minimum Z coordinate of the input geometry.

Example:

SELECT ST_MinZ(ST_PointZ(1.5, 2.5, 2)) FROM src LIMIT 1;
-- 2
SELECT ST_MinZ(ST_LineString('linestring z (1.5 2.5 2, 3.0 2.2 1)')) FROM src LIMIT 1;
-- 1

6.12.3.27. ST_NumGeometries
Function declaration:

ST_NumGeometries(ST_GeometryCollection)

Description: This function returns the number of geometries in the input geometry collection.

Example:

SELECT ST_NumGeometries(ST_GeomFromText('multipoint ((10 40), (40 30), (20 20), (30 10))')) FROM src LIMIT 1;
-- 4
SELECT ST_NumGeometries(ST_GeomFromText('multilinestring ((2 4, 10 10), (20 20, 7 8))')) FROM src LIMIT 1;
-- 2

6.12.3.28. ST_NumInteriorRing
Function declaration:

ST_NumInteriorRing(ST_Polygon)

Description: This function returns the number of interior rings of the input polygon.

Example:

SELECT ST_NumInteriorRing(ST_Polygon(1,1, 1,4, 4,1)) FROM src LIMIT 1;
-- 0
SELECT ST_NumInteriorRing(ST_Polygon('polygon ((0 0, 8 0, 0 8, 0 0), (1 1, 1 5, 5 1, 1 1))')) FROM src LIMIT 1;
-- 1

6.12.3.29. ST_NumPoints
Function declaration:

ST_NumPoints(geometry)

Description: This function returns the number of points in the input geometry.

Example:

SELECT ST_NumPoints(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- 1
SELECT ST_NumPoints(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- 2
SELECT ST_NumPoints(ST_GeomFromText('polygon ((0 0, 10 0, 0 10, 0 0))')) FROM src LIMIT 1;
-- 4

6.12.3.30. ST_PointN
Function declaration:

ST_PointN(ST_Geometry, n)

Description: This function returns the nth point of the input linestring or linestrings. n starts from 1.

Example:

SELECT ST_PointN(ST_LineString(1.5,2.5, 3.0,2.2), 2) FROM src LIMIT 1;
-- POINT(3.0 2.2)

6.12.3.31. ST_StartPoint
Function declaration:

ST_StartPoint(geometry)

Description: This function returns the first point of the input linestring.

Example:

SELECT ST_StartPoint(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- POINT(1.5 2.5)

6.12.3.32. ST_EndPoint
Function declaration:

ST_EndPoint(geometry)

Description: This function returns the last point of the input linestring.

Example:

SELECT ST_EndPoint(ST_LineString(1.5,2.5, 3.0,2.2)) FROM src LIMIT 1;
-- POINT(3.0 2.2)

6.12.3.33. ST_SRID
Function declaration:

ST_SRID(ST_Geometry)

Description: This function returns the spatial reference system identifier (SRID) of the input geometry.

Example:

SELECT ST_SRID(ST_Point(1.5, 2.5)) FROM src LIMIT 1;
-- Return SRID 0.

6.12.4. Operations
6.12.4.1. ST_Aggr_ConvexHull
Function declaration:

ST_Aggr_ConvexHull(ST_Geometry)

Description: This aggregate function returns the convex hull of all input geometries.

Example:

SELECT ST_Aggr_ConvexHull(geometry) FROM source;
-- Return the convex hull of all input geometries in the data source.

6.12.4.2. ST_Aggr_Intersection
Function declaration:

ST_Aggr_Intersection(ST_Geometry)

Description: This aggregate function returns the intersection of all input geometries.

Example:

SELECT ST_Aggr_Intersection(geometry) FROM source;
-- Return the intersection of all input geometries in the data source.

6.12.4.3. ST_Aggr_Union
Function declaration:

ST_Aggr_Union(ST_Geometry)

Description: This aggregate function returns the union of all input geometries.

Example:

SELECT ST_Aggr_Union(geometry) FROM source;
-- Return the union of all input geometries in the data source.

6.12.4.4. ST_Bin
Function declaration:

ST_Bin(placeholder)

Description: This function returns the bin ID of the input point.

6.12.4.5. ST_BinEnvelope
Function declaration:

ST_BinEnvelope(binsize, point)

Description: This function returns the envelope of the bin that contains the input point.

Function declaration:

ST_BinEnvelope(binsize, binid)

Description: This function returns the envelope of the bin identified by the input bin ID.

6.12.4.6. ST_Boundary
Function declaration:

ST_Boundary(ST_Geometry)

Description: This function returns the boundary of the input geometry.

Example:

SELECT ST_Boundary(ST_LineString(0,1, 1,0)) FROM src LIMIT 1;
-- MULTIPOINT((1 0),(0 1))
SELECT ST_Boundary(ST_Polygon(1,1, 4,1, 1,4)) FROM src LIMIT 1;
-- LINESTRING(1 1, 4 1, 1 4, 1 1)

6.12.4.7. ST_Buffer
Function declaration:

ST_Buffer(geometry, distance)

Description: This function returns a geometry that contains all points whose distance from the input geometry is less than or equal to the value of the distance parameter.

6.12.4.8. ST_ConvexHull
Function declaration:

ST_ConvexHull(ST_Geometry, ST_Geometry, ...)

Description: This function returns the convex hull of the input geometries.

Example:

SELECT ST_AsText(ST_ConvexHull(ST_Point(0, 0), ST_Point(0, 1), ST_Point(1, 1))) FROM onerow;
-- MULTIPOLYGON (((0 0, 1 1, 0 1, 0 0)))
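ST_ConvexHull accepts geometries; to illustrate what a convex hull is, the standalone Python sketch below applies Andrew's monotone chain algorithm to raw (x, y) points. It is not the MaxCompute implementation, and the helper name is hypothetical. The vertex order it returns is this sketch's own convention (counterclockwise, first vertex not repeated); for the three example points above it happens to match the ring order shown.

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull of (x, y) points.

    Returns hull vertices counterclockwise, starting at the lowest-x
    (then lowest-y) point, without repeating the first vertex.
    Collinear points on the hull edges are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 when o -> a -> b makes a counterclockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:  # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (0, 1), (1, 1)]))  # [(0, 0), (1, 1), (0, 1)]
```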

6.12.4.9. ST_Difference
Function declaration:

ST_Difference(ST_Geometry1, ST_Geometry2)

Description: This function returns a geometry that represents the difference between the input geometries, that is, the part of the first geometry that does not intersect the second geometry.

Example:

SELECT ST_AsText(ST_Difference(ST_MultiPoint(1, 1, 1.5, 1.5, 2, 2), ST_Point(1.5, 1.5))) FROM onerow;
-- MULTIPOINT (1 1, 2 2)
SELECT ST_AsText(ST_Difference(ST_Polygon(0, 0, 0, 10, 10, 10, 10, 0), ST_Polygon(0, 0, 0, 5, 5, 5, 5, 0))) from onerow;
-- MULTIPOLYGON (((10 0, 10 10, 0 10, 0 5, 5 5, 5 0, 10 0)))

6.12.4.10. ST_Envelope
Function declaration:

ST_Envelope(ST_Geometry)

Description: This function returns the envelope of the input geometry. If the specified geometry is a point, a horizontal line, or a vertical line, its envelope has zero area, and this function returns a degenerate envelope or an empty envelope.

Example:

SELECT ST_Envelope(ST_LineString(0,0, 2,2)) from src LIMIT 1;
-- POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
SELECT ST_Envelope(ST_Polygon(2,0, 2,3, 3,0)) from src LIMIT 1;
-- POLYGON ((2 0, 3 0, 3 3, 2 3, 2 0))
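The envelope is the axis-aligned rectangle spanned by the minimum and maximum X and Y coordinates of the geometry. The standalone Python sketch below (a hypothetical helper, not MaxCompute code) builds that rectangle as a closed ring in the same vertex order as the two results above. For a point, or a horizontal or vertical line, the ring it builds collapses to zero area.

```python
def envelope(points):
    """Envelope of (x, y) points as a closed ring:
    (xmin ymin, xmax ymin, xmax ymax, xmin ymax, xmin ymin)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]


# Vertices of ST_LineString(0,0, 2,2) from the first example.
print(envelope([(0, 0), (2, 2)]))  # [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
```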

6.12.4.11. ST_ExteriorRing
Function declaration:

ST_ExteriorRing(polygon)

Description: This function returns the exterior ring of a polygon as a linestring.

Example:

SELECT ST_ExteriorRing(ST_Polygon(1,1, 1,4, 4,1)) FROM src LIMIT 1;
-- LINESTRING(1 1, 4 1, 1 4, 1 1)
SELECT ST_ExteriorRing(ST_Polygon('polygon ((0 0, 8 0, 0 8, 0 0), (1 1, 1 5, 5 1, 1 1))')) FROM src LIMIT 1;
-- LINESTRING (8 0, 0 8, 0 0, 8 0)

6.12.4.12. ST_InteriorRingN
Function declaration:

ST_InteriorRingN(ST_Polygon, n)

Description: This function returns the nth interior ring of a polygon as a linestring.

Example:

SELECT ST_InteriorRingN(ST_Polygon('polygon ((0 0, 8 0, 0 8, 0 0), (1 1, 1 5, 5 1, 1 1))'), 1) FROM src LIMIT 1;
-- LINESTRING (1 1, 5 1, 1 5, 1 1)

6.12.4.13. ST_Intersection
Function declaration:

ST_Intersection(ST_Geometry1, ST_Geometry2)

Description: This function returns a geometry that is the intersection of the input geometries. If the input geometries intersect in a lower dimension, ST_Intersection may drop the lower-dimension intersections or return a closed linestring.

Example:

SELECT ST_AsText(ST_Intersection(ST_Point(1,1), ST_Point(1,1))) FROM onerow;
-- POINT (1 1)
SELECT ST_AsText(ST_Intersection(ST_GeomFromText('linestring(0 2, 0 0, 2 0)'), ST_GeomFromText('linestring(0 3, 0 1, 1 0, 3 0)'))) FROM onerow;
-- MULTILINESTRING ((1 0, 2 0), (0 2, 0 1))
SELECT ST_AsText(ST_Intersection(ST_LineString(0,2, 2,3), ST_Polygon(1,1, 4,1, 4,4, 1,4))) FROM onerow;
-- MULTILINESTRING ((1 2.5, 2 3))
SELECT ST_AsText(ST_Intersection(ST_Polygon(2,0, 2,3, 3,0), ST_Polygon(1,1, 4,1, 4,4, 1,4))) FROM onerow;
-- MULTIPOLYGON (((2.67 1, 2 3, 2 1, 2.67 1)))
SELECT ST_AsText(ST_Intersection(ST_Polygon(2,0, 3,1, 2,1), ST_Polygon(1,1, 4,1, 4,4, 1,4))) FROM onerow;
-- MULTIPOLYGON EMPTY or LINESTRING (2 1, 3 1, 2 1)

6.12.4.14. ST_SymmetricDiff
Function declaration:

ST_SymmetricDiff(ST_Geometry1, ST_Geometry2)

Description: This function returns a geometry that consists of the symmetric difference of the input geometries.

Example:

SELECT ST_AsText(ST_SymmetricDiff(ST_LineString('linestring(0 2, 2 2)'), ST_LineString('linestring(1 2, 3 2)'))) FROM onerow;
-- MULTILINESTRING((0 2, 1 2), (2 2, 3 2))
SELECT ST_AsText(ST_SymmetricDiff(ST_Polygon('polygon((0 0, 2 0, 2 2, 0 2, 0 0))'), ST_Polygon('polygon((1 1, 3 1, 3 3, 1 3, 1 1))'))) from onerow;
-- MULTIPOLYGON (((0 0, 2 0, 2 1, 1 1, 1 2, 0 2, 0 0)), ((3 1, 3 3, 1 3, 1 2, 2 2, 2 1, 3 1)))

6.12.4.15. ST_Union
Function declaration:

ST_Union(ST_Geometry, ST_Geometry, ...)

Description: This function returns a geometry that is the union of the input geometries.

Example:

SELECT ST_AsText(ST_Union(ST_Polygon(1, 1, 1, 4, 4, 4, 4, 1), ST_Polygon(4, 1, 4, 4, 4, 8, 8, 1))) FROM onerow;
-- MULTIPOLYGON (((4 1, 8 1, 4 8, 4 4, 1 4, 1 1, 4 1)))

6.12.5. Relationship tests


6.12.5.1. ST_Contains
Function declaration:

BOOLEAN ST_Contains(geometry1, geometry2)

Description: If geometry1 contains geometry2, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_Contains(st_polygon(1,1, 1,4, 4,4, 4,1), st_point(2, 3)) from src LIMIT 1;
-- true is returned.
SELECT ST_Contains(st_polygon(1,1, 1,4, 4,4, 4,1), st_point(8, 8)) from src LIMIT 1;
-- false is returned.
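For a point and a simple polygon, containment can be illustrated with the even-odd (ray casting) rule. The standalone Python sketch below is a hypothetical helper, not how MaxCompute implements ST_Contains. It reproduces the two examples above, but it does not handle points that lie exactly on the polygon boundary reliably; under the OGC definition, such a point is not contained by the polygon.

```python
def point_in_ring(x, y, ring):
    """Even-odd test: is (x, y) strictly inside the polygon ring?

    Casts a horizontal ray toward +x and counts how many ring edges it
    crosses; an odd count means the point is inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge meets that line (y2 != y1 here).
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(1, 1), (1, 4), (4, 4), (4, 1)]  # st_polygon(1,1, 1,4, 4,4, 4,1)
print(point_in_ring(2, 3, square))  # True
print(point_in_ring(8, 8, square))  # False
```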

6.12.5.2. ST_Crosses
Function declaration:

BOOLEAN ST_Crosses(geometry1, geometry2)

Description: If geometry1 crosses geometry2, this function returns true. Otherwise, this function returns false.

Note: Crossing means that the two geometries have some, but not all, interior points in common.

Example:

SELECT ST_Crosses(st_linestring(0,0, 1,1), st_linestring(1,0, 0,1)) from src LIMIT 1;
-- true is returned.
SELECT ST_Crosses(st_linestring(2,0, 2,3), st_polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- true is returned.
SELECT ST_Crosses(st_linestring(0,2, 0,1), ST_linestring(2,0, 1,0)) from src LIMIT 1;
-- false is returned.
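For two straight segments, the line/line case of ST_Crosses reduces to a proper intersection: the segments meet at a single point that lies in the interior of both. The standalone Python sketch below illustrates this with orientation tests. It is a hypothetical helper limited to single-segment linestrings, not the MaxCompute implementation, and it reproduces the first and third examples above.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p):
    1 = counterclockwise, -1 = clockwise, 0 = collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a1, a2, b1, b2):
    """True if segment a1-a2 and segment b1-b2 properly intersect:
    each segment's endpoints lie strictly on opposite sides of the other.
    Touching at an endpoint or collinear overlap returns False."""
    o1 = orientation(a1, a2, b1)
    o2 = orientation(a1, a2, b2)
    o3 = orientation(b1, b2, a1)
    o4 = orientation(b1, b2, a2)
    return o1 * o2 < 0 and o3 * o4 < 0


print(segments_cross((0, 0), (1, 1), (1, 0), (0, 1)))  # True
print(segments_cross((0, 2), (0, 1), (2, 0), (1, 0)))  # False
```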

6.12.5.3. ST_Disjoint
Function declaration:

BOOLEAN ST_Disjoint(geometry1, geometry2)

Description: If geometry1 and geometry2 do not intersect, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_Disjoint(ST_LineString(0,0, 0,1), ST_LineString(1,1, 1,0)) from src LIMIT 1;
-- true is returned.
SELECT ST_Disjoint(ST_LineString(0,0, 1,1), ST_LineString(1,0, 0,1)) from src LIMIT 1;
-- false is returned.

6.12.5.4. ST_EnvIntersects
Function declaration:

BOOLEAN ST_EnvIntersects(ST_Geometry1, ST_Geometry2)

Description: If the envelopes of geometry1 and geometry2 intersect, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_EnvIntersects(ST_LineString(0,0, 1,1), ST_LineString(1,3, 2,2)) from src LIMIT 1;
-- false is returned.
SELECT ST_EnvIntersects(ST_LineString(0,0, 2,2), ST_LineString(1,0, 3,2)) from src LIMIT 1;
-- true is returned.
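An envelope test compares only the bounding boxes of the two geometries, so it is much cheaper than an exact intersection test. The standalone Python sketch below (hypothetical helpers, not MaxCompute code) reproduces the two results above. In this sketch, boxes that only touch along an edge or at a corner are treated as intersecting.

```python
def bbox(points):
    """Bounding box (xmin, ymin, xmax, ymax) of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def envelopes_intersect(a, b):
    """True if boxes a and b share at least one point.
    Boxes overlap only if their X ranges and their Y ranges both overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


print(envelopes_intersect(bbox([(0, 0), (1, 1)]), bbox([(1, 3), (2, 2)])))  # False
print(envelopes_intersect(bbox([(0, 0), (2, 2)]), bbox([(1, 0), (3, 2)])))  # True
```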

6.12.5.5. ST_Equals
Function declaration:

BOOLEAN ST_Equals(geometry1, geometry2)

Description: If geometry1 equals geometry2, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_Equals(st_linestring(0,0, 1,1), st_linestring(1,1, 0,0)) from src LIMIT 1;
-- true is returned.
SELECT ST_Equals(st_linestring(0,0, 1,1), st_linestring(1,0, 0,1)) from src LIMIT 1;
-- false is returned.

6.12.5.6. ST_Intersects
Function declaration:

BOOLEAN ST_Intersects(geometry1, geometry2)

Description: If geometry1 and geometry2 intersect, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_Intersects(st_linestring(0,0, 1,1), st_linestring(1,1, 0,0)) from src LIMIT 1;
-- true is returned.
SELECT ST_Intersects(st_linestring(0,0, 1,1), st_linestring(1,0, 0,1)) from src LIMIT 1;
-- true is returned.
SELECT ST_Intersects(ST_LineString(2,0, 2,3), ST_Polygon(1,1, 4,1, 4,4, 1,4)) from src LIMIT 1;
-- true is returned.
SELECT ST_Intersects(ST_LineString(8,7, 7,8), ST_Polygon(1,1, 4,1, 4,4, 1,4)) from src LIMIT 1;
-- false is returned.

6.12.5.7. ST_Overlaps
Function declaration:

BOOLEAN ST_Overlaps(geometry1, geometry2)

Description: If geometry1 and geometry2 overlap, this function returns true. Otherwise, this function returns false. Geometries that only touch each other are not considered to overlap.

Example:

SELECT ST_Overlaps(st_polygon(2,0, 2,3, 3,0), st_polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- true is returned.
SELECT ST_Overlaps(st_polygon(2,0, 2,1, 3,1), ST_Polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- false is returned.

6.12.5.8. ST_Relate
Function declaration:

BOOLEAN ST_Relate(geometry1, geometry2, pattern)

Description: If geometry1 and geometry2 have the Dimensionally Extended nine-Intersection Model (DE-9IM) relationship specified by the pattern string, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_Relate(st_polygon(2,0, 2,1, 3,1), ST_Polygon(1,1, 1,4, 4,4, 4,1), '****T****') from src LIMIT 1;
-- true is returned.
SELECT ST_Relate(st_polygon(2,0, 2,1, 3,1), ST_Polygon(1,1, 1,4, 4,4, 4,1), 'T********') from src LIMIT 1;
-- false is returned.
SELECT ST_Relate(st_linestring(0,0, 3,3), ST_linestring(1,1, 4,4), 'T********') from src LIMIT 1;
-- true is returned.
SELECT ST_Relate(st_linestring(0,0, 3,3), ST_linestring(1,1, 4,4), '****T****') from src LIMIT 1;
-- false is returned.
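The pattern string is compared cell by cell against the 9-character DE-9IM matrix of the two geometries, whose cells are ordered II, IB, IE, BI, BB, BE, EI, EB, EE (I = interior, B = boundary, E = exterior). The standalone Python sketch below shows only that matching step, as a hypothetical helper; computing the matrix itself is not shown. The matrix FF2F11212 used here is the DE-9IM matrix of two polygons that share an edge but no interior points, such as the triangle and the square in the first two examples, and the sketch reproduces their results.

```python
def relate_matches(matrix, pattern):
    """Match a DE-9IM matrix string against a DE-9IM pattern string.

    Matrix cells are 'F' (empty intersection) or '0', '1', '2' (dimension).
    Pattern cells: '*' matches anything, 'T' matches any non-empty
    intersection (0, 1 or 2), 'F' matches only 'F', and a digit matches
    exactly that dimension.
    """
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM strings must have exactly 9 characters")
    for m, p in zip(matrix.upper(), pattern.upper()):
        if p == "*":
            continue
        if p == "T":
            if m not in "012":
                return False
        elif m != p:
            return False
    return True


touching_polygons = "FF2F11212"  # two polygons sharing only an edge
print(relate_matches(touching_polygons, "****T****"))  # True  (BB is 1)
print(relate_matches(touching_polygons, "T********"))  # False (II is F)
```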

6.12.5.9. ST_Touches
Function declaration:

BOOLEAN ST_Touches(geometry1, geometry2)

Description: If geometry1 and geometry2 touch, that is, they have at least one point in common but their interiors do not intersect, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_Touches(st_point(1, 2), st_polygon(1, 1, 1, 4, 4, 4, 4, 1)) from src LIMIT 1;
-- true is returned.
SELECT ST_Touches(st_point(8, 8), st_polygon(1, 1, 1, 4, 4, 4, 4, 1)) from src LIMIT 1;
-- false is returned.

6.12.5.10. ST_Within
Function declaration:

BOOLEAN ST_Within(geometry1, geometry2)

Description: If geometry1 is within geometry2, this function returns true. Otherwise, this function returns false.

Example:

SELECT ST_Within(st_point(2, 3), st_polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- true is returned.
SELECT ST_Within(st_point(8, 8), st_polygon(1,1, 1,4, 4,4, 4,1)) from src LIMIT 1;
-- false is returned.

6.12.6. Geohash index functions


6.12.6.1. ST_GeoHash
Function declaration:

string ST_GeoHash(st_geometry geometry, integer precision=full_precision)
string ST_GeoHash(double longitude, double latitude, integer precision=full_precision)

Description: This function returns the Geohash string of the specified point. The point can be passed either as a geometry constructed by a function with the ST_ prefix or as a longitude and a latitude. If the precision parameter is not specified, the maximum precision is used.

Example:

SELECT ST_GeoHash(ST_Point(-102.849854, 36.451113), 8);
SELECT ST_GeoHash(ST_GeomFromText('POINT(-102.849854 36.451113)'));
SELECT ST_GeoHash(-102.849854, 36.451113, 10);

6.12.6.2. ST_PointFromGeoHash
Function declaration:

st_geometry ST_PointFromGeoHash(string geohash, integer precision=full_precision)

Description: This function returns a point based on the input Geohash value. If the precision parameter is not specified, the maximum precision is used.

Example:

SELECT ST_AsText(ST_PointFromGeoHash('9wqz7eep0eyq'));
SELECT ST_AsText(ST_PointFromGeoHash('9wqz7eep0eyq', 4));

6.12.6.3. ST_EnvelopeFromGeoHash
Function declaration:

st_geometry ST_EnvelopeFromGeoHash(string geohash, integer precision=full_precision)

Description: This function returns the envelope of the specified precision based on the input Geohash value. If the precision parameter is not specified, the maximum precision is used.

Example:

SELECT ST_AsText(ST_EnvelopeFromGeoHash('9wqz7eep0eyq', 8));
SELECT ST_AsText(ST_EnvelopeFromGeoHash('9wqz7eep0eyq'));

6.12.6.4. ST_GeoHashNeighbours
Function declaration:

list_of_string ST_GeoHashNeighbours(double longitude, double latitude, integer precision)

Description: This function is a user-defined table-valued function (UDTF) that generates nine data records: the Geohash strings of the cell that contains the specified point and of its eight neighboring cells, computed from the input longitude, latitude, and precision. All three parameters must be specified.

Example:

SELECT ST_GeoHashNeighbours(-102.849854, 36.451113, 10);
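The Geohash functions above build on the public Geohash encoding, which bisects the longitude and latitude ranges alternately (longitude first) and emits one base-32 character for every 5 bits. The standalone Python sketch below shows encoding, decoding a hash to its cell, and one simple way to obtain the nine cells that ST_GeoHashNeighbours describes. It is not the MaxCompute implementation and all helper names are hypothetical; the default precision of 12 characters is an assumption (the documentation only says that the maximum precision is used), and cells next to the poles or the antimeridian are not handled.

```python
# Base-32 alphabet of the public Geohash encoding.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision=12):
    """Encode a point; precision=12 is an assumed stand-in for full precision."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if val >= mid:          # upper half -> bit 1
            bits = (bits << 1) | 1
            rng[0] = mid
        else:                   # lower half -> bit 0
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon   # alternate longitude / latitude
        nbits += 1
        if nbits == 5:          # every 5 bits form one base-32 character
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def geohash_bounds(gh):
    """Decode a Geohash to its cell: (lon_min, lat_min, lon_max, lat_max)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in gh:
        code = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2.0
            if (code >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]


def geohash_neighbours(lon, lat, precision):
    """Nine hashes, north row to south row, west to east; index 4 is the
    cell containing the point. Each neighbour is found by re-encoding the
    center of the adjacent cell."""
    w, s, e, n = geohash_bounds(geohash_encode(lon, lat, precision))
    clon, clat = (w + e) / 2.0, (s + n) / 2.0
    dlon, dlat = e - w, n - s
    return [geohash_encode(clon + i * dlon, clat + j * dlat, precision)
            for j in (1, 0, -1) for i in (-1, 0, 1)]


print(geohash_encode(-5.6, 42.6, 5))  # ezs42
```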

6.12.7. S2 mesh functions


6.12.7.1. ST_S2CellIdsFromGeom
Function declaration:

list_of_string ST_S2CellIdsFromGeom(st_geometry geometry, integer level)

Description: This function covers the input geometry with S2 cells at the specified level and returns the IDs of all these S2 cells.

Example:

SELECT ST_S2CellIdsFromGeom(ST_Point(-102.849854, 36.451113), 4);
SELECT ST_S2CellIdsFromGeom(ST_LineString('LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932)'), 17) as cellid;

6.12.7.2. ST_S2CellIdsFromText
Function declaration:

list_of_string ST_S2CellIdsFromText(string wkt, integer level)

Description: This function covers the geometry described by the input well-known text (WKT) string with S2 cells at the specified level and returns the IDs of all these S2 cells.

Example:

SELECT ST_S2CellIdsFromText('POINT(-102.849854 36.451113)', 4);
SELECT ST_S2CellIdsFromText('LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932)', 17) as cellid;

6.12.7.3. ST_S2CellCenterPoint
Function declaration:

st_point ST_S2CellCenterPoint(string cellId)

Description: This function returns the center point of the S2 cell specified by the cellId parameter.

Example:

SELECT ST_S2CellCenterPoint('549015');
SELECT ST_AsText(ST_S2CellCenterPoint('89e37f091'));

6.12.7.4. ST_S2CellNeighbours
Function declaration:

list_of_string ST_S2CellNeighbours(string cellId, integer level)

Description: This function calculates the neighboring S2 cells, at the specified level, of the cell specified by the cellId parameter and returns the IDs of all these neighboring cells.

Example:

SELECT ST_S2CellNeighbours('549015', 10);
SELECT ST_S2CellNeighbours('89e37f091', 16) as neighbour;

6.12.8. Geodesic functions


6.12.8.1. ST_AreaWGS84
Function declaration:

double ST_AreaWGS84(st_geometry geometry)

Description: This function returns the approximate geodesic area of the input geometry based on World Geodetic System 1984 (WGS84). This function converts the coordinates of the input geometry from EPSG:4326 to EPSG:3857 and then calculates the planar area in square meters.

Example:

SELECT ST_AreaWGS84(ST_GeomFromText('POLYGON((743238 2967416,743238 2967450, 743265 2967450,743265.625 2967416,743238 2967416))'));
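The description above projects the coordinates from EPSG:4326 to EPSG:3857 (Web Mercator) and then measures planar area. The standalone Python sketch below illustrates those two steps with the spherical Web Mercator forward formulas and the shoelace formula; it is not MaxCompute code, and the helper names are hypothetical. Web Mercator scales lengths by about 1/cos(latitude), so areas away from the equator come out inflated by about 1/cos²(latitude), which is one reason to treat such results as approximate.

```python
import math

R = 6378137.0  # sphere radius in meters used by EPSG:3857 (WGS84 semi-major axis)


def to_web_mercator(lon, lat):
    """Forward EPSG:4326 -> EPSG:3857 projection of one lon/lat pair."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


def planar_area(ring):
    """Shoelace area of a ring of projected (x, y) coordinates."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0


def area_web_mercator(lonlat_ring):
    """Project a lon/lat ring to EPSG:3857, then take its planar area (m^2)."""
    return planar_area([to_web_mercator(lon, lat) for lon, lat in lonlat_ring])


# A 0.001-degree square at the equator, roughly 111 m on each side.
print(area_web_mercator([(0, 0), (0.001, 0), (0.001, 0.001), (0, 0.001)]))
```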

6.12.8.2. ST_DistanceWGS84
Function declaration:

double ST_DistanceWGS84(st_geometry geometry1, st_geometry geometry2)

Description: This function returns the approximate geodesic distance between the input geometries based on World Geodetic System 1984 (WGS84). This function converts the coordinates of the input geometries from EPSG:4326 to EPSG:3857 and then calculates the planar distance in meters.

Example:

SELECT ST_DistanceWGS84(ST_GeomFromText('POINT(-72.1235 42.3521)'), ST_GeomFromText('LINESTRING(-72.1260 42.45, -72.123 42.1546)'));

6.12.8.3. ST_BufferWGS84
Function declaration:

st_geometry ST_BufferWGS84(st_geometry geometry, double radius)

Description: This function returns the approximate geodesic buffer of the input geometry based on World Geodetic System 1984 (WGS84). This function converts the coordinates of the input geometry from EPSG:4326 to EPSG:3857, calculates the planar buffer, and then converts the coordinates back to EPSG:4326.

Example:

SELECT ST_AsText(ST_BufferWGS84(ST_GeomFromText('POINT(-72.1235 42.3521)'), 10));

6.12.8.4. ST_GeodesicDistance
Function declaration:

double ST_GeodesicDistance(double lon1, double lat1, double lon2, double lat2, string method = VINCENTY)
double ST_GeodesicDistance(st_geometry geo1, st_geometry geo2, string method = VINCENTY)

Description: This function calculates the geodesic distance between two points by using the specified method. The supported methods are Vincenty, LawOfCosines, and Haversine. The default value of the method parameter is VINCENTY. The return value is in radians.

Example:

SELECT ST_GeodesicDistance(ST_GeomFromText('POINT(152.352298 -24.875975)'), ST_GeomFromText('POINT(151.960336 -24.993289)'), 'LawOfCosines');
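For the spherical methods, a result in radians can be read as the central angle between the two points; multiplying it by an Earth radius gives a distance. The standalone Python sketch below computes that angle with the Haversine formula and with the spherical law of cosines for the example points (hypothetical helpers, not MaxCompute code). The Vincenty method works on the ellipsoid and is not shown.

```python
import math


def central_angle_haversine(lon1, lat1, lon2, lat2):
    """Great-circle central angle in radians, Haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(math.sqrt(h))


def central_angle_law_of_cosines(lon1, lat1, lon2, lat2):
    """Same angle via the spherical law of cosines (less stable for tiny angles)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    c = math.sin(phi1) * math.sin(phi2) + math.cos(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding past +/-1


# Same points as the SQL example: (lon1, lat1, lon2, lat2).
args = (152.352298, -24.875975, 151.960336, -24.993289)
print(central_angle_haversine(*args))
print(central_angle_law_of_cosines(*args))
```

Both formulas agree closely at this scale; Haversine is usually preferred for very short distances because acos loses precision when its argument is close to 1.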

6.12.8.5. ST_Distance_Sphere
Function declaration:

double ST_Distance_Sphere(st_point geo1, st_point geo2)
double ST_Distance_Sphere(double lng1, double lat1, double lng2, double lat2)

Description: This function uses the algorithm provided by AMAP to calculate the approximate geodesic distance between the two input points. The points can be passed either as ST_Point geometries or as longitudes and latitudes.

Example:

SELECT ST_Distance_Sphere(
ST_GeomFromText('POINT(116.292078 39.919622)'),
ST_GeomFromText('POINT(116.286676 39.919593)'));
+-------------------+
| _c0               |
+-------------------+
| 460.6965312526471 |
+-------------------+

6.12.8.6. ST_Area_Sphere
Function declaration:

double ST_Area_Sphere(st_geometry geo)

Description: This function uses the algorithm provided by AMAP to calculate the geodesic area of the input geometry. Only ST_Polygon and ST_MultiPolygon values are accepted as input parameters.

Example:

SELECT geospatial.ST_Area_Sphere(geospatial.ST_GeomFromText('POLYGON((116.259097 40.202114,116.259024 40.20199,116.258768 40.201662,116.258376 40.201341,116.258031 40.201036,116.257675 40.200734,116.257429 40.200656,116.257357 40.200562,116.257392 40.200051,116.257506 40.199433,116.257569 40.198586,116.257564 40.19756,116.257561 40.197372,116.257552 40.197036,116.257554 40.19675,116.257539 40.196647,116.257502 40.19653,116.257343 40.196389,116.257153 40.196276,116.256733 40.196071,116.25582 40.195646,116.255628 40.195611,116.255468 40.195653,116.255385 40.195742,116.255347 40.195849,116.255258 40.197143,116.255103 40.199576,116.255078 40.200585,116.251227 40.20059,116.251098 40.203978,116.259433 40.204111,116.259247 40.203079,116.259097 40.202114))'));
+---------------+
| _c0           |
+---------------+
| 353493.765625 |
+---------------+
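The AMAP algorithm used by ST_Area_Sphere is not documented here, so the following Python sketch substitutes a well-known spherical approximation that sums one term per edge of a ring. It assumes a sphere whose radius is the WGS84 semi-major axis and handles a single ring without holes that does not cross the antimeridian. It will not necessarily reproduce the value shown above.

```python
import math

# Assumed sphere radius (WGS84 semi-major axis); the radius AMAP uses is not documented here.
R = 6378137.0

def spherical_ring_area(ring):
    """Approximate area in square meters of a closed (lon, lat) ring on a sphere.

    Sums (lon2 - lon1) * (2 + sin(lat1) + sin(lat2)) over the edges and scales by R^2 / 2.
    The result is exact for rings bounded by meridians and parallels.
    """
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(ring, ring[1:]):
        total += math.radians(lon2 - lon1) * (
            2 + math.sin(math.radians(lat1)) + math.sin(math.radians(lat2)))
    return abs(total) * R * R / 2

# A 1-degree by 1-degree cell at the equator; the first vertex is repeated at the end.
cell = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(spherical_ring_area(cell) / 1e6)  # area in square kilometers
```

The absolute value makes the result independent of whether the ring is listed clockwise or counterclockwise.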

6.12.9. R-tree index functions


6.12.9.1. ST_BuildRTreeIndex
Function declaration:

RTree ST_BuildRTreeIndex(string uniqueId, string geometryWkt)

Description: This function is a user-defined aggregate function (UDAF). It uses the unique ID and well-known text (WKT) string of each geometry as input parameters to create an R-tree index. This function must be used together with the other R-tree functions.

Example:

SELECT geospatial.ST_BuildRTreeIndex(id, shape) AS index FROM poi_sample;
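ST_BuildRTreeIndex returns an RTree object whose internal layout is not documented here. As a rough illustration of what such an index does, the following Python sketch bulk-loads a static R-tree over bounding boxes with the Sort-Tile-Recursive (STR) method and runs the bounding-box filter step of a query. The STR method, the node capacity of 8, and all names are assumptions, not MaxCompute's implementation.

```python
import math
from dataclasses import dataclass, field

Box = tuple  # (min_x, min_y, max_x, max_y)

def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(boxes) -> Box:
    """Smallest box that covers all the given boxes."""
    boxes = list(boxes)
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

@dataclass
class Node:
    box: Box
    children: list = field(default_factory=list)  # child nodes (internal node)
    entries: list = field(default_factory=list)   # (unique_id, box) pairs (leaf node)

def str_groups(items, capacity):
    """Sort-Tile-Recursive: tile (payload, box) items into groups of at most `capacity`."""
    leaves = math.ceil(len(items) / capacity)
    per_slice = math.ceil(math.sqrt(leaves)) * capacity
    by_x = sorted(items, key=lambda it: it[1][0] + it[1][2])  # min + max is twice the center x
    groups = []
    for start in range(0, len(by_x), per_slice):
        strip = sorted(by_x[start:start + per_slice], key=lambda it: it[1][1] + it[1][3])
        groups.extend(strip[i:i + capacity] for i in range(0, len(strip), capacity))
    return groups

def build_rtree(entries, capacity=8) -> Node:
    """Bulk-load a static R-tree from a non-empty list of (unique_id, box) pairs."""
    level = [Node(union(b for _, b in g), entries=g) for g in str_groups(entries, capacity)]
    while len(level) > 1:
        level = [Node(union(n.box for n, _ in g), children=[n for n, _ in g])
                 for g in str_groups([(n, n.box) for n in level], capacity)]
    return level[0]

def candidates(node: Node, window: Box) -> list:
    """Filter step: ids whose boxes intersect the window, skipping subtrees that cannot match."""
    if not intersects(node.box, window):
        return []
    hits = [uid for uid, box in node.entries if intersects(box, window)]
    for child in node.children:
        hits.extend(candidates(child, window))
    return hits

# 100 points on a 10 x 10 grid; each point is a degenerate box.
grid = [(f"p{i}", (i % 10, i // 10, i % 10, i // 10)) for i in range(100)]
tree = build_rtree(grid)
print(sorted(candidates(tree, (2.5, 2.5, 5.0, 4.0))))  # ['p33', 'p34', 'p35', 'p43', 'p44', 'p45']
```

R-tree queries typically follow this filter step with an exact geometric test (for example, an intersects or contains check) on each candidate; this guide does not say how the ST_*FromRTree functions split that work.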

6.12.9.2. ST_ContainsFromRTree
Function declaration:

ST_ContainsFromRTree(string uniqueId, string geometryWkt, RTree rtree)

Description: This function is a user-defined table-valued function (UDTF). It takes the unique ID and well-known text (WKT) string of each geometry, together with the R-tree index created by ST_BuildRTreeIndex, as input parameters. It returns the IDs of the objects in the R-tree index that are contained by the geometry. This function is used to accelerate ST_Contains queries.

6.12.9.3. ST_CrossesFromRTree
Function declaration:

ST_CrossesFromRTree(string uniqueId, string geometryWkt, RTree rtree)

Description: This function is a user-defined table-valued function (UDTF). It takes the unique ID and well-known text (WKT) string of each geometry, together with the R-tree index created by ST_BuildRTreeIndex, as input parameters. It returns the IDs of the objects in the R-tree index that cross the geometry. This function is used to accelerate ST_Crosses queries.

6.12.9.4. ST_EqualsFromRTree
Function declaration:

ST_EqualsFromRTree(string uniqueId, string geometryWkt, RTree rtree)

Description: This function is a user-defined table-valued function (UDTF). It takes the unique ID and well-known text (WKT) string of each geometry, together with the R-tree index created by ST_BuildRTreeIndex, as input parameters. It returns the IDs of the objects in the R-tree index that are equal to the geometry. This function is used to accelerate ST_Equals queries.

6.12.9.5. ST_IntersectsFromRTree
Function declaration:

ST_IntersectsFromRTree(string uniqueId, string geometryWkt, RTree rtree)

Description: This function is a user-defined table-valued function (UDTF). It takes the unique ID and well-known text (WKT) string of each geometry, together with the R-tree index created by ST_BuildRTreeIndex, as input parameters. It returns the IDs of the objects in the R-tree index that intersect the geometry. This function is used to accelerate ST_Intersects queries.

6.12.9.6. ST_OverlapsFromRTree
Function declaration:

ST_OverlapsFromRTree(string uniqueId, string geometryWkt, RTree rtree)

Description: This function is a user-defined table-valued function (UDTF). It takes the unique ID and well-known text (WKT) string of each geometry, together with the R-tree index created by ST_BuildRTreeIndex, as input parameters. It returns the IDs of the objects in the R-tree index that overlap the geometry. This function is used to accelerate ST_Overlaps queries.

6.12.9.7. ST_TouchesFromRTree
Function declaration:

ST_TouchesFromRTree(string uniqueId, string geometryWkt, RTree rtree)

Description: This function is a user-defined table-valued function (UDTF). It takes the unique ID and well-known text (WKT) string of each geometry, together with the R-tree index created by ST_BuildRTreeIndex, as input parameters. It returns the IDs of the objects in the R-tree index that spatially touch the geometry. This function is used to accelerate ST_Touches queries.

6.12.9.8. ST_WithinFromRTree
Function declaration:

ST_WithinFromRTree(string uniqueId, string geometryWkt, RTree rtree)

Description: This function is a user-defined table-valued function (UDTF). It takes the unique ID and well-known text (WKT) string of each geometry, together with the R-tree index created by ST_BuildRTreeIndex, as input parameters. It returns the IDs of the objects in the R-tree index that contain the geometry. This function is used to accelerate ST_Within queries.

6.12.9.9. ST_KNNFromRTree
Function declaration:

ST_KNNFromRTree(string uniqueId, string geometryWkt, int k, RTree rtree)

Description: This function is a user-defined table-valued function (UDTF). It takes the unique ID and well-known text (WKT) string of each geometry, together with the R-tree index created by ST_BuildRTreeIndex, as input parameters. It returns the IDs of the k objects in the R-tree index that are nearest to the geometry.

6.12.9.10. Example
This topic provides examples of how to use the R-tree index functions.

Example 1

-- Query the intersections of line segments in the A table and polygons in the B table.
set odps.sql.allow.cartesian=true;
SELECT a.id as link_id, b.id as shape_id
FROM link_sample_wkt a, poi_sample_wkt b
WHERE geospatial.ST_IsValid(b.shape)
AND geospatial.ST_Intersects(
geospatial.ST_LineString(a.line),
geospatial.ST_Multipolygon(b.shape));
Summary:
resource cost: cpu 3.28 Core * Min, memory 5.76 GB * Min
inputs:
meta_dev.poi_sample_wkt: 1000 (237592 bytes)
meta_dev.link_sample_wkt: 1000 (105940 bytes)
outputs:
Job run time: 111.000
+---------+----------+
| link_id | shape_id |
+---------+----------+
| 5121371185457659960 | B000A844XK |
| 5121377123249946651 | B000A85TV4 |
| 5121377166199619654 | B000A844KT |
+---------+----------+
-- After optimization by using the new function:
SELECT /*+mapjoin(i)*/
geospatial.ST_IntersectsFromRTree(id, line, i.index)
AS (link_id, shape_id)
FROM link_sample_wkt
JOIN
(
SELECT geospatial.ST_BuildRTreeIndex(id, shape) AS index
FROM poi_sample_wkt
WHERE geospatial.ST_IsValid(shape)
) i;
Summary:
resource cost: cpu 1.03 Core * Min, memory 1.99 GB * Min
inputs:
meta_dev.poi_sample_wkt: 1000 (237592 bytes)
meta_dev.link_sample_wkt: 1000 (105940 bytes)
outputs:
Job run time: 41.000
+---------+----------+
| link_id | shape_id |
+---------+----------+
| 5121371185457659960 | B000A844XK |
| 5121377123249946651 | B000A85TV4 |
| 5121377166199619654 | B000A844KT |
+---------+----------+

Example 2

-- Create an R-tree for all points in a table and use the KNN function to locate the nearest point of each point.
SELECT /*+mapjoin(i)*/
geospatial.ST_KNNFromRTree(id, point, 1, i.index) AS (id1, id2)
FROM poi_sample_wkt
JOIN
(
SELECT geospatial.ST_BuildRTreeIndex(id, point) AS index
FROM poi_sample_wkt
) i;
Summary:
resource cost: cpu 1.17 Core * Min, memory 2.24 GB * Min
inputs:
meta_dev.poi_sample_wkt: 1000 (237592 bytes)
outputs:
Job run time: 46.000
+-----+-----+
| id1 | id2 |
+-----+-----+
| B000A01B4E | B000A01B4E |
| B000A01C19 | B000A01C19 |
| B000A023A5 | B000A023A5 |
| B000A02F81 | B000A02F81 |
| B000A07BEE | B000A07BEE |
| B000A07E06 | B000A07E06 |
| B000A08863 | B000A08863 |
...
-- The table has 1,000 rows of data, and the function returns 1,000 rows, as expected.
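In Example 2, every id1 equals its id2. One explanation consistent with that output is that the query points are also stored in the index, so each point's nearest neighbor, at distance 0, is the point itself. The following brute-force Python sketch (a hypothetical stand-in for the R-tree search) shows the effect. Under that assumption, requesting k = 2 and discarding the rows in which the two IDs match would return each point's nearest other point.

```python
import heapq
import math

def knn(query_xy, points, k):
    """Brute-force k nearest neighbors, nearest first (ties broken by id).

    points maps a unique id to an (x, y) tuple.
    """
    nearest = heapq.nsmallest(k, ((math.dist(query_xy, xy), pid) for pid, xy in points.items()))
    return [pid for _, pid in nearest]

points = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (1.0, 1.0)}
print(knn(points["A"], points, 1))  # ['A']: a point that is in the index finds itself first
print(knn(points["A"], points, 2))  # ['A', 'C']: the second entry is the nearest other point
```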

6.12.10. Other functions


6.12.10.1. ST_IsValid
Function declaration:

boolean ST_IsValid(st_geometry geometry)


boolean ST_IsValid(string wkt)

Description: This function checks whether the input geometry or well-known text (WKT) string is a valid geometry.

Example:

SELECT ST_IsValid('POINT(-102.849854 36.451113)');


SELECT ST_IsValid(ST_Point('POINT(-102.849854 36.451113)'));

6.12.10.2. ST_Transform
Function declaration:

st_geometry ST_TransformWGS84(st_geometry geometry)


st_geometry ST_Transform(st_geometry geometry, integer toSRID)
st_geometry ST_Transform(st_geometry geometry, integer fromSRID, integer toSRID)

Description: This function converts the coordinates of the input geometry from one spatial reference system to another. The ST_TransformWGS84 function converts the coordinates of the geometry from EPSG:4326 to EPSG:3857. The ST_Transform function converts the geometry from fromSRID to toSRID. If you use the overload that specifies only toSRID, you must first call the ST_SetSRID function to set the SRID of the input geometry.

Example:

SELECT ST_AsText(ST_Transform(ST_GeomFromText('POLYGON((743238 2967416,743238 2967450,743265 2967450,743265.625 2967416,743238 2967416))'), 2249, 4326));
SELECT ST_AsText(ST_TransformWGS84(ST_GeomFromText('POLYGON((-71.1776848522251 42.3902896512902,-71.1776843766326 42.3903829478009,-71.1775844305465 42.3903826677917,-71.1775825927231 42.3902893647987,-71.1776848522251 42.3902896512902))')));

6.13. SQL Function


MaxCompute supports user-defined functions (UDFs) written in Java or Python. Some UDFs can instead be implemented directly in SQL, so MaxCompute also supports SQL functions, which make SQL code easier to reuse.

Use SQL functions
Example:

FUNCTION ADD(@a BIGINT) AS @a + 1;


SELECT ADD(key), ADD(value) FROM src;

Functions as input parameters


Functions, including built-in functions, UDFs, and SQL functions, can be passed as input parameters to SQL functions.

Example:

FUNCTION ADD(@a BIGINT) AS @a + 1;


FUNCTION OP(@a, @fun FUNCTION (BIGINT) RETURNS BIGINT) AS @fun(@a);
SELECT OP(key, ADD), OP(key, abs) FROM src;

Anonymous functions as input parameters


Anonymous functions can be passed as input parameters to SQL functions.

Example:

FUNCTION OP(@a, @fun FUNCTION (BIGINT) RETURNS BIGINT) AS @fun(@a);


SELECT OP(key, FUNCTION (@a) AS @a + 1) FROM src;

6.14. CLONE TABLE



MaxCompute supports the CLONE TABLE statement, which you can execute to clone data from one table to another.

Syntax

CLONE TABLE <[src_project_name.]src_table_name> [PARTITION(spec), ...] TO <[dest_project_name.]dest_table_name> [IF EXISTS (OVERWRITE | IGNORE)] ;

Note
If the destination table does not exist before data is cloned, it is created by using the CREATE TABLE LIKE statement when you execute the CLONE TABLE statement.
If the destination table exists before data is cloned and IF EXISTS OVERWRITE is specified, data in the specified partitions of the destination table is overwritten.
If the destination table exists before data is cloned and IF EXISTS IGNORE is specified, existing partitions in the destination table are skipped and their data is not overwritten.
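The three cases above can be summarized as a small Python model that treats a table as a dictionary of partitions. It is a hypothetical sketch, not MaxCompute code. It assumes that partitions missing from an existing destination are added under either IF EXISTS option, as the IF EXISTS IGNORE example later in this section suggests, and that srcpart_clone did not exist before the first clone in those examples.

```python
MAX_PARTITIONS_EXISTING_DEST = 10_000  # limit stated in this guide when the destination exists

def clone_table(src, dest, partitions=None, if_exists=None):
    """Model CLONE TABLE at partition granularity.

    Tables are dicts that map a partition spec tuple to a list of rows. dest=None means
    that the destination table does not exist yet. Returns the destination after the clone.
    """
    selected = {p: rows for p, rows in src.items() if partitions is None or p in partitions}
    if dest is None:
        # The destination is created like the source and receives every selected partition.
        return {p: list(rows) for p, rows in selected.items()}
    if len(selected) > MAX_PARTITIONS_EXISTING_DEST:
        raise ValueError("at most 10,000 partitions can be cloned into an existing table at a time")
    if if_exists not in ("OVERWRITE", "IGNORE"):
        # This model covers only the two documented IF EXISTS options.
        raise ValueError("expected if_exists='OVERWRITE' or 'IGNORE'")
    result = {p: list(rows) for p, rows in dest.items()}
    for p, rows in selected.items():
        if p in result and if_exists == "IGNORE":
            continue  # IGNORE: keep the existing partition untouched
        result[p] = list(rows)  # OVERWRITE, or a partition that the destination lacks
    return result

# Reproduce the srcpart_copy examples in this section.
srcpart_copy = {("2008-04-09", "11"): [(1, "ok49")], ("2008-04-08", "12"): [(1, "ok48")]}
srcpart_clone = clone_table(srcpart_copy, None, partitions={("2008-04-09", "11")}, if_exists="OVERWRITE")
srcpart_clone = clone_table(srcpart_copy, srcpart_clone, if_exists="IGNORE")
print(sorted(srcpart_clone))  # [('2008-04-08', '12'), ('2008-04-09', '11')]
```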

Limits and troubleshooting

The schema of the destination table must be compatible with that of the source table.
The CLONE TABLE statement supports both partitioned and non-partitioned tables. Tables that have special data organization structures are not supported, including clustered tables, shard tables, Xlib or Algo tables, and tables with extreme storage.
Make sure that the cluster configuration of the source table intersects with that of the destination table and that the data that you want to process is in the same cluster. If either condition is not met, an error is returned.
If the destination table already exists before data is cloned, you can clone data from a maximum of 10,000 partitions at a time.
If the destination table does not exist before data is cloned, the number of partitions that you can clone at a time is not limited, and the operation is atomic.
If a hard link in the Apsara Distributed File System is faulty, purge the recycle bin and try again.
The user who submits the command must have the Create Table and Update Table permissions on the destination project.

Example

The following code shows the partitions and data of the source tables:

odps@ multi>read srcpart_copy;


+------------+------------+------------+------------+
| key | value | ds | hr |
+------------+------------+------------+------------+
| 1 | ok49 | 2008-04-09 | 11 |
| 1 | ok48 | 2008-04-08 | 12 |
+------------+------------+------------+------------+
odps@ multi>read src_copy;
+------------+------------+
| key | value |
+------------+------------+
| 1 | ok |
+------------+------------+

Clone all data from the non-partitioned table.

clone table src_copy to src_clone;


odps@ multi>clone table src_copy to src_clone;
ID = 2019102303024544g2540cdv2
OK
odps@ multi>read src_clone;
+------------+------------+
| key | value |
+------------+------------+
| 1 | ok |
+------------+------------+

Clone some partitions of the partitioned table.

clone table srcpart_copy partition(ds="2008-04-09", hr='11') to srcpart_clone IF EXISTS OVERWRITE;

odps@ multi>clone table srcpart_copy partition(ds="2008-04-09", hr='11') to srcpart_clone IF EXISTS OVERWRITE;
ID = 20191023030534986g4540cdv2
OK
odps@ multi>read srcpart_clone;
+------------+------------+------------+------------+
| key | value | ds | hr |
+------------+------------+------------+------------+
| 1 | ok49 | 2008-04-09 | 11 |
+------------+------------+------------+------------+

Clone data from the partitioned table and skip existing partitions in the destination table.

clone table srcpart_copy to srcpart_clone IF EXISTS IGNORE;


odps@ multi>clone table srcpart_copy to srcpart_clone IF EXISTS IGNORE;
ID = 20191023030619196g5540cdv2
OK
odps@ multi>read srcpart_clone;
+------------+------------+------------+------------+
| key | value | ds | hr |
+------------+------------+------------+------------+
| 1 | ok49 | 2008-04-09 | 11 |
| 1 | ok48 | 2008-04-08 | 12 |
+------------+------------+------------+------------+

Clone all data from the partitioned table.

clone table srcpart_copy to srcpart_clone2;


odps@ multi>clone table srcpart_copy to srcpart_clone2;
ID = 20191023030825186g6540cdv2
OK
odps@ multi>read srcpart_clone2;
+------------+------------+------------+------------+
| key | value | ds | hr |
+------------+------------+------------+------------+
| 1 | ok49 | 2008-04-09 | 11 |
| 1 | ok48 | 2008-04-08 | 12 |
+------------+------------+------------+------------+

6.15. MaxCompute Hash Clustering


6.15.1. Background information
JOIN operations are commonly used in MaxCompute queries. MaxCompute implements JOIN operations in the following ways:

1. Broadcast hash join: This method is used when a JOIN operation involves a small table. The small table is broadcast to all JoinTask instances, and then a hash join is performed to join the small table with the large table.
2. Shuffle hash join: This method is used when a JOIN operation involves large tables that cannot be broadcast directly. In this case, both tables are hash shuffled based on the join keys. Because equal keys produce equal hash values, all rows that have the same key are sent to the same JoinTask instance. On each instance, a hash table is built from the smaller table and probed with the rows of the larger table to join the tables.
3. Sort merge join: This method is used when a JOIN operation involves even larger tables and the preceding methods cannot be used because there is not enough memory to build a hash table. In this case, both tables are hash shuffled based on the join keys, the data is sorted by the join keys, and then the sorted data is merged.
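To make the difference between the shuffle hash join and the sort merge join concrete, the following single-process Python sketch models JoinTask instances as list partitions. The partition count, Python's built-in hash(), and all function names are assumptions for illustration; they say nothing about how MaxCompute schedules or hashes data.

```python
from collections import defaultdict

def shuffle(rows, key, n):
    """Hash-partition rows so that rows with equal keys go to the same partition (instance)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def hash_join(small, large, key):
    """Build a hash table on the smaller input, then probe it with each row of the larger one."""
    table = defaultdict(list)
    for s in small:
        table[s[key]].append(s)
    return [(s, big) for big in large for s in table.get(big[key], [])]

def shuffle_hash_join(small, large, key, n=4):
    """Shuffle both inputs by key, then hash join each pair of co-located partitions."""
    return [pair
            for s_part, l_part in zip(shuffle(small, key, n), shuffle(large, key, n))
            for pair in hash_join(s_part, l_part, key)]

def sort_merge_join(left, right, key):
    """Sort both inputs by key, then merge them, pairing up runs of equal keys."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            out.extend((lr, rr) for lr in left[i:i_end] for rr in right[j:j_end])
            i, j = i_end, j_end
    return out

small = [{"k": 1, "s": "a"}, {"k": 2, "s": "b"}]
large = [{"k": 1, "l": "x"}, {"k": 3, "l": "z"}, {"k": 1, "l": "y"}]
print(sorted((s["s"], b["l"]) for s, b in shuffle_hash_join(small, large, "k")))  # [('a', 'x'), ('a', 'y')]
print(sorted((s["s"], b["l"]) for s, b in sort_merge_join(small, large, "k")))    # [('a', 'x'), ('a', 'y')]
```

Both approaches return the same pairs; the sort merge join trades the in-memory hash table for a sort, which is why it remains usable when the hash table would not fit in memory.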

The sort merge join is the join method most commonly used in MaxCompute because MaxCompute processes huge volumes of data in most cases. However, this method requires repeated shuffle and sort operations, and the physical execution plan that Job Scheduler generates for the JOIN operation requires multiple stages, which consumes a large amount of resources.

Therefore, MaxCompute allows you to configure the hash shuffle and sort attributes of a table when its data is initially generated. This prevents the data from being shuffled and sorted repeatedly in subsequent queries. As a result, the number of stages in the physical execution plan that Job Scheduler generates for a JOIN operation is reduced. The preceding figure shows that only one stage is required.

MaxCompute Hash Clustering allows you to configure the shuffle and sort attributes of a table when you create the table. MaxCompute then uses these storage characteristics to optimize the execution plan, improve efficiency, and save resources.

6.15.2. Descriptions
6.15.2.1. Enable or disable Hash Clustering
The Hash Clustering feature is available and enabled by default. If you want to use clustered indexes, add the following flag:

set odps.sql.cfile2.enable.read.write.index.flag=true;

After the flag is set to true, the system automatically creates indexes for the sorted hash buckets to improve query efficiency. To use clustered indexes, you must add this flag both when you create the table and in subsequent queries. If you want to use clustered indexes in your project all the time, contact the MaxCompute team.

Note: Clustered indexes improve the efficiency of equality and range queries on sort keys. Even if you do not add this flag, you still benefit from the performance improvements that Hash Clustering provides.

6.15.2.2. Create a hash clustering table


You can use the following statement to create a hash clustering table. You must specify the cluster keys (also called hash keys) and the number of hash buckets. The SORTED BY clause is optional. However, we recommend that you sort by the same columns as the cluster keys to achieve optimal performance.

CREATE TABLE [IF NOT EXISTS] table_name
[(col_name data_type [comment col_comment], ...)]
[comment table_comment]
[PARTITIONED BY (col_name data_type [comment col_comment], ...)]
[CLUSTERED BY (col_name [, col_name, ...]) [SORTED BY (col_name [ASC | DESC] [, col_name [ASC | DESC] ...])] INTO number_of_buckets BUCKETS]
[AS select_statement]

You can use the following statement to create a non-partitioned table:

CREATE TABLE T1 (a string, b string, c bigint) CLUSTERED BY (c) SORTED by (c) INTO 1024 BUCKETS;

You can use the following statement to create a partitioned table:

CREATE TABLE T1 (a string, b string, c bigint) PARTITIONED BY (dt string) CLUSTERED BY (c)
SORTED by (c) INTO 1024 BUCKETS;

The following sections describe the CLUSTERED BY, SORTED BY, and INTO number_of_buckets BUCKETS clauses in detail.

CLUSTERED BY
The CLUSTERED BY clause specifies the hash keys. MaxCompute performs a hash operation on the specified columns and distributes data to buckets based on the hash values. To prevent data skew and hot spots and to allow statements to run concurrently, we recommend that you specify columns in CLUSTERED BY that have a wide range of values and few duplicate values. In addition, to optimize JOIN operations, we recommend that you select commonly used join or aggregation keys. These keys are similar to primary keys in conventional databases.

SORTED BY
The SORTED BY clause specifies how fields are sorted within a bucket. We recommend that you specify the same columns in SORTED BY as in CLUSTERED BY to improve execution efficiency. After you specify SORTED BY, MaxCompute automatically generates indexes and uses them to run queries on these columns faster.

INTO number_of_buckets BUCKETS


The INTO number_of_buckets BUCKETS clause specifies the number of hash buckets and is required. Choose the number of hash buckets based on the volume of data. More buckets provide higher concurrency, which shortens the job running time. However, too many buckets may generate an excessive number of small files, and high concurrency also increases CPU time. We recommend that each bucket hold 500 MB to 1 GB of data. For a large table, you can increase this value as required.

MaxCompute can remove the shuffle operation only for tables that have the same number of buckets. In later versions, MaxCompute will support bucket alignment, which allows the shuffle operation to be removed for tables whose numbers of buckets are multiples or factors of each other. To take advantage of bucket alignment, we recommend that you set the number of buckets to a power of 2, such as 512, 1,024, or 2,048. The maximum number of buckets is 4,096. If the number of buckets exceeds this value, performance and resource usage may be affected.
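The sizing guidance above can be turned into a rough calculation: divide the table size by a target bucket size and round up to a power of 2, without exceeding 4,096. The following Python helper is hypothetical; it uses 1 GiB as the target, the upper end of the recommended 500 MB to 1 GB range, and binary units for simplicity.

```python
import math

GIB = 1024 ** 3
MAX_BUCKETS = 4096  # maximum number of buckets stated in this guide

def suggest_bucket_count(table_bytes: int, target_bytes_per_bucket: int = GIB) -> int:
    """Hypothetical helper: buckets needed at the target size, rounded up to a power of 2."""
    needed = max(1, math.ceil(table_bytes / target_bytes_per_bucket))
    power_of_two = 1 << (needed - 1).bit_length()  # smallest power of 2 that is >= needed
    return min(power_of_two, MAX_BUCKETS)

print(suggest_bucket_count(100 * GIB))        # 128, about 0.78 GiB per bucket
print(suggest_bucket_count(300 * GIB))        # 512, about 0.59 GiB per bucket
print(suggest_bucket_count(10 * 1024 * GIB))  # 4096 (capped)
```

Rounding up keeps each bucket at or below the target size. Once the 4,096 cap applies, buckets grow beyond the target, which matches the note that a larger per-bucket volume can be used for large tables.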

To remove the shuffle and sort operations when two tables are joined, the two tables must have the same number of hash buckets. If the numbers calculated by using the preceding method differ, we recommend that you use the larger number for both tables. This ensures that SQL statements run concurrently and efficiently.

If the sizes of the two tables differ greatly, you can set the number of buckets for the large table to a multiple of that for the small table, for example, 256 buckets for the small table and 1,024 buckets for the large table. If automatic splitting and merging of hash buckets are supported, these settings can be further optimized based on data characteristics.

6.15.2.3. Modify table attributes


For a partitioned table, MaxCompute allows you to execute the ALTER TABLE statement to add the Hash Clustering attribute to the table or remove it from the table.

ALTER TABLE table_name
[CLUSTERED BY (col_name [, col_name, ...]) [SORTED BY (col_name [ASC | DESC] [, col_name [ASC | DESC] ...])] INTO number_of_buckets BUCKETS]
ALTER TABLE table_name NOT CLUSTERED;

Note the following points when you use the ALTER TABLE statement:

The ALTER TABLE statement can modify the Hash Clustering attribute only for partitioned tables. After the Hash Clustering attribute is added to a non-partitioned table, it cannot be modified.
The ALTER TABLE statement takes effect only for new partitions of a table, including partitions generated by the INSERT OVERWRITE statement. New partitions are stored based on the Hash Clustering attribute, and the storage formats of existing partitions remain unchanged.
Because the ALTER TABLE statement takes effect only for new partitions, you cannot specify a partition in this statement.

The ALTER TABLE statement is suitable for existing tables. After the Hash Clustering attribute is added, new partitions are stored based on the Hash Clustering attribute.

6.15.2.4. View and verify table attributes


After you create a hash clustering table, execute the following statement to view its attributes:

DESC EXTENDED table_name;

The Hash Clustering attribute is displayed in Extended Info.

You can also execute the following statement to view the attributes of a partition in a partitioned table:

DESC EXTENDED table_name partition(pt_spec);

The following figure shows the execution result.

6.15.3. Benefits
6.15.3.1. Bucket pruning and index optimization
The following code provides a sample:

CREATE TABLE t1 (id bigint, a string, b string) CLUSTERED BY (id) SORTED BY (id) into 1000
BUCKETS;
...
SELECT t1.a, t1.b FROM t1 WHERE t1.id=12345;

For a standard table, this query requires a full table scan, which consumes a large amount of resources if the table is large. However, if the table is hash shuffled and sorted on the id field, the query is greatly simplified. The procedure is as follows:

1. Find the hash bucket that corresponds to 12345. The query is performed on only this bucket instead of all 1,000 buckets. This process is called bucket pruning.
2. Data in the bucket is stored in sorted order of IDs. MaxCompute automatically creates indexes and uses the INDEX LOOKUP function to locate the relevant records.

This procedure not only greatly reduces the number of mappers but also allows the mappers to use the index to locate the page where the data is stored. Therefore, the volume of data that is loaded is greatly reduced.
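The two steps above can be modeled in Python: hash each row into one of 1,000 buckets, keep every bucket sorted by id, and answer WHERE id = 12345 by reading one bucket and binary-searching it. The bucket function here is Python's built-in hash(), a hypothetical stand-in; MaxCompute's actual hash function and index format are not described in this guide.

```python
import bisect

NUM_BUCKETS = 1000  # matches INTO 1000 BUCKETS in the sample table

def bucket_of(key: int) -> int:
    """Hypothetical stand-in for MaxCompute's bucket hash."""
    return hash(key) % NUM_BUCKETS

def write_clustered(rows):
    """CLUSTERED BY (id) SORTED BY (id): hash rows into buckets, then sort each bucket by id.

    Each bucket is stored as (sorted_ids, rows); sorted_ids plays the role of the index.
    """
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_of(row["id"])].append(row)
    stored = []
    for bucket in buckets:
        bucket.sort(key=lambda r: r["id"])
        stored.append(([r["id"] for r in bucket], bucket))
    return stored

def point_lookup(stored, key):
    """WHERE id = key: read a single bucket (bucket pruning), then binary-search it (index lookup)."""
    ids, rows = stored[bucket_of(key)]
    lo = bisect.bisect_left(ids, key)
    hi = bisect.bisect_right(ids, key)
    return rows[lo:hi]

table = write_clustered([{"id": i, "a": f"a{i}", "b": f"b{i}"} for i in range(20_000)])
print(point_lookup(table, 12345))  # [{'id': 12345, 'a': 'a12345', 'b': 'b12345'}]
```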

6.15.3.2. Aggregation optimization


The following code provides a sample:

SELECT department, SUM(salary) FROM employee GROUP BY (department);

In most cases, the data is shuffled and sorted by the department column, and then a stream aggregate operation is performed to aggregate each department group. However, if the table is created with CLUSTERED BY (department) SORTED BY (department), the shuffle and sort operations are no longer required.
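Because rows with the same department hash to the same bucket and each bucket is sorted, every bucket can be aggregated on its own in a single pass. The following Python sketch models a stream aggregate over one such bucket. The column names come from the sample statement, and the data is made up for illustration.

```python
from itertools import groupby

def stream_aggregate(rows):
    """SUM(salary) GROUP BY department in one pass over rows sorted by department.

    groupby merges only adjacent rows, so the input must already be sorted by the key.
    """
    return [(dept, sum(r["salary"] for r in group))
            for dept, group in groupby(rows, key=lambda r: r["department"])]

# One bucket of a table that is CLUSTERED BY (department) SORTED BY (department).
bucket = [
    {"department": "finance", "salary": 100},
    {"department": "finance", "salary": 150},
    {"department": "sales", "salary": 90},
]
print(stream_aggregate(bucket))  # [('finance', 250), ('sales', 90)]
```

On unsorted input, the same pass would emit a department more than once, which is why a sort step is normally needed when the data is not stored in sorted order.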

6.15.3.3. Storage optimization


In addition to computation optimization, storing tables in a shuffled and sorted manner greatly saves storage space. MaxCompute uses column-oriented storage at the underlying layer. Sorting places records with the same or similar key values together, which facilitates encoding and compression and improves the compression ratio. In some cases, a sorted table uses 50% less storage space than an unsorted table. Therefore, Hash Clustering is suitable for tables that have long lifecycles.

For example, consider a 100 GB TPC-H lineitem table that contains multiple data types, such as INT, DOUBLE, and STRING. When Hash Clustering is used, about 10% of the storage space is saved while the data volume and compression format remain unchanged.
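MaxCompute's column encodings are not described here, so the following Python sketch uses run-length encoding as a stand-in to show why sorting helps: equal values become adjacent and collapse into a few runs. The data and names are hypothetical.

```python
import random

def run_length_encode(column):
    """Encode a column as [value, run_length] pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

random.seed(42)
city = [random.choice(["beijing", "hangzhou", "shanghai"]) for _ in range(10_000)]
print(len(run_length_encode(city)))          # thousands of runs in arrival order
print(len(run_length_encode(sorted(city))))  # 3 runs once the column is sorted
```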

6.15.4. ShuffleRemove
For range clustering tables, if the join key or group key of a join or aggregate operation is the range clustering key or a prefix of it, data redistribution is not required. This mechanism is called ShuffleRemove, and it improves execution efficiency.

Usage: The odps.optimizer.enable.range.partial.repartitioning flag controls whether this feature is enabled. This feature is disabled by default.

If you join two hash clustering tables whose numbers of buckets differ but are multiples or factors of each other, data redistribution is not required. This improves execution efficiency.

Usage: The odps.optimizer.enable.hash.partial.repartitioning flag controls whether this feature is enabled. This feature is enabled by default.
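Why bucket counts that are multiples or factors of each other can avoid redistribution follows from modulo arithmetic, assuming that a row's bucket is its hash value modulo the number of buckets (this guide does not specify the bucket function). The following Python sketch checks that assumption for 1,024 and 256 buckets; the function name is hypothetical.

```python
def aligned_bucket(bucket_in_larger: int, larger: int, smaller: int) -> int:
    """Bucket of the smaller table that holds the same keys as a bucket of the larger table.

    Assumes bucket = hash % number_of_buckets and that `larger` is a multiple of `smaller`,
    so (h % larger) % smaller == h % smaller for every non-negative hash value h.
    """
    if larger % smaller != 0:
        raise ValueError("bucket counts must be multiples or factors of each other")
    return bucket_in_larger % smaller

# Every hash value lands in matching buckets of a 1,024-bucket and a 256-bucket table.
print(all(aligned_bucket(h % 1024, 1024, 256) == h % 256 for h in range(100_000)))  # True
# With 1,000 and 256 buckets there is no such mapping: h = 1000 is in bucket 0 of the
# 1,000-bucket table but in bucket 232 of the 256-bucket table, while h = 0 is in bucket 0 of both.
print(1000 % 1000, 1000 % 256)  # 0 232
```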

Correlated Shuffle Remove is supported. If the data meets the distribution requirements but not the sorting requirements, a sort operator can be added so that data redistribution is avoided.

6.15.5. Limits
The limits of Hash Clustering are as follows:

The INSERT INTO statement is not supported. You can import data only by executing the INSERT OVERWRITE statement.
Small files cannot be merged. Data is evenly distributed to buckets when it is split, so no small files are generated, and merging files would affect the data distribution. However, you can still use the merge and archive commands to change the storage format of a table file and the format of a RAID file.

You cannot use Tunnel to upload data to a range-clustered table because data uploaded by using Tunnel is unsorted.

These limits will be resolved in the future. Stay tuned for updates on the official website.

6.16. MaxCompute SQL limits


The following table lists all the limits of MaxCompute SQL statements.

Limits

Maximum
Item Category Description
value/Limit

A table name or column name cannot contain


T able name special characters. It can contain only lowercase
128 bytes Length
lengt h and uppercase letters, digits, and underscores (_)
and must start with a letter.

Co mment lengt h 1,024 bytes Length A comment can be up to 1,024 bytes in length.

Co lumn
A table can contain a maximum of 1,200 column
def init io ns in a 1,200 Quantity
definitions.
t able

Part it io ns in a A table can contain a maximum of 60,000


60,000 Quantity
t able partitions.

Part it io n levels A table can contain a maximum of six levels of


6 Quantity
o f a t able partitions.

St at ist ical
A table can contain a maximum of 100 statistical
def init io ns o f a 100 Quantity
definitions.
t able

St at ist ical
T he length of statistical definitions in a table
def init io n lengt h 64,000 Length
cannot exceed 64,000.
o f a t able

A SELECT statement can generate a maximum of


Screen display 10,000 rows Quantity
10,000 rows.

A MULT IINS operation can insert a maximum of


INSERT t arget s 256 Quantity
256 data tables at a time.

A UNION ALL operation can be performed on a


UNION ALL 256 tables Quantity
maximum of 256 tables.

T he JOIN operation can be performed on a


JOIN so urces 128 Quantity
maximum of 128 source tables.

T he memory size for all small tables on which


MAPJOIN
512 MB Quantity the MAPJOIN operation is performed cannot
memo ry
exceed 512 MB.

329 > Document Version: 20220928


MaxComput e User Guide· MaxComput e SQL

Maximum
Item Category Description
value/Limit

W indo w A SELECT statement can contain a maximum of


5 Quantity
f unct io ns five window functions.

A PT IN SUBQUERY statement can generate a


PT INSUBQ 1,000 rows Quantity
maximum of 1,000 rows.

Lengt h o f an T he maximum length of an SQL statement is 2


2 MB Length
SQL st at ement MB.

Co ndit io ns o f a A WHERE clause can contain a maximum of 256


256 Quantity
W HERE clause conditions.

Lengt h o f a T he maximum length of a column record in a


8 MB Length
co lumn reco rd table is 8 MB.

T his item specifies the maximum number of


parameters in an IN clause, such as
in(1,2,3,...,1024). Excess parameters can slow
IN paramet ers 1,024 Quantity
down the compilation process. We recommend
that you use no more than 1,024 parameters, but
this is not a fixed upper limit.

T he maximum size of the jobconf.json file is 1


MB. If a table contains a large number of
jo bco nf .jso n 1 MB Length
partitions, the size of jobconf.json may exceed 1
MB.

A view is not writable and does not support the


V iew Not writable Operation
INSERT operation.

Dat a t ype and


T he data type and position of a column are
po sit io n o f a Unmodifiable Operation
unmodifiable.
co lumn

Cannot be
Java UDFs abstract or Operation Java UDFs cannot be abstract or static.
static

Part it io ns t o
10,000 Quantity A maximum of 10,000 partitions can be queried.
query

Notice: The preceding MaxCompute SQL limits cannot be modified manually.
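A client can pre-check a few of these limits before it submits a statement. The following Java sketch is hypothetical (the class and method names are not part of any MaxCompute SDK), and reading "2 MB" as 2 * 1024 * 1024 bytes of UTF-8 text is an assumption; the server remains the authority on what is accepted.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical client-side pre-checks for a few limits in the table above.
// The server performs the authoritative validation; this only fails fast.
class LimitChecks {
    // Letters, digits, and underscores only, and the first character must be a letter.
    private static final Pattern NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");
    static final int MAX_NAME_BYTES = 128;
    // Assumption: "2 MB" is read as 2 * 1024 * 1024 bytes of UTF-8 text.
    static final int MAX_SQL_BYTES = 2 * 1024 * 1024;
    // Recommended (not hard) maximum number of parameters in an IN clause.
    static final int RECOMMENDED_IN_PARAMS = 1024;

    static boolean isValidName(String name) {
        return name != null
                && name.getBytes(StandardCharsets.UTF_8).length <= MAX_NAME_BYTES
                && NAME.matcher(name).matches();
    }

    static boolean sqlWithinLength(String sql) {
        return sql.getBytes(StandardCharsets.UTF_8).length <= MAX_SQL_BYTES;
    }

    static boolean inListTooLong(int parameterCount) {
        return parameterCount > RECOMMENDED_IN_PARAMS;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("sale_detail"));   // true
        System.out.println(isValidName("1st_table"));     // false: starts with a digit
        System.out.println(isValidName("sale-detail"));   // false: contains '-'
        System.out.println(sqlWithinLength("select 1;")); // true
        System.out.println(inListTooLong(2000));          // true
    }
}
```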

6.17. Common MaxCompute SQL parameter settings
6.17.1. MAP configurations


set odps.sql.mapper.cpu=100

Purpose: Sets the number of CPUs for each instance of a Map task. Default value: 100. Value range: 50 to 800.

set odps.sql.mapper.memory=1024

Purpose: Sets the memory size for each instance of a Map task. Default value: 1,024 MB. Value range: 256 MB to 12,288 MB.

set odps.sql.mapper.merge.limit.size=64

Purpose: Sets the maximum size threshold for files to be merged. Default value: 64 MB. You can set this parameter to control the inputs of mappers. Value range: 0 to Integer.MAX_VALUE.

set odps.sql.mapper.split.size=256

Purpose: Sets the maximum amount of input data for each Map instance. Default value: 256 MB. You can set this parameter to control the inputs of mappers. Value range: 1 to Integer.MAX_VALUE.

6.17.2. JOIN configurations


set odps.sql.joiner.instances=-1

Purpose: Sets the number of instances in a JOIN task. Default value: -1, which lets the system determine the number. Value range: 0 to 2,000.

set odps.sql.joiner.cpu=100

Purpose: Sets the number of CPUs for each instance of a JOIN task. Default value: 100. Value range: 50 to 800.

set odps.sql.joiner.memory=1024

Purpose: Sets the memory size for each instance of a JOIN task. Default value: 1,024 MB. Value range: 256 MB to 12,288 MB.

6.17.3. Reduce configurations


set odps.sql.reducer.instances=-1

Purpose: Sets the number of instances in a Reduce task. Default value: -1, which lets the system determine the number. Value range: 0 to 2,000.

set odps.sql.reducer.cpu=100


Purpose: Sets the number of CPUs for each instance of a Reduce task. Default value: 100. Value range: 50 to 800.

set odps.sql.reducer.memory=1024

Purpose: Sets the memory size for each instance of a Reduce task. Default value: 1,024 MB. Value range: 256 MB to 12,288 MB.

6.17.4. UDF configurations


set odps.sql.udf.jvm.memory=1024

Purpose: Sets the maximum heap size of the JVM that runs UDFs. Default value: 1,024 MB. Value range: 256 MB to 12,288 MB.

set odps.sql.udf.timeout=600

Purpose: Sets the timeout period of a UDF. Default value: 600 seconds. Value range: 0 to 3,600 seconds.

set odps.sql.udf.python.memory=256

Purpose: Sets the maximum memory size for Python UDFs. Default value: 256 MB. Value range: 64 MB to 3,072 MB.

set odps.sql.udf.optimize.reuse=true/false

Purpose: When this parameter is enabled, each UDF expression is evaluated only once and its result is reused, which improves performance. Default value: true.

set odps.sql.udf.strict.mode=false/true

Purpose: Specifies whether a function returns an error or NULL when it encounters dirty data. If this parameter is set to true, an error is returned. If it is set to false, NULL is returned.
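The difference between the two strict-mode settings can be illustrated with a cast-like function. The following Java sketch is hypothetical: the class and method names are invented, and the code only mirrors the documented behavior (an error when the setting is true, NULL when it is false), not how MaxCompute implements it.

```java
// Hypothetical illustration of the behavior that odps.sql.udf.strict.mode controls.
class StrictModeDemo {
    // Converts text to a Long. In strict mode, dirty data raises an error;
    // otherwise it yields null, which plays the role of SQL NULL.
    static Long toBigint(String raw, boolean strictMode) {
        if (raw == null) {
            return null; // NULL input yields NULL in either mode
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            if (strictMode) {
                throw new IllegalArgumentException("dirty data: " + raw, e);
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toBigint("42", true));    // 42
        System.out.println(toBigint("4x2", false));  // null
        try {
            toBigint("4x2", true);
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage()); // error: dirty data: 4x2
        }
    }
}
```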

6.17.5. MAPJOIN configurations


set odps.sql.mapjoin.memory.max=512

Purpose: Sets the maximum memory size of a small table in a MAPJOIN operation. Default value: 512 MB. Value range: 128 MB to 2,048 MB.

set odps.sql.reshuffle.dynamicpt=true/false

Purpose: Controls the extra shuffle stage used for dynamic partitioning. In some scenarios this stage is time-consuming, and disabling it can speed up SQL statements. If the number of dynamic partitions is very small, disabling it can also avoid data skew.
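Because each numeric parameter in 6.17.1 to 6.17.5 has a documented range, a job script can check values before it emits its SET statements. The following Java sketch is hypothetical (the class and method names are not part of any MaxCompute SDK); it omits the instance-count parameters, and it only formats statements, it does not submit them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects "set" statements and rejects values outside
// the documented ranges. This is a convenience check; the server decides.
class SetStatements {
    // Documented {min, max} ranges. Memory is in MB, the UDF timeout in seconds,
    // and CPU values use the same units as this guide (default 100).
    private static final Map<String, long[]> RANGES = new LinkedHashMap<>();

    static {
        RANGES.put("odps.sql.mapper.cpu", new long[] {50, 800});
        RANGES.put("odps.sql.mapper.memory", new long[] {256, 12288});
        RANGES.put("odps.sql.joiner.cpu", new long[] {50, 800});
        RANGES.put("odps.sql.joiner.memory", new long[] {256, 12288});
        RANGES.put("odps.sql.reducer.cpu", new long[] {50, 800});
        RANGES.put("odps.sql.reducer.memory", new long[] {256, 12288});
        RANGES.put("odps.sql.udf.jvm.memory", new long[] {256, 12288});
        RANGES.put("odps.sql.udf.timeout", new long[] {0, 3600});
        RANGES.put("odps.sql.udf.python.memory", new long[] {64, 3072});
        RANGES.put("odps.sql.mapjoin.memory.max", new long[] {128, 2048});
    }

    private final Map<String, Long> settings = new LinkedHashMap<>();

    SetStatements with(String key, long value) {
        long[] range = RANGES.get(key);
        if (range == null) {
            throw new IllegalArgumentException("Not covered by this sketch: " + key);
        }
        if (value < range[0] || value > range[1]) {
            throw new IllegalArgumentException(
                    key + "=" + value + " is outside [" + range[0] + ", " + range[1] + "]");
        }
        settings.put(key, value);
        return this;
    }

    // Renders one "set key=value;" line per setting, in insertion order.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : settings.entrySet()) {
            sb.append("set ").append(e.getKey()).append('=').append(e.getValue()).append(";\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values taken from the UDF OOM workaround later in this guide.
        String statements = new SetStatements()
                .with("odps.sql.mapper.memory", 3072)
                .with("odps.sql.udf.jvm.memory", 2048)
                .with("odps.sql.udf.python.memory", 1536)
                .render();
        System.out.print(statements);
    }
}
```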


6.17.6. Configure data skew


set odps.sql.groupby.skewindata=true/false

Purpose: Enables optimization for skewed GROUP BY keys.

set odps.sql.skewjoin=true/false

Purpose: Enables optimization for skewed JOIN keys. It takes effect only when odps.sql.skewinfo is configured.

set odps.sql.skewinfo

Purpose: Sets detailed information about the skewed data for JOIN optimization. The syntax is as follows:
set odps.sql.skewinfo=skewed_src:(skewed_key)[("skewed_value")]

Example:

The following command specifies a single skewed value in a single field:

set odps.sql.skewinfo=src_skewjoin1:(key)[("0")]
-- Example query to check with EXPLAIN:
explain select a.key c1, a.value c2, b.key c3, b.value c4 from src a join src_skewjoin1 b on a.key = b.key;

The following command specifies multiple skewed values in a single field:

set odps.sql.skewinfo=src_skewjoin1:(key)[("0")("1")]
-- Example query to check with EXPLAIN:
explain select a.key c1, a.value c2, b.key c3, b.value c4 from src a join src_skewjoin1 b on a.key = b.key;
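Because the skewinfo value follows a fixed pattern, it can be generated from a table name, a column, and a list of skewed values. The following hypothetical Java helper renders only the single-column form shown in these examples; it does not escape quotation marks inside values.

```java
import java.util.List;

// Hypothetical helper that renders the single-column skewinfo form:
// <table>:(<column>)[("v1")("v2")...]
class SkewInfo {
    static String singleColumn(String table, String column, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("at least one skewed value is required");
        }
        StringBuilder sb = new StringBuilder("set odps.sql.skewinfo=");
        sb.append(table).append(":(").append(column).append(")[");
        for (String v : values) {
            sb.append("(\"").append(v).append("\")"); // each value in ("...")
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        // Prints: set odps.sql.skewinfo=src_skewjoin1:(key)[("0")("1")]
        System.out.println(singleColumn("src_skewjoin1", "key", List.of("0", "1")));
    }
}
```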

6.18. MapReduce-to-SQL conversion for execution
6.18.1. Overview
MaxComput e provides a series of Java APIs for MapReduce t o process dat a.

In the current version, MapReduce programs are automatically converted to SQL for execution. After the conversion, the compiler, cost-based optimizer, and vectorized execution engine released with MaxCompute V2.0 are used to process the MapReduce programs, and the new features of the SQL engine can also be used. The features, performance, and stability of the SQL engine are optimized.


Notice
You do not need to change the original APIs or job logic.
Only MapReduce jobs of the OpenMR type, which are written with MapReduce APIs, can be converted to SQL.
This feature can be enabled for projects and for individual jobs.
This feature supports views as the input.
This feature supports external tables as the input.
This feature supports TemporaryFile reads and writes.
This feature allows you to read data from and write data to hash clustering tables.
This feature supports the near-real-time execution of small jobs.

6.18.2. Configure local running settings


1. Download the latest MaxCompute client package to your computer and configure the client.
2. Configure the execution mode.

You can configure the execution mode based on your business requirements. The default execution mode is lot. In lot mode, jobs are executed by MapReduce, and the new compiler, optimizer, and execution engine are not used.

You can specify the execution mode by setting the odps.mr.run.mode parameter. Valid values: lot, sql, and hybrid.

Method 1: Enable the execution mode at the project level. The setting affects all jobs in the project, so the project administrator must apply for and enable it. Set the odps.mr.run.mode parameter to hybrid or sql. In hybrid mode, if SQL execution fails, the job is executed by MapReduce. In sql mode, if SQL execution fails, an error is returned.
Method 2: Enable the execution mode at the session level. This method is valid only for the current job. Use one of the following methods:
Add a SET statement, such as set odps.mr.run.mode=hybrid;, before the JAR statement.

Configure the job parameter. Example:

import com.aliyun.odps.mapred.conf.JobConf;

JobConf job = new JobConf();
job.set("odps.mr.run.mode", "hybrid");

The execution mode can be enabled at the project level later by MaxCompute O&M personnel.
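The fallback behavior of the three run modes can be summarized in a small Java model. The engines below are stand-in Supplier objects rather than MaxCompute APIs, and the class name is hypothetical; the sketch only restates the documented rules: lot always uses MapReduce, sql reports an error when SQL execution fails, and hybrid falls back to MapReduce.

```java
import java.util.function.Supplier;

// Hypothetical model of how odps.mr.run.mode selects an engine.
class RunModeDemo {
    static String run(String mode, Supplier<String> sqlEngine, Supplier<String> mapReduce) {
        switch (mode) {
            case "lot":
                return mapReduce.get();          // MapReduce only
            case "sql":
                return sqlEngine.get();          // a failure propagates as an error
            case "hybrid":
                try {
                    return sqlEngine.get();
                } catch (RuntimeException e) {
                    return mapReduce.get();      // fall back to MapReduce
                }
            default:
                throw new IllegalArgumentException("valid values: lot, sql, hybrid");
        }
    }

    public static void main(String[] args) {
        Supplier<String> failingSql = () -> {
            throw new IllegalStateException("SQL execution failed");
        };
        Supplier<String> mr = () -> "ran as MapReduce";
        System.out.println(run("hybrid", failingSql, mr)); // ran as MapReduce
    }
}
```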

6.18.3. Operation settings in DataWorks


Jobs running in DataWorks are updated by the O&M personnel of MaxCompute and DataWorks. You do not need to update the client manually.

1. Enable the conversion for a single job.

You can add the SET statement before a MapReduce job or configure the job parameter for it. These methods take effect at the session level and apply only to the current job.

The following examples demonstrate how to use these methods:

Add the SET statement, such as set odps.mr.run.mode=hybrid;.

Configure the job parameter as follows:

JobConf job = new JobConf();
job.set("odps.mr.run.mode", "hybrid");

2. Enable the conversion at the project level by setting odps.mr.run.mode for the project.

6.18.4. View running details


You can use Logview and MaxCompute Studio to view MapReduce-to-SQL conversion results and the running details of SQL jobs.

1. LogView XML.

Open Logview and click the LOT node in the center of the page. The SQL jobs that are converted from MapReduce jobs are included in the XML information of the node. Example:

create temporary function mr2sql_mapper_152955927079392291755 as 'com.aliyun.odps.mapred.bridge.LotMapperUDTF' using ;
create temporary function mr2sql_reducer_152955927079392291755 as 'com.aliyun.odps.mapred.bridge.LotReducerUDTF' using ;
@sub_query_mapper :=
SELECT k_id,v_gmt_create,v_gmt_modified,v_product_id,v_admin_seq,v_sku_attr,v_sku_price,v_sku_stock,v_sku_code,v_sku_image,v_delivery_time,v_sku_bulk_order,v_sku_bulk_discount,v_sku_image_version,v_currency_code
FROM(
SELECT mr2sql_mapper_152955927079392291755(id,gmt_create,gmt_modified,product_id,admin_seq,sku_attr,sku_price,sku_stock,sku_code,sku_image,delivery_time,sku_bulk_order,sku_bulk_discount,sku_image_version,currency_code ) as (k_id,v_gmt_create,v_gmt_modified,v_product_id,v_admin_seq,v_sku_attr,v_sku_price,v_sku_stock,v_sku_code,v_sku_image,v_delivery_time,v_sku_bulk_order,v_sku_bulk_discount,v_sku_image_version,v_currency_code)
FROM ae_antispam.product_sku_tt_inc
WHERE ds = "20180615" AND hh = "21"
UNION ALL
SELECT mr2sql_mapper_152955927079392291755(id,gmt_create,gmt_modified,product_id,admin_seq,sku_attr,sku_price,sku_stock,sku_code,sku_image,delivery_time,sku_bulk_order,sku_bulk_discount,sku_image_version,currency_code ) as (k_id,v_gmt_create,v_gmt_modified,v_product_id,v_admin_seq,v_sku_attr,v_sku_price,v_sku_stock,v_sku_code,v_sku_image,v_delivery_time,v_sku_bulk_order,v_sku_bulk_discount,v_sku_image_version,v_currency_code)
FROM ae_antispam.product_sku
) open_mr_alias1
DISTRIBUTE BY k_id SORT BY k_id ASC;
@sub_query_reducer :=
SELECT mr2sql_reducer_152955927079392291755(k_id,v_gmt_create,v_gmt_modified,v_product_id,v_admin_seq,v_sku_attr,v_sku_price,v_sku_stock,v_sku_code,v_sku_image,v_delivery_time,v_sku_bulk_order,v_sku_bulk_discount,v_sku_image_version,v_currency_code) as (id,gmt_create,gmt_modified,product_id,admin_seq,sku_attr,sku_price,sku_stock,sku_code,sku_image,delivery_time,sku_bulk_order,sku_bulk_discount,sku_image_version,currency_code)
FROM @sub_query_mapper;
FROM @sub_query_reducer
INSERT OVERWRITE TABLE ae_antispam.product_sku
SELECT id,gmt_create,gmt_modified,product_id,admin_seq,sku_attr,sku_price,sku_stock,sku_code,sku_image,delivery_time,sku_bulk_order,sku_bulk_discount,sku_image_version,currency_code ;

2. LogView detail or summary.

You can see that the new execution engine is used to execute jobs.

Job run mode: fuxi job
Job run engine: execution engine

3. LogView detail or JSON summary.

The JSON summary of a MapReduce job contains only the input and output information of Map and Reduce. The JSON summary of an SQL job shows details about each stage of SQL execution, such as all execution parameters, logical execution plans, physical execution plans, and execution details. Example:

"midlots" :
[
"LogicalTableSink(table=[[odps_flighting.flt_20180621104445_step1_ad_quality_tech_qp_algo_antifake_wordbag_filter_bag_change_result_lv2_20, auctionid,word,match_word(3) {0, 1, 2}]])
OdpsLogicalProject(auctionid=[$0], word=[$1], match_word=[$2])
OdpsLogicalProject(auctionid=[$0], word=[$1], match_word=[$2])
OdpsLogicalProject(auctionid=[$0], word=[$1], match_word=[$2])
OdpsLogicalProject(auctionid=[$2], word=[$3], match_word=[$4])
OdpsLogicalTableFunctionScan(invocation=[[MR2SQL_MAPPER_152955294118813063732($0, $1)]()], rowType=[RecordType(VARCHAR(2147483647) item_id, VARCHAR(2147483647) text, VARCHAR(2147483647) __tf_0_0, VARCHAR(2147483647) __tf_0_1, VARCHAR(2147483647) __tf_0_2)])
OdpsLogicalTableScan(table=[[ad_quality_tech.qp_algo_antifake_wordbag_filter_bag_change_lv2_20, item_id,text(2) {0, 1}]])
]

6.18.5. Perform operations on the distributed file system
Procedure
1. Specify volume files.

You can use either of the following methods to specify volume files:

Use a utility class to specify the input and output files:

JobConf job = new JobConf();
com.aliyun.odps.mapred.utils.InputUtils.addVolume(new VolumeInfo([project,]inVolume, inPartition, "inLabel"), job);
com.aliyun.odps.mapred.utils.OutputUtils.addVolume(new VolumeInfo([project,]outVolume, outPartition, "outLabel"), job);

In the preceding commands, project and label are optional. By default, the current project and the default label are used. If multiple input or output files are used, labels distinguish the files from each other. Authorization is required before you access the volume files of other projects.

Configure parameters to specify the volume and partition of the input and output files. If multiple input or output files are used, separate the entries with commas (,).

set odps.sql.volume.input[/output].desc = [<project>.]<table>.<partition>[:<label>];

2. In the map and reduce steps, call the following method on the context object to write data to the distributed file system, that is, to read and write the input and output files as data streams:

context.getOutputVolumeFileSystem();
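The value of the volume input or output parameter follows the template in step 1, with multiple entries separated by commas. The following Java sketch formats such a value; the class, the names in main, and passing null for an omitted project or label are illustrative assumptions, and the field called name simply fills the template's <table> placeholder.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical formatter for odps.sql.volume.input.desc / odps.sql.volume.output.desc
// values, following the template [<project>.]<table>.<partition>[:<label>].
class VolumeDesc {
    static final class Entry {
        final String project;   // optional; null means the current project
        final String name;      // fills the <table> placeholder of the template
        final String partition;
        final String label;     // optional; null means the default label

        Entry(String project, String name, String partition, String label) {
            this.project = project;
            this.name = name;
            this.partition = partition;
            this.label = label;
        }

        String render() {
            StringBuilder sb = new StringBuilder();
            if (project != null) sb.append(project).append('.');
            sb.append(name).append('.').append(partition);
            if (label != null) sb.append(':').append(label);
            return sb.toString();
        }
    }

    // Multiple entries are separated by commas.
    static String desc(List<Entry> entries) {
        return entries.stream().map(Entry::render).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String value = desc(List.of(
                new Entry(null, "in_vol", "p1", null),
                new Entry("other_proj", "in_vol2", "p2", "second")));
        // Prints: set odps.sql.volume.input.desc = in_vol.p1,other_proj.in_vol2.p2:second;
        System.out.println("set odps.sql.volume.input.desc = " + value + ";");
    }
}
```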

6.19. Analysis of the mapping between SQL input and output fields
6.19.1. Features
MaxCompute SQL can analyze the mapping between SQL input and output fields.

For each field in the output, this feature determines which fields of the input tables it is derived from. Example:

select key, sum(value) as total from src group by key;

The following result is returned:

Two columns are returned: key and total. The key column corresponds to the src.key column of the input table, and the total column corresponds to the src.value column of the input table.

6.19.2. Usage notes


This topic describes how to use the feature of analyzing the mapping between input and output fields.

Output format
Field mapping analysis supports the human-readable and JSON formats. You can use the set odps.sql.select.output.format=HumanReadable/json flag to specify the output format.

SDK-based field mapping analysis

Examples

import java.util.LinkedHashMap;
import java.util.Map;
import com.alibaba.fastjson.JSON;
import com.aliyun.odps.Instance;
import com.aliyun.odps.Odps;

// initOdps() stands for your own code that returns an initialized Odps object.
Odps odps = initOdps();
// To perform analysis, use LineageTask.
LineageTask task = new LineageTask("task_name", "select * from dual;");
// Optional. Use the preceding flag to specify the output format.
Map<String, String> settings = new LinkedHashMap<>();
settings.putIfAbsent("odps.sql.select.output.format", "json");
task.setProperty("settings", JSON.toJSONString(settings));
// Submit code to the server for field mapping analysis.
Instance instance = odps.instances().create(task);
System.out.println(instance.getId());
String logView = odps.logview().generateLogView(instance, 72);
System.out.println(logView);
instance.waitForSuccess();
// Obtain the analysis result.
System.out.println(instance.getTaskResults().get("task_name"));

odpscmd-based field mapping analysis

Examples

CLI mode: Use the -X parameter for field mapping analysis.

./bin/odpscmd.bat -X D:\lineage.q

Interactive mode: After you enter the interactive mode of odpscmd, set odps.sql.task.mode to LINEAGE and use the preceding flag to specify the output format. The usage is similar to committing SQL jobs.

odps@lineage_test>set odps.sql.task.mode=LINEAGE;
OK
odps@lineage_test>set odps.sql.select.output.format=Humanreadable;
OK
odps@lineage_test>select * from dual;
== Column Lineage
Column : id
Source Columns :
test2.dual.id

6.20. Common MaxCompute SQL errors and solutions
6.20.1. Data skew
6.20.1.1. Overview

If the min, max, and avg values of time, input records, or output records differ greatly across the instances of a running job (for example, max is much greater than avg), data skew may have occurred. You can check Logview to locate the data skew problem, as shown in the following figure.

The Long Tails tab of each task shows the instances where data skew occurred. The root cause of data skew is that some instances process much more data than others, so their running time far exceeds the average running time of the other instances. As a result, the entire job slows down.
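A rough automated version of this check compares the maximum per-instance value with the average. The following Java sketch is hypothetical: the input array stands in for per-instance statistics such as the input record counts shown in Logview, and the ratio threshold is an arbitrary assumption, not a MaxCompute rule.

```java
// Hypothetical long-tail check: flags skew when max exceeds ratioThreshold * avg.
class SkewCheck {
    static boolean looksSkewed(long[] perInstance, double ratioThreshold) {
        if (perInstance.length == 0) {
            return false;
        }
        long max = Long.MIN_VALUE;
        double sum = 0;
        for (long v : perInstance) {
            max = Math.max(max, v);
            sum += v;
        }
        double avg = sum / perInstance.length;
        return avg > 0 && max > ratioThreshold * avg;
    }

    public static void main(String[] args) {
        long[] balanced = {100, 110, 95, 105};
        long[] skewed = {100, 110, 95, 5000};
        System.out.println(looksSkewed(balanced, 2.0)); // false
        System.out.println(looksSkewed(skewed, 2.0));   // true
    }
}
```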

The methods for reducing data skew differ by the type of SQL operation, as described in the following sections.

6.20.1.2. GROUP BY skew


Possible cause: An unbalanced distribution of GROUP BY keys causes data skew in the Reduce stage.

Solution: Enable the GROUP BY skew prevention parameter before you run the SQL statements:

set odps.sql.groupby.skewindata=true

Note: If this parameter is set to true, the system adds random factors to the shuffle hash algorithm and adds a new task to prevent data skew.
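The idea behind adding random factors can be illustrated with the general salting technique: a random salt spreads a hot key over several first-stage groups, and an extra stage merges the partial results. The following Java sketch is a single-process illustration under that assumption; the class name is hypothetical, and it does not reproduce MaxCompute's actual shuffle implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of two-stage salted aggregation (SUM per key).
class SaltedAggregation {
    // Direct aggregation: every row of a key lands in one group (one reducer).
    static Map<String, Long> direct(String[] keys, long[] values) {
        Map<String, Long> out = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            out.merge(keys[i], values[i], Long::sum);
        }
        return out;
    }

    // Stage 1 groups by (key, random salt); stage 2 merges the partials per key.
    static Map<String, Long> salted(String[] keys, long[] values, int saltBuckets, long seed) {
        Random random = new Random(seed);
        Map<String, Long> partial = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            String saltedKey = keys[i] + "#" + random.nextInt(saltBuckets);
            partial.merge(saltedKey, values[i], Long::sum);
        }
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            // The salt follows the last '#', so strip from there.
            String key = e.getKey().substring(0, e.getKey().lastIndexOf('#'));
            out.merge(key, e.getValue(), Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"hot", "hot", "hot", "hot", "cold"};
        long[] values = {1, 2, 3, 4, 10};
        System.out.println(direct(keys, values).equals(salted(keys, values, 4, 42L))); // true
    }
}
```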

6.20.1.3. DISTRIBUTE BY skew


Possible cause: Using a constant as the DISTRIBUTE BY key (for example, to sort the full table) sends all rows to the same reducer, which causes data skew in the Reduce stage.

Solution: Avoid distributing by a constant.
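To see why a constant key concentrates the work, the following Java sketch routes rows to reducers by hashing the DISTRIBUTE BY key modulo the reducer count. Hash-modulo routing is a generic assumption here, not MaxCompute's exact partitioner, and the class name is hypothetical.

```java
import java.util.Arrays;

// Sketch: count how many rows each reducer receives under hash-modulo routing.
class DistributeByDemo {
    static int[] rowsPerReducer(Object[] distributeKeys, int reducers) {
        int[] counts = new int[reducers];
        for (Object key : distributeKeys) {
            counts[Math.floorMod(key.hashCode(), reducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int rows = 1000;
        int reducers = 4;
        Object[] constant = new Object[rows];
        Object[] ids = new Object[rows];
        for (int i = 0; i < rows; i++) {
            constant[i] = "1"; // like DISTRIBUTE BY a constant
            ids[i] = i;        // like DISTRIBUTE BY a well-spread id column
        }
        System.out.println(Arrays.toString(rowsPerReducer(constant, reducers))); // [0, 1000, 0, 0]
        System.out.println(Arrays.toString(rowsPerReducer(ids, reducers)));      // [250, 250, 250, 250]
    }
}
```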

6.20.1.4. JOIN skew


Possible cause: An unbalanced distribution of JOIN ON keys (for example, a large number of repeated keys in multiple joined tables) causes the Cartesian product to surge in some JOIN instances, which results in data skew.

Solution: The solutions for different scenarios are as follows:

If one of the joined tables is small, use MAPJOIN instead of JOIN.
Handle the skewed keys with separate logic. For example, if a large amount of NULL data in the keys on both sides of the join causes skew, filter out the NULL data before the JOIN operation, or use a CASE WHEN clause to replace the NULL values with random values and then perform the JOIN operation.
If you do not want to change the SQL statements, set the following parameters to enable automatic optimization in MaxCompute:

set odps.sql.skewinfo=tab1:(col1, col2)[(v1, v2), (v3, v4), ...]
set odps.sql.skewjoin=true;
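The NULL-replacement approach works because NULL keys never match in an equi-join, so each one can receive a distinct dummy key without changing which rows match. The following Java sketch assumes hash-modulo routing and uses a hypothetical "__null_" prefix that real keys must never contain; in SQL the same effect comes from the CASE WHEN clause described above.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: give each NULL join key a random, non-matching dummy key so those
// rows spread across reducers instead of piling onto one.
class NullKeySpread {
    // Assumption: no real key starts with "__null_".
    static String effectiveKey(String key, Random random) {
        return key != null ? key : "__null_" + random.nextInt(Integer.MAX_VALUE);
    }

    static int[] rowsPerReducer(String[] keys, int reducers, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[reducers];
        for (String key : keys) {
            counts[Math.floorMod(effectiveKey(key, random).hashCode(), reducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = new String[10_000]; // all NULL join keys
        System.out.println(Arrays.toString(rowsPerReducer(keys, 8, 7L)));
    }
}
```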

6.20.1.5. MULTI-DISTINCT skew


Possible cause: Multiple DISTINCT keywords aggravate the GROUP BY skew problem.

Solution: Use a two-layer GROUP BY to smooth the skew.
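For a single COUNT(DISTINCT v) grouped by g, the two-layer form first groups by (g, v) to remove duplicates and then groups by g to count, roughly SELECT g, COUNT(v) FROM (SELECT g, v FROM t GROUP BY g, v) x GROUP BY g, where t, g, and v are placeholder names. COUNT(v) rather than COUNT(*) keeps NULL values from being counted. The Java sketch below (hypothetical class name) checks that both forms agree on a small in-memory sample.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch comparing COUNT(DISTINCT v) ... GROUP BY g with its two-layer form.
class TwoLayerDistinct {
    // Reference: one set of non-NULL values per group.
    static Map<String, Integer> direct(String[] g, String[] v) {
        Map<String, Set<String>> sets = new HashMap<>();
        for (int i = 0; i < g.length; i++) {
            Set<String> set = sets.computeIfAbsent(g[i], k -> new HashSet<>());
            if (v[i] != null) set.add(v[i]);
        }
        Map<String, Integer> out = new HashMap<>();
        sets.forEach((key, set) -> out.put(key, set.size()));
        return out;
    }

    static Map<String, Integer> twoLayer(String[] g, String[] v) {
        // Layer 1: GROUP BY g, v keeps one row per distinct (g, v) pair.
        Set<List<String>> pairs = new HashSet<>();
        for (int i = 0; i < g.length; i++) {
            pairs.add(Arrays.asList(g[i], v[i]));
        }
        // Layer 2: GROUP BY g with COUNT(v), which skips NULL values.
        Map<String, Integer> out = new HashMap<>();
        for (List<String> p : pairs) {
            out.merge(p.get(0), p.get(1) == null ? 0 : 1, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = {"a", "a", "a", "b", "b", "c"};
        String[] v = {"x", "x", "y", "z", null, null};
        System.out.println(direct(g, v).equals(twoLayer(g, v))); // true
    }
}
```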

6.20.1.6. Data skew caused by misuse of dynamic partitioning
Possible cause: If dynamic partitioning is enabled and there are K map instances and N target partitions, up to K * N small files may be generated. A large number of small files can greatly increase the management workload of the file system. Therefore, the following configuration takes effect by default:

set odps.sql.reshuffle.dynamicpt=true;

This configuration introduces an additional Reduce task so that one or more reduce instances write data to the same target partition, which prevents too many small files from being generated. However, the dynamic partition shuffle may cause data skew.

Solution: If there are only a few target partitions, not many small files are generated. In this case, you can run the following command to disable the preceding function, or disable dynamic partitioning:

set odps.sql.reshuffle.dynamicpt=false;
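The K * N estimate is simple arithmetic; the following Java sketch only makes it concrete. The instance and partition counts in main are invented, and K * N is an upper bound rather than an exact file count.

```java
// Worst-case small-file count for dynamic partitioning without the reshuffle:
// each of K map instances may write one file into each of N target partitions.
class DynamicPartitionFiles {
    static long worstCaseFiles(long mapInstances, long targetPartitions) {
        return mapInstances * targetPartitions;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseFiles(1000, 200)); // 200000
        System.out.println(worstCaseFiles(1000, 2));   // 2000
    }
}
```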

6.20.2. Quota and resource usage


Computing resources in MaxCompute may sometimes be insufficient because of improper planning and use of cluster resources.

In general, tasks that lack computing resources show one of two characteristics. The first is that the task gets stuck and its progress stays at a certain stage. For example, in the following figure, the progress of the M1_Stg1 task stays at 0%. Because R2_1_Stg1 depends on M1_Stg1, R2_1_Stg1 also stays at 0% until M1_Stg1 ends.

The second is that the task instances remain in the Ready state in Logview, as shown in the following figure. A Ready instance is waiting for resources to be allocated, whereas a Waiting instance is waiting for the tasks it depends on to complete. The Ready state indicates that the resources for running these standby instances are insufficient. Once the instances obtain the necessary resources, they resume and change to the Running state.

Each task is split into subtasks based on the execution plan and shown in a DAG, and each subtask invokes multiple instances to run the computation concurrently. In general, each instance requires one CPU core and 2 GB of memory. A quota group is assigned to each project for reasonable resource allocation. The quota group determines the maximum amount of resources (CPU and memory) that all jobs in the project can use concurrently. Once the resource usage of simultaneously running tasks reaches the limit of the quota group, the tasks get stuck because of insufficient resources.
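Using the per-instance figure above (one CPU core and 2 GB of memory), the number of instances a quota can run at the same time is bounded by whichever resource runs out first. The following Java sketch is a rough estimate; the quota sizes in main are hypothetical, and real scheduling also depends on the other jobs in the project.

```java
// Rough upper bound on concurrently running instances under a quota group.
class QuotaEstimate {
    static long maxConcurrentInstances(long quotaCores, long quotaMemoryGb,
                                       long coresPerInstance, long memoryGbPerInstance) {
        return Math.min(quotaCores / coresPerInstance, quotaMemoryGb / memoryGbPerInstance);
    }

    public static void main(String[] args) {
        // 100 cores and 400 GB: CPU is the bottleneck.
        System.out.println(maxConcurrentInstances(100, 400, 1, 2)); // 100
        // 100 cores and 100 GB: memory is the bottleneck.
        System.out.println(maxConcurrentInstances(100, 100, 1, 2)); // 50
    }
}
```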

There are two methods to solve this problem:

Run the tasks during off-peak hours.
Increase the quota group for the project. This is handled by O&M personnel.

6.20.3. MaxCompute storage optimization tips


Partition tables reasonably
MaxCompute supports partitioned tables. When you create a table, you can specify one or more of its fields as partition columns. In most cases, you can think of a partition as a directory in a file system: MaxCompute places the data for each value of the partition column into a separate partition (directory). You can specify multi-level partitions by using multiple fields of the table as partition columns, and multi-level partitions are similar to multi-level directories. If you specify the partition that you want to access when you query the data, only the corresponding partition is read and a full table scan is avoided. This improves processing efficiency and reduces costs.

Example of a partitioning statement: create table src (key string, value bigint) partitioned by (pt string);. In this example, select * from src where pt='20160901'; specifies a partition, so MaxCompute uses only the data in the "20160901" partition as the input when it generates the query plan.

Example of a statement that does not filter on a partition: select * from src where key = 'MaxCompute'; scans the entire table.

Partitioning is usually based on date or geographical region. You can also define partitions based on your business requirements. Example:

create table if not exists sale_detail(
    shop_name string,
    customer_id string,
    total_price double)
partitioned by (sale_date string,region string);
-- Create a two-level partitioned table, in which sale_date is the level-1 partition and region is the level-2 partition.

Set table lifecycle reasonably


Storage space in MaxCompute is valuable. You can set the lifecycle of a table based on how its data is used, and MaxCompute deletes expired data to save storage space.

Example: Run the create table test3 (key boolean) partitioned by (pt string, ds string) lifecycle 100; command to create a table with a lifecycle of 100 days. If the last modification time of this table or of a partition was more than 100 days ago, the table or partition is deleted.

Notice: The lifecycle takes a partition as the smallest unit. For a partitioned table, partitions that reach the lifecycle threshold are deleted directly, and partitions that have not reached the threshold are not affected.

Run the alter table table_name set lifecycle days; command to modify the lifecycle of an existing table.
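The lifecycle rule can be expressed as a date comparison. The following Java sketch assumes that "more than 100 days ago" means strictly more than the lifecycle has elapsed since the last modification; the class name is hypothetical, and when the cleanup actually runs is not modeled.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the lifecycle check for one table or partition.
class LifecycleCheck {
    static boolean isExpired(Instant lastModified, int lifecycleDays, Instant now) {
        return Duration.between(lastModified, now).compareTo(Duration.ofDays(lifecycleDays)) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant modified = now.minus(Duration.ofDays(120));
        System.out.println(isExpired(modified, 100, now)); // true
    }
}
```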

Archive cold data


Some data needs to be preserved permanently or for a long period of time, but it is accessed less and less frequently. When the data is rarely accessed, you can archive it. The archive function stores data by using RAID instead of simply storing three replicas. With the Cauchy Reed-Solomon algorithm, data is stored as six data blocks plus three parity blocks, which reduces the ratio of physical storage to logical data from 3:1 to 1.5:1. In addition, MaxCompute compresses archived tables by using the bzip2 algorithm, which has a higher compression ratio than other algorithms. Combining the two algorithms reduces storage usage by more than 70%.
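The storage ratios above follow from simple arithmetic: three-way replication stores three bytes per logical byte, and a code with six data blocks and three parity blocks stores 9 / 6 = 1.5. The following Java sketch ignores bzip2 compression, and the 10 TB input is an invented figure.

```java
// Physical storage needed for a given amount of logical data.
class ArchiveStorage {
    static double replicatedTb(double logicalTb, int replicas) {
        return logicalTb * replicas;
    }

    static double erasureCodedTb(double logicalTb, int dataBlocks, int parityBlocks) {
        return logicalTb * (dataBlocks + parityBlocks) / dataBlocks;
    }

    public static void main(String[] args) {
        System.out.println(replicatedTb(10, 3));      // 30.0
        System.out.println(erasureCodedTb(10, 6, 3)); // 15.0
    }
}
```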

The archive command syntax is as follows:

ALTER TABLE table_name [PARTITION(partition_name='partition_value')] ARCHIVE;

Example:

alter table my_log partition(ds='20140101') archive;

Merge small files


A large number of small files is generated during Reduce computation or real-time Tunnel data collection. Too many small files may cause the following problems:

Many instances are occupied because a single instance can process only a small number of files. This wastes resources and degrades overall execution performance.
The file system becomes larger, while disk space utilization becomes lower.

Currently, there are two ways to merge small files: ALTER merge mode and SQL merge mode.

ALTER merge mode merges files by running a command on the console. The command syntax is as follows:

ALTER TABLE tablename [PARTITION] MERGE SMALLFILES;

SQL merge mode uses control parameters that take effect after SQL execution is complete. If you set odps.task.merge.enabled=true;, the system determines after the SQL statement finishes whether small files need to be merged and, if so, starts a Fuxi job to merge them.

6.20.4. UDF OOM error


Some jobs report an out-of-memory (OOM) error while running. The error message is as follows:

FAILED: ODPS-0123144: Fuxi job failed - WorkerRestart errCode:9,errMsg:SigKill(OOM), usually caused by OOM(out of memory)

You can solve this problem by setting the following runtime parameters:

set odps.sql.mapper.memory=3072;
set odps.sql.udf.jvm.memory=2048;
set odps.sql.udf.python.memory=1536;

6.21. Appendix
6.21.1. Escape character
String constants in MaxCompute SQL can be enclosed in single or double quotation marks. A string enclosed in single quotation marks can contain double quotation marks, and a string enclosed in double quotation marks can contain single quotation marks. Otherwise, a quotation mark inside the string must be escaped. Examples of correct expressions: "I'm a happy coder!" and 'I\'m a happy coder!'.

In MaxCompute SQL, the backslash (\) is the escape character. It expresses special characters in a string, or causes the character that follows it to be interpreted as the character itself. When a string constant is read, if the backslash is followed by three valid octal digits in the range from 001 to 177, the system converts the ASCII value into the corresponding character. The following table lists the escape sequences and the characters that they represent.

Escape sequences

| Escape sequence | Represented character |
|---|---|
| \b | Backspace |
| \t | Tab |
| \n | Newline |
| \r | Carriage return |
| \' | Single quote |
| \" | Double quote |
| \\ | Backslash |
| \; | Semicolon |
| \Z | Control-Z |
| \0 or \00 | Terminator |

Examples:

select length('a\tb') from dual;
-- The result is 3, which indicates that the string contains three characters: "\t" is regarded as one character.

select 'a\ab',length('a\ab') from dual;
-- The result is 'aab', with a length of 3. A backslash followed by a character that is not in the table is interpreted as that character itself, so "\a" is an ordinary "a".
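The escape rules in the table can be expressed as a small decoder. The following Java sketch is an approximation for illustration, not MaxCompute's parser: the class name is hypothetical, and checking a three-digit octal sequence before the \0 terminator rule is an assumption about how inputs such as \001 are resolved.

```java
// Sketch of decoding MaxCompute-style string escapes per the table above.
class EscapeDecoder {
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) { // ordinary char or trailing backslash
                out.append(c);
                i++;
                continue;
            }
            // Three octal digits in the range 001-177 map to the ASCII character.
            if (i + 4 <= s.length() && isOctal(s, i + 1, 3)) {
                int code = Integer.parseInt(s.substring(i + 1, i + 4), 8);
                if (code >= 1 && code <= 0177) {
                    out.append((char) code);
                    i += 4;
                    continue;
                }
            }
            char next = s.charAt(i + 1);
            switch (next) {
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 'Z': out.append((char) 0x1A); break; // Control-Z
                case '0':
                    out.append((char) 0);                 // terminator
                    if (i + 2 < s.length() && s.charAt(i + 2) == '0') {
                        i++;                              // "\00" is also a terminator
                    }
                    break;
                default: out.append(next);                // \' \" \\ \; and any other char
            }
            i += 2;
        }
        return out.toString();
    }

    static boolean isOctal(String s, int from, int count) {
        for (int k = from; k < from + count; k++) {
            char d = s.charAt(k);
            if (d < '0' || d > '7') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(decode("a\\tb").length()); // 3
        System.out.println(decode("a\\ab"));          // aab
    }
}
```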

6.21.2. LIKE matching


In LIKE matching, "%" matches any number of characters and "_" matches exactly one character. To match a literal "%" or "_", escape it: "\%" matches "%" and "\_" matches "_".

Note: For the character set of strings, MaxCompute SQL currently supports UTF-8. Data encoded in a different format may produce incorrect results.
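The LIKE wildcards and their escapes map directly onto regular expressions. The following Java sketch translates a LIKE pattern into a java.util.regex Pattern; the class name is hypothetical, and treating any other backslash sequence as the literal next character is an assumption.

```java
import java.util.regex.Pattern;

// Sketch: LIKE pattern -> regex, honoring \% and \_ as literal characters.
class LikeMatcher {
    static Pattern toRegex(String like) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\' && i + 1 < like.length()) {
                re.append(Pattern.quote(String.valueOf(like.charAt(++i)))); // escaped literal
            } else if (c == '%') {
                re.append(".*");                                           // any number of chars
            } else if (c == '_') {
                re.append('.');                                            // exactly one char
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL so that % and _ also match line terminators.
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    static boolean like(String value, String pattern) {
        return toRegex(pattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("100%", "100\\%")); // true
        System.out.println(like("1000", "100\\%")); // false
        System.out.println(like("axb", "a_b"));     // true
    }
}
```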

6.21.3. Regular expressions


MaxCompute SQL uses the PCRE library for regular expressions. Matching is performed character by
character. The supported metacharacters are as follows:

^: the beginning of a line
$: the end of a line
.: any character
*: matches the preceding element zero or more times.
+: matches the preceding element one or more times.
?: matches the preceding element zero times or once. If ? follows another quantifier (*, +, ?, {n},
{n,}, or {n,m}), the match is lazy: as few characters as possible are matched. In the default greedy
mode, as many characters as possible are matched.
A|B: A or B
(abc)*: matches the sequence abc zero or more times.
{n} or {m,n}: the number of matches: exactly n times, or at least m and at most n times.
[ab]: matches any character in the brackets.
[^ab]: ^ means NOT. This matches any character that is neither a nor b.
\: the escape character
\n: a backreference to the nth captured group, where n is a digit from 1 to 9.
\d: a digit
\D: a non-digit
[[:class:]]: a POSIX character class, where class is one of the following names:
[[:alnum:]]: a letter or digit, in the range [a-zA-Z0-9]
[[:alpha:]]: a letter, in the range [a-zA-Z]
[[:ascii:]]: an ASCII character, in the range [\x00-\x7F]
[[:blank:]]: a space or tab, that is, [ \t]
[[:cntrl:]]: a control character, in the range [\x00-\x1F\x7F]
[[:digit:]]: a digit, in the range [0-9]
[[:graph:]]: a visible character (any printable character except space), in the range [\x21-\x7E]
[[:space:]]: a whitespace character, that is, [ \t\r\n\v\f]
[[:print:]]: a printable character ([:graph:] plus the space character), in the range [\x20-\x7E]

> Document Version: 20220928 346



[[:lower:]]: a lowercase letter, in the range [a-z]
[[:punct:]]: a punctuation character, in the range [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]
[[:upper:]]: an uppercase letter, in the range [A-Z]
[[:xdigit:]]: a hexadecimal digit, in the range [A-Fa-f0-9]
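To illustrate the quantifier, lazy-matching, and backreference entries in the list above, the following sketch uses Java's java.util.regex. That engine is not PCRE, but it behaves the same way for these basic constructs. Java spells POSIX classes differently (for example, \p{Alpha} rather than [[:alpha:]]), so those are not shown. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {

    /** Returns the first substring of input that matches regex, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tags = "<a><b>";
        System.out.println(firstMatch("<.*>", tags));          // <a><b> (greedy: as much as possible)
        System.out.println(firstMatch("<.*?>", tags));         // <a>    (lazy: as little as possible)
        System.out.println(firstMatch("\\d{2,3}", "12345"));   // 123    (greedy {m,n})
        System.out.println(firstMatch("\\d{2,3}?", "12345"));  // 12     (lazy {m,n})
        System.out.println(firstMatch("colou?r", "color"));    // color  (? alone: zero times or once)
        System.out.println(firstMatch("(ab)\\1", "xabab"));    // abab   (\1 repeats group 1)
    }
}
```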

The system uses the backslash (\) as its escape character, so a backslash in a regular expression is
escaped twice. For example, to match the string "a+b", where the plus sign (+) is a special character
in regular expressions, the plus sign must be escaped, which gives the regular expression a\+b.
However, the system first processes escape sequences in the SQL string, so the backslash itself must
also be escaped before the expression reaches the regex engine. Hence, the expression that matches
"a+b" is "a\\+b".

The following example assumes that there is a table named test_dual:

select 'a+b' rlike 'a\\+b' from test_dual;


+------+
| _c1  |
+------+
| true |
+------+

In an extreme case, to match the backslash character (\), which is itself a special character in the
regex engine, the regular expression must be "\\". Because the system also escapes the expression, it
must be written as "\\\\".

select 'a\\b', 'a\\b' rlike 'a\\\\b' from test_dual;


+-----+------+
| _c0 | _c1  |
+-----+------+
| a\b | true |
+-----+------+

Note
If a MaxCompute SQL statement contains 'a\\b', a\b is displayed in the output because MaxCompute
escapes the expression.
If a string contains a tab character, the system reads '\t' and stores it as a single character.
Therefore, the tab is treated as an ordinary character in a regular expression.

select 'a\tb', 'a\tb' rlike 'a\tb' from test_dual;


+---------+------+
| _c0     | _c1  |
+---------+------+
| a     b | true |
+---------+------+
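The two layers of escaping work the same way in Java source code, where a string literal is also unescaped before the regex engine sees it, so the Java literals below are spelled exactly like the SQL literals in the examples above. The sketch uses java.util.regex rather than PCRE, and its rlike helper is hypothetical: it assumes that RLIKE succeeds when the pattern matches any part of the string (Matcher.find()), which this section does not state.

```java
import java.util.regex.Pattern;

public class RegexEscaping {

    /** True if regex matches somewhere in value (substring search; an assumption about RLIKE). */
    static boolean rlike(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        // Literal "a\\+b" becomes the regex a\+b: the plus sign is matched literally
        System.out.println(rlike("a+b", "a\\+b"));    // true
        // Unescaped +: one or more 'a' followed by 'b', which "a+b" does not contain
        System.out.println(rlike("a+b", "a+b"));      // false
        // Literal "a\\\\b" becomes the regex a\\b: one literal backslash between a and b
        System.out.println(rlike("a\\b", "a\\\\b"));  // true
        // "\t" is already a single tab character, so the regex treats it as ordinary text
        System.out.println(rlike("a\tb", "a\tb"));    // true
    }
}
```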

6.21.4. Reserved words


The following are all reserved words in MaxCompute SQL. Do not use these words to name tables,
columns, or partitions. Otherwise, an error is returned. Reserved words are case-insensitive.


% & && ( ) * + . / ; < <= <> = > >= ? ADD AFTER ALL ALTER ANALYZE AND ARCHIVE ARRAY AS ASC
BEFORE BETWEEN BIGINT BINARY BLOB BOOLEAN BOTH BUCKET BUCKETS BY CASCADE CASE CAST CFILE CHANGE
CLUSTER CLUSTERED CLUSTERSTATUS COLLECTION COLUMN COLUMNS COMMENT COMPUTE CONCATENATE CONTINUE
CREATE CROSS CURRENT CURSOR DATA DATABASE DATABASES DATE DATETIME DBPROPERTIES DEFERRED DELETE
DELIMITED DESC DESCRIBE DIRECTORY DISABLE DISTINCT DISTRIBUTE DOUBLE DROP ELSE ENABLE END ESCAPED
EXCLUSIVE EXISTS EXPLAIN EXPORT EXTENDED EXTERNAL FALSE FETCH FIELDS FILEFORMAT FIRST FLOAT
FOLLOWING FORMAT FORMATTED FROM FULL FUNCTION FUNCTIONS GRANT GROUP HAVING HOLD_DDLTIME
IDXPROPERTIES IF IMPORT IN INDEX INDEXES INPATH INPUTDRIVER INPUTFORMAT INSERT INT INTERSECT INTO
IS ITEMS JOIN KEYS LATERAL LEFT LIFECYCLE LIKE LIMIT LINES LOAD LOCAL LOCATION LOCK LOCKS LONG MAP
MAPJOIN MATERIALIZED MINUS MSCK NOT NO_DROP NULL OF OFFLINE ON OPTION OR ORDER OUT OUTER
OUTPUTDRIVER OUTPUTFORMAT OVER OVERWRITE PARTITION PARTITIONED PARTITIONPROPERTIES PARTITIONS
PERCENT PLUS PRECEDING PRESERVE PROCEDURE PURGE RANGE RCFILE READ READONLY READS REBUILD
RECORDREADER RECORDWRITER REDUCE REGEXP RENAME REPAIR REPLACE RESTRICT REVOKE RIGHT RLIKE ROW ROWS
SCHEMA SCHEMAS SELECT SEMI SEQUENCEFILE SERDE SERDEPROPERTIES SET SHARED SHOW SHOW_DATABASE
SMALLINT SORT SORTED SSL STATISTICS STORED STREAMTABLE STRING STRUCT TABLE TABLES TABLESAMPLE
TBLPROPERTIES TEMPORARY TERMINATED TEXTFILE THEN TIMESTAMP TINYINT TO TOUCH TRANSFORM TRIGGER TRUE
UNARCHIVE UNBOUNDED UNDO UNION UNIONTYPE UNIQUEJOIN UNLOCK UNSIGNED UPDATE USE USING UTC
UTC_TMESTAMP VIEW WHEN WHERE WHILE
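Because reserved words are case-insensitive, a client-side check on proposed table, column, or partition names should normalize case before comparing. The following sketch is hypothetical and holds only a small excerpt of the list above; a real check would need the complete list.

```java
import java.util.Locale;
import java.util.Set;

public class ReservedWordCheck {

    // Excerpt only: a real check needs the complete reserved-word list.
    private static final Set<String> RESERVED = Set.of(
            "COMMENT", "DATETIME", "GROUP", "LIFECYCLE", "LIMIT",
            "ORDER", "PARTITION", "RANGE", "SELECT", "TABLE");

    /** Reserved words are case-insensitive, so compare in a fixed case. */
    static boolean isReserved(String identifier) {
        return RESERVED.contains(identifier.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isReserved("comment"));       // true
        System.out.println(isReserved("Lifecycle"));     // true
        System.out.println(isReserved("comment_text"));  // false
    }
}
```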

6.21.5. New data type settings


If you want to read a table that includes new data types, you are not required to add the
set odps.sql.type.system.odps2=true; flag. However, take note of the following points:

If the flag is not added, the data that is read is implicitly converted into the original data types
for all computations.
If the flag is not added, integer constants are of the BIGINT type, and an error message is
returned.
If you write data to a table in passthrough mode, you can omit the new data type flag. However, if
you perform calculations on the data, an error is returned because the implicit data type conversion
is invalid.


7.MaxCompute Tunnel
7.1. Overview
MaxCompute provides two types of channels for data uploads and downloads:

DataHub: This channel is used to upload or download data in real time. It includes the OGG, Flume,
Logstash, and Fluentd plug-ins.
Tunnel: This channel is used to upload or download large amounts of data in batches. It includes the
MaxCompute client, DataWorks, DTS, Sqoop, the Kettle plug-in, and MaxCompute Migration Assist (MMA).

DataHub and Tunnel provide their own SDKs. The data upload and download tools built on these SDKs
meet the requirements of most common scenarios in which data is migrated to the cloud. The tools also
enable you to upload or download data in a variety of other scenarios.

Limits
Limits on Tunnel-based data uploads:
You cannot run Tunnel commands to upload or download data of the ARRAY, MAP, or STRUCT type.
No limit is imposed on the upload speed. The upload speed depends on the network bandwidth and
server performance.
The number of retries is limited. If the number of retries exceeds the limit, the next block is
uploaded. After data is uploaded, you can execute the select count(*) from table_name statement
to check whether any data is lost.
By default, a project supports a maximum of 2,000 concurrent Tunnel connections.
On the server, the lifecycle of a session is 24 hours. A session can be shared among processes and
threads on the server, but you must make sure that each block ID is unique.
MaxCompute ensures the validity of concurrent writes based on atomicity, consistency, isolation, and
durability (ACID).

Limits on DataHub-based data uploads:

The size of each field cannot exceed its upper limit.

Note The size of a string cannot exceed 8 MB.

During an upload, multiple data records are packaged together.

Limits on TableTunnel SDK interfaces:

A block ID must be greater than or equal to 0 and less than 20,000. The size of the data that you
upload in a block cannot exceed 100 GB.
The lifecycle of a session is 24 hours. If transferring a large amount of data takes more than 24
hours, we recommend that you transfer the data in multiple sessions.
The lifecycle of an HTTP request that corresponds to a RecordWriter is 120 seconds. If no data
flows over an HTTP connection within 120 seconds, the server closes the connection.
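The block limits above can be checked before an upload starts. The following hypothetical helper computes how many blocks a payload needs when each block holds at most 100 GB (it assumes 1 GB = 2^30 bytes) and whether that many distinct block IDs fit in the range 0 to 19,999. It ignores the 24-hour session lifecycle, which in practice may force you to split a large transfer across sessions well before the block-ID range runs out.

```java
public class BlockPlan {

    static final long BLOCK_ID_LIMIT = 20_000L;                    // valid IDs: 0 <= id < 20,000
    static final long MAX_BLOCK_BYTES = 100L * 1024 * 1024 * 1024; // 100 GB, assuming 1 GB = 2^30 bytes

    static boolean isValidBlockId(long id) {
        return id >= 0 && id < BLOCK_ID_LIMIT;
    }

    /** Smallest number of blocks that can hold totalBytes (ceiling division without overflow). */
    static long blocksNeeded(long totalBytes) {
        if (totalBytes < 0) throw new IllegalArgumentException("totalBytes must be >= 0");
        long full = totalBytes / MAX_BLOCK_BYTES;
        return totalBytes % MAX_BLOCK_BYTES == 0 ? full : full + 1;
    }

    /** True if the payload fits in one session using distinct block IDs 0..n-1 (time limit ignored). */
    static boolean fitsBlockIdRange(long totalBytes) {
        return blocksNeeded(totalBytes) <= BLOCK_ID_LIMIT;
    }

    public static void main(String[] args) {
        long oneTerabyte = 1024L * 1024 * 1024 * 1024;   // 1 TB, assuming 1 TB = 2^40 bytes
        System.out.println(blocksNeeded(oneTerabyte));   // 11
        System.out.println(isValidBlockId(20_000));      // false
    }
}
```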

7.2. Tunnel service connections


DataHub and Tunnel use different endpoints in different network environments. When you connect to the
service, select the endpoint that matches your network environment.

7.3. Selection of cloud data migration tools
MaxCompute provides a variety of data upload and download tools for different cloud data migration
scenarios. This topic describes how to select data transmission tools in three typical scenarios.

Hadoop data migration


You can use Sqoop and DataWorks to migrate Hadoop data.

When you use DataWorks, DataX is required.

When you use Sqoop, a MapReduce job runs on the original Hadoop cluster to transmit data to
MaxCompute in a distributed manner.

Synchronization of data in a database


To synchronize data from a database to MaxCompute, select a tool based on the database type and
synchronization policy.

Use DataWorks for offline batch synchronization. DataWorks supports a wide range of database types,
including MySQL, SQL Server, and PostgreSQL.
Use the OGG plug-in for real-time synchronization of data in an Oracle database.
Use DTS for real-time synchronization of data in an ApsaraDB for RDS database.

Log collection
You can use tools such as Flume, Fluentd, and Logstash to collect logs.

7.4. Introduction to the tools


MaxCompute supports a wide range of data upload and download tools. The source code of most of these
tools is available and maintained in the open-source community on GitHub. You can select the
appropriate tools to upload and download data based on your application scenario.

Alibaba Cloud DTplus products


Data Integration of DataWorks (Tunnel)

Data Integration of DataWorks is a stable, efficient, and scalable data synchronization platform
provided by Alibaba Cloud. It provides full offline and incremental real-time data synchronization,
integration, and exchange services for heterogeneous data storage systems on Alibaba Cloud.
Data synchronization tasks support the following data source types: MaxCompute, ApsaraDB for RDS
(MySQL, SQL Server, and PostgreSQL), Oracle, FTP, AnalyticDB (ADS), OSS, ApsaraDB for Memcache,
and DRDS.

MaxCompute client (Tunnel)

Based on the batch data Tunnel SDK, the client provides built-in Tunnel commands for data upload
and download.


DTS (Tunnel)

Data Transmission Service (DTS) is an Alibaba Cloud data service that supports data exchange between
multiple data sources, such as relational database management systems (RDBMS), NoSQL databases, and
online analytical processing (OLAP) databases. It provides data transmission features such as data
migration, real-time data subscription, and real-time data synchronization.

DTS supports data synchronization from ApsaraDB for RDS and MySQL instances to MaxCompute tables.
Other data source types are not supported.

Open-source products
The projects corresponding to each product are open source. You can visit
aliyun-maxcompute-data-collectors to view details.

Sqoop (Tunnel)

Sqoop 1.4.6 from the community is further developed to provide enhanced MaxCompute support. It can
import data from relational databases such as MySQL, and data from HDFS or Hive, into MaxCompute
tables. It can also export data from MaxCompute tables to relational databases such as MySQL.

Kettle (Tunnel)

Kettle is an open-source ETL tool developed in Java. It can run on Windows, Unix, or Linux. It
provides graphical interfaces on which you define a data transmission topology by dragging and
dropping components.

Flume (DataHub)

Apache Flume is a distributed and reliable system. It collects large volumes of log data from
different data sources, and then aggregates and stores the data in a centralized data store.
The DataHub Sink plug-in of Apache Flume allows you to upload log data to DataHub in real time
and archive the data in MaxCompute tables.

Fluentd (DataHub)

Fluentd is an open-source software product. It collects logs, such as application logs, system logs,
and access logs, from various sources. It allows you to use plug-ins to filter log data and store the
data in different data processors, including MySQL, Oracle, MongoDB, Hadoop, and Treasure Data.
The DataHub plug-in of Fluentd allows you to upload log data to DataHub in real time and archive
the data in MaxCompute tables.

Logstash (DataHub)

Logstash is an open-source log collection and processing framework. The logstash-output-datahub
plug-in allows you to import data to DataHub. This tool can be easily configured to collect and
transmit data. It can be used together with MaxCompute or StreamCompute to create an all-in-one
streaming data solution from data collection to analysis.
The DataHub plug-in of Logstash allows you to upload log data to DataHub in real time and archive
the data in MaxCompute tables.

OGG (DataHub)

The DataHub plug-in of OGG allows you to incrementally synchronize data in an Oracle database to
DataHub in real time and archive the data in MaxCompute tables.

7.5. Tunnel SDK overview



7.5.1. Overview
The data upload and download tools provided by MaxCompute are built on the Tunnel SDK. This topic
describes the major APIs of the Tunnel SDK.

The usage of the SDK varies by version. For more information, see the SDK Java Doc.

Major APIs

API                              Description
TableTunnel                      An entry class of the MaxCompute Tunnel service.
TableTunnel.UploadSession        A session that uploads data to a MaxCompute table.
TableTunnel.DownloadSession      A session that downloads data from a MaxCompute table.
InstanceTunnel                   An entry class of the MaxCompute Tunnel service.
InstanceTunnel.DownloadSession   A session that downloads data from a MaxCompute instance. This
                                 session applies only to SQL instances that start with the SELECT
                                 keyword and are used to query data.

Note The Tunnel endpoint supports automatic routing based on the MaxCompute endpoint settings.

7.5.2. TableTunnel
This topic describes the TableTunnel API.

Definition
Definit ion:

public class TableTunnel {
    // Method signatures only; bodies are omitted.

    // Create a new download or upload session for a table, or for one partition of it.
    public DownloadSession createDownloadSession(String projectName, String tableName);
    public DownloadSession createDownloadSession(String projectName, String tableName, PartitionSpec partitionSpec);
    public UploadSession createUploadSession(String projectName, String tableName);
    public UploadSession createUploadSession(String projectName, String tableName, PartitionSpec partitionSpec);

    // Attach to an existing session by its session ID.
    public DownloadSession getDownloadSession(String projectName, String tableName, PartitionSpec partitionSpec, String id);
    public DownloadSession getDownloadSession(String projectName, String tableName, String id);
    public UploadSession getUploadSession(String projectName, String tableName, PartitionSpec partitionSpec, String id);
    public UploadSession getUploadSession(String projectName, String tableName, String id);

    // Set the Tunnel endpoint explicitly.
    public void setEndpoint(String endpoint);
}


Description:

Lifecycle: the duration from the creation of the TableTunnel instance to the end of the program.
TableTunnel provides methods to create UploadSession and DownloadSession objects.
TableTunnel.UploadSession is used to upload data, and TableTunnel.DownloadSession is used to
download data.
A session refers to the process of uploading or downloading a table or partition. A session consists
of one or more HTTP requests to Tunnel RESTful APIs.
Upload sessions of TableTunnel use INSERT INTO semantics. Multiple upload sessions of the same
table or partition do not affect each other, and the data uploaded in each session is stored in an
independent directory.
In an upload session, each RecordWriter corresponds to one HTTP request and is identified by a unique
block ID. The block ID is the name of the file that corresponds to the RecordWriter.
If you open a RecordWriter with the same block ID multiple times in the same session, the data
uploaded by the RecordWriter that calls close() last overwrites all previous data for that block. You
can use this behavior to retransmit a block when its upload fails.
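The block-ID rules can be modeled in a few lines of plain Java. The following toy class is an in-memory model written for this explanation, not the MaxCompute SDK: each block ID maps to one stored file, and closing a writer replaces whatever an earlier writer with the same block ID stored, which is what makes per-block retransmission possible. Its method names only echo the RecordWriter terms used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ToyUploadSession {

    // block ID -> records of the most recently closed writer for that block
    private final Map<Long, List<String>> closedBlocks = new TreeMap<>();

    /** Minimal stand-in for a RecordWriter bound to one block ID. */
    public final class ToyWriter {
        private final long blockId;
        private final List<String> pending = new ArrayList<>();

        private ToyWriter(long blockId) { this.blockId = blockId; }

        public void write(String record) { pending.add(record); }

        /** Closing replaces whatever an earlier writer with the same block ID stored. */
        public void close() { closedBlocks.put(blockId, new ArrayList<>(pending)); }
    }

    public ToyWriter openRecordWriter(long blockId) { return new ToyWriter(blockId); }

    /** All records currently stored for the session, ordered by block ID. */
    public List<String> contents() {
        List<String> all = new ArrayList<>();
        closedBlocks.values().forEach(all::addAll);
        return all;
    }

    public static void main(String[] args) {
        ToyUploadSession session = new ToyUploadSession();
        ToyWriter first = session.openRecordWriter(0);
        first.write("r1");
        first.write("r2-corrupted");
        first.close();
        ToyWriter retry = session.openRecordWriter(0);  // retransmit block 0
        retry.write("r1");
        retry.write("r2");
        retry.close();
        System.out.println(session.contents());          // [r1, r2]
    }
}
```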

API implementation process


1. The RecordWriter.write() function uploads your data as files to a temporary directory.
2. The RecordWriter.close() function moves the files from the temporary directory to the Data
directory.
3. The session.commit() function moves each file in the Data directory to the directory where the
corresponding table is located and updates the table metadata. This way, data moved into a
table by the current task becomes visible to other MaxCompute tasks such as SQL and
MapReduce.
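The three steps determine when data becomes visible. The toy model below, which is not the SDK and uses illustrative names only, keeps the temporary directory, the Data directory, and the table as separate collections, so you can see that closed blocks stay invisible to other jobs until commit() moves them into the table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyCommitFlow {

    private final Map<Long, List<String>> tempDir = new LinkedHashMap<>();  // step 1: write()
    private final Map<Long, List<String>> dataDir = new LinkedHashMap<>();  // step 2: close()
    private final List<String> table = new ArrayList<>();                  // step 3: commit()

    void write(long blockId, String record) {
        tempDir.computeIfAbsent(blockId, id -> new ArrayList<>()).add(record);
    }

    void close(long blockId) {
        List<String> file = tempDir.remove(blockId);
        dataDir.put(blockId, file == null ? new ArrayList<>() : file);
    }

    void commit(long... blockIds) {
        // Validate every block first so that a bad ID leaves the table unchanged
        for (long id : blockIds) {
            if (!dataDir.containsKey(id)) throw new IllegalStateException("block " + id + " was not closed");
        }
        for (long id : blockIds) {
            List<String> file = dataDir.remove(id);
            if (file != null) table.addAll(file);  // null only if the same ID is listed twice
        }
    }

    /** What other jobs, such as SQL or MapReduce, would see. */
    List<String> visibleRows() { return List.copyOf(table); }

    public static void main(String[] args) {
        ToyCommitFlow flow = new ToyCommitFlow();
        flow.write(0L, "r1");
        flow.close(0L);
        System.out.println(flow.visibleRows());  // []   closed but not committed
        flow.commit(0L);
        System.out.println(flow.visibleRows());  // [r1] visible after commit
    }
}
```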

API limits
The value of a block ID must be greater than or equal to 0 and less than 20,000. The size of the data
to be uploaded in a block cannot exceed 100 GB.
A session is uniquely identified by its session ID. The lifecycle of a session is 24 hours. If your
session times out because a large volume of data is being transferred, transfer your data in
multiple sessions.
The lifecycle of an HTTP request corresponding to a RecordWriter is 120 seconds. If no data flows
over an HTTP connection within 120 seconds, the server closes the connection.

Note HTTP has an 8 KB buffer. When you call the RecordWriter.write() function, your data
may be saved to the buffer, and no inbound traffic flows over the corresponding HTTP
connection. In this case, you can call the TunnelRecordWriter.flush() function to forcibly flush
data from the buffer.

When you use a RecordWriter to write logs to MaxCompute, the RecordWriter may time out due to
unexpected traffic fluctuations. Therefore, we recommend that you:
Do not use a RecordWriter for each data record. Each RecordWriter corresponds to a file, so doing
so generates a large number of small files, which affects the performance of MaxCompute.
Cache data on the client, and use a RecordWriter to write it only after the cached data reaches
64 MB.
The lifecycle of a RecordReader is 300 seconds.
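The recommendations above amount to batching on the client: buffer serialized records and give a RecordWriter a large batch (about 64 MB) instead of one record at a time. The following hypothetical batcher shows only the buffering logic; opening a RecordWriter per batch, writing, and closing it are omitted because they depend on the SDK version. The threshold is configurable so that the logic can be tested with small values, and the default assumes 1 MB = 2^20 bytes.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordBatcher {

    static final long DEFAULT_BATCH_BYTES = 64L * 1024 * 1024;  // 64 MB, assuming 1 MB = 2^20 bytes

    private final long batchBytes;
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private final List<List<byte[]>> readyBatches = new ArrayList<>();  // each would go to one RecordWriter

    RecordBatcher() { this(DEFAULT_BATCH_BYTES); }

    RecordBatcher(long batchBytes) {
        if (batchBytes <= 0) throw new IllegalArgumentException("batchBytes must be > 0");
        this.batchBytes = batchBytes;
    }

    /** Buffers one serialized record; returns true if this call completed a batch. */
    boolean add(byte[] serializedRecord) {
        buffer.add(serializedRecord);
        bufferedBytes += serializedRecord.length;
        if (bufferedBytes >= batchBytes) {
            handOff();
            return true;
        }
        return false;
    }

    /** Hands off whatever is still buffered, for example at shutdown. */
    void flushRemaining() {
        if (!buffer.isEmpty()) handOff();
    }

    private void handOff() {
        readyBatches.add(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }

    List<List<byte[]>> readyBatches() { return readyBatches; }
}
```

Each batch in readyBatches() would then be written by its own RecordWriter with its own block ID, which keeps the number of files small. Because the data is collected before a writer is opened, the writer is not left idle against the 120-second limit while the batch fills.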
