TPC-DS v2.5.0
Standard Specification
Version 2.5.0
June, 2017
www.tpc.org
No Warranty
TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, THE INFORMATION CONTAINED
HEREIN IS PROVIDED “AS IS” AND WITH ALL FAULTS, AND THE AUTHORS AND DEVELOPERS
OF THE WORK HEREBY DISCLAIM ALL OTHER WARRANTIES AND CONDITIONS, EITHER
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY (IF ANY) IMPLIED
WARRANTIES, DUTIES OR CONDITIONS OF MERCHANTABILITY, OF FITNESS FOR A
PARTICULAR PURPOSE, OF ACCURACY OR COMPLETENESS OF RESPONSES, OF RESULTS, OF
WORKMANLIKE EFFORT, OF LACK OF VIRUSES, AND OF LACK OF NEGLIGENCE. ALSO, THERE
IS NO WARRANTY OR CONDITION OF TITLE, QUIET ENJOYMENT, QUIET POSSESSION,
CORRESPONDENCE TO DESCRIPTION OR NON-INFRINGEMENT WITH REGARD TO THE WORK.
IN NO EVENT WILL ANY AUTHOR OR DEVELOPER OF THE WORK BE LIABLE TO ANY OTHER
PARTY FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO THE COST OF PROCURING
SUBSTITUTE GOODS OR SERVICES, LOST PROFITS, LOSS OF USE, LOSS OF DATA, OR ANY
INCIDENTAL, CONSEQUENTIAL, DIRECT, INDIRECT, OR SPECIAL DAMAGES WHETHER UNDER
CONTRACT, TORT, WARRANTY, OR OTHERWISE, ARISING IN ANY WAY OUT OF THIS OR ANY
OTHER AGREEMENT RELATING TO THE WORK, WHETHER OR NOT SUCH AUTHOR OR
DEVELOPER HAD ADVANCE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.
Trademarks
TPC Benchmark, TPC-DS and QphDS are trademarks of the Transaction Processing Performance Council.
The TPC Benchmark™ DS (TPC-DS) is a decision support benchmark that models several generally applicable
aspects of a decision support system, including queries and data maintenance. The benchmark provides a
representative evaluation of the System Under Test’s (SUT) performance as a general purpose decision support
system.
This benchmark illustrates decision support systems that:
Deliverable | Name | Usage | Reference
Data generator | dsdgen | Used to generate the data sets for the benchmark | Clause 3.4
Query generator | dsqgen | Used to generate the query sets for the benchmark | Clause 4.1.2
Answer Sets | answer_sets/ | Used to verify the initial population of the data warehouse | Clause 7.3
Reference Data Set | run dsdgen with the -validate flag | Set of files for each scale factor to compare the correct data generation of base data, refresh data and dsqgen data | (none)
0.5.1 The rules for pricing are included in the current revision of the TPC Pricing Specification located on the TPC
website (http://www.tpc.org).
Comment: A non-binding guide, How_To_Guide.doc, is available electronically. The purpose of this
guide is to describe the most common tasks necessary to implement a TPC-DS benchmark. The target audience
is individuals who want to install, populate, run and analyze the database, queries and data maintenance
workloads for TPC-DS.
Record customer purchases (and track customer returns) from any sales channel
Modify prices according to promotions
Maintain warehouse inventory
Create dynamic web pages
Maintain customer profiles (Customer Relationship Management)
TPC-DS does not benchmark the operational systems. It is assumed that the channel sub-systems were designed
at different times by diverse groups having dissimilar functional requirements. It is also recognized that they
may be operating on significantly different hardware configurations, software configurations and data model
semantics. All three channel sub-systems are autonomous and retain possibly redundant information regarding
customers, addresses, etc. For more information on benchmarking operational systems, please see the TPC
website (http://www.tpc.org).
TPC-DS’ modeling of the business environment falls into three broad categories:
Static: The contents of the dimension are loaded once during database load and do not change over time.
The date dimension is an example of a static dimension.
Historical: The history of the changes made to the dimension data is maintained by creating multiple rows
for a single business key value. Each row includes columns indicating the time period for which the row is
valid. The fact tables are linked to the dimension values that were active at the time the fact was recorded,
thus maintaining “historical truth”. Item is an example of a historical dimension.
Non-Historical: The history of the changes made to the dimension data is not maintained. As dimension
rows are updated, the previous values are overwritten and this information is lost. All fact data is
associated with the most current value of the dimension. Customer is an example of a Non-Historical
dimension.
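The difference between a historical and a non-historical dimension can be illustrated with a short Python sketch. It is an illustration only: the ItemRow record, its field names, and the choice to close the old row with the change date itself are assumptions, not the benchmark's schema or its exact date convention.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ItemRow:
    sk: int                       # surrogate key (hypothetical field name)
    item_id: str                  # business key
    price: float
    rec_start_date: date
    rec_end_date: Optional[date]  # None marks the currently valid row


def update_historical(rows, item_id, new_price, as_of):
    """Historical dimension: close the current row and add a new version.

    Fact rows already recorded keep pointing at the old surrogate key,
    which is how "historical truth" is preserved.
    """
    next_sk = max(r.sk for r in rows) + 1
    out = [
        replace(r, rec_end_date=as_of)
        if r.item_id == item_id and r.rec_end_date is None
        else r
        for r in rows
    ]
    out.append(ItemRow(next_sk, item_id, new_price, as_of, None))
    return out


def update_non_historical(customers, customer_id, new_city):
    """Non-historical dimension: overwrite in place; the old value is lost."""
    customers[customer_id]["city"] = new_city
```

With this approach, an item price change produces a second row for the same business key, while a customer address change leaves exactly one row behind.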
1.3.4 To achieve the optimal compromise between performance and operational consistency, the system administrator
can set, once and for all, the locking levels and the concurrent scheduling rules for queries and data maintenance
functions.
1.3.5 The size of a DSS system – more precisely the size of the data captured in a DSS system – may vary from
company to company and within the same company based on different time frames. Therefore, the TPC-DS
benchmark will model several different sizes of the DSS (a.k.a. benchmark scaling or scale factor).
Reporting queries
Ad hoc queries
Iterative OLAP queries
Data mining queries
TPC-DS provides a wide variety of queries in the benchmark to emulate these diverse query classes.
1.4.1.1 Reporting Queries
These queries capture the “reporting” nature of a DSS system. They include queries that are executed
periodically to answer well-known, pre-defined questions about the financial and operational health of a
business. Although reporting queries tend to be static, minor changes are common. From one use of a given
reporting query to the next, a user might choose to shift focus by varying a date range, geographic location or a
brand name.
1.4.1.2 Ad hoc Queries
These queries capture the dynamic nature of a DSS system in which impromptu queries are constructed to
answer immediate and specific business questions. The central difference between ad hoc queries and reporting
queries is the limited degree of foreknowledge that is available to the System Administrator (SysAdmin) when
planning for an ad hoc query.
1.4.1.3 Iterative OLAP Queries
OLAP queries allow for the exploration and analysis of business data to discover new and meaningful
relationships and trends. While this class of queries is similar to the “Ad hoc Queries” class, it is distinguished
by a scenario-based user session in which a sequence of queries is submitted. Such a sequence may include both
complex and simple queries.
1.4.1.4 Data Mining Queries
Data mining is the process of sifting through large amounts of data to produce data content relationships. It can
predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This
class of queries typically consists of joins and large aggregations that return large data result sets for possible
extraction.
Data Extraction: This phase consists of the accurate extraction of pertinent data from production OLTP
databases and other relevant data sources. In a production environment, the extraction step may include
numerous separate extract operations executed against multiple OLTP databases and auxiliary data sources.
While selection and tuning of the associated systems and procedures is important to the success of the
production system, it is separate from the purchase and configuration of the decision support servers.
Accordingly, the data extract step of the ETL process (E) is not modeled in the benchmark. The TPC-DS
data maintenance process starts from generated flat files that are assumed to be the output of this external
Extraction process.
Data Transformation: This is when the extracted data is cleansed and massaged into a common format
suitable for assimilation by the decision support database.
Data Load: This is the actual insertion, modification and deletion of data within the decision support
database.
Taken together, the progression of Extraction, Transformation and Load is more commonly known by its
acronym, ETL. In TPC-DS, the modeling of Transformation and Load is known as Data Maintenance (DM) or
Data Refresh. In this specification the two terms are used interchangeably.
The DM process of TPC-DS includes the following tasks that result from such a complex business environment
as shown in Figure 1-2:
i) Load the refresh data set, which consists of new, deleted and changed data destined for the data warehouse in
its operational format.
ii) Load the refresh data set into the data warehouse, applying data transformations, e.g.:
Data denormalization (third normal form to snowflake). During this step the source tables are mapped
into the data warehouse by:
Direct source to target mapping. This type of mapping is the most common. It applies to tables in
the data warehouse that have an equivalent table in the operational schema.
Multiple source tables are joined and the result is mapped to one target table. This
mapping translates the third normal form of the operational schema into the de-normalized form of
the data warehouse.
One source table is mapped to multiple target tables. This mapping is the least common. It occurs
if, for efficiency reasons, the schema of the operational system is less normalized than the data
warehouse schema.
Syntactically cleanse data
De-normalize
iii) Insert new fact records and delete fact records by date.
The structure of, and relationships between, the flat files are provided in Appendix A in the form of table
descriptions and the DDL of the tables that represent the hypothetical operational database.
A pair of fact tables focused on the product sales and returns for each of the three channels
A single fact table that models inventory for the catalog and internet sales channels.
In addition, the schema includes 17 dimension tables that are associated with all sales channels. The following
clauses specify the logical design of each table:
The name of the table, along with its abbreviation (listed parenthetically)
A logical diagram of each fact table and its related dimension tables
The high-level definitions for each table and its relationship to other tables, using the format defined in
Clause 2.2
The scaling and cardinality information for each column
2.5.3.8.1 A logical table space is a named collection of physical storage devices referenced as a single, logically
contiguous, non-divisible entity.
2.5.3.8.2 The DDL may include syntax that directs a table in its entirety to be stored in a particular logical table space.
2.5.3.8.3 Horizontal partitioning of base tables or EADS is allowed. If the partitioning is a function of data in the table
or auxiliary data structure, the assignment shall be based on the values in the partitioning column(s). Only
primary keys, foreign keys, date columns and date surrogate keys may be used as partitioning columns. If
partitioning DDL uses directives that specify explicit partition values for the partitioning columns, they shall
satisfy the following conditions:
They may not rely on any knowledge of the data stored in the partitioning column(s) except the minimum
and maximum values for those columns, and the definition of data types for those columns provided in
Clause 2.
Within the limitations of integer division, they shall define each partition to accept an equal portion of the
range between the minimum and maximum values of the partitioning column(s).
For date-based partitions, it is permissible to partition into equally sized domains based upon an integer
granularity of days, weeks, months, or years; all using the Gregorian calendar (e.g., 30 days, 4 weeks, 1
month, 1 year, etc.). For date-based partition granularities other than days, a partition boundary may extend
beyond the minimum or maximum boundaries as established in that table’s data characteristics as defined
in Clause 3.4.
The directives shall allow the insertion of values of the partitioning column(s) outside the range covered by
the minimum and maximum values, as required by Clause 1.5.
If any directives or DDL are used to horizontally partition data, the directives, DDL, and other details necessary
to replicate the partitioning behavior shall be disclosed.
Multi-level partitioning of base tables or auxiliary data structures is allowed only if each level of partitioning
satisfies the conditions stated above.
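One way to compute explicit partition values that satisfy these conditions for an integer partitioning column, such as a date surrogate key, uses only its minimum and maximum values. This is a sketch under assumptions: the function names are hypothetical, the sketch does not show how the boundaries would be written in a particular product's partitioning DDL, and date-based partitioning by weeks or months would need calendar arithmetic instead.

```python
from bisect import bisect_right


def partition_lower_bounds(min_val: int, max_val: int, n: int) -> list[int]:
    """Lower bound of each of n partitions covering [min_val, max_val].

    Partition k covers [lower[k], lower[k + 1] - 1]; integer division makes
    the partition widths differ by at most one value.
    """
    span = max_val - min_val + 1  # number of distinct key values in the range
    if n < 1 or span < n:
        raise ValueError("need 1 <= n <= max_val - min_val + 1")
    return [min_val + (k * span) // n for k in range(n)]


def partition_of(value: int, lowers: list[int]) -> int:
    """Index of the partition that receives value.

    The first and last partitions are treated as open-ended, so values below
    min_val or above max_val are still accepted, as Clause 1.5 requires.
    """
    return max(bisect_right(lowers, value) - 1, 0)
```

For example, partition_lower_bounds(1, 10, 3) yields [1, 4, 7], that is, partitions of 3, 3 and 4 values.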
2.5.3.8.4 Vertical partitioning of base tables or EADS is allowed when meeting all of the following requirements:
SQL DDL that explicitly partitions data vertically is prohibited.
SQL DDL must not contain partitioning directives which influence the physical placement of data on
durable media.
2.5.4.2 If foreign key constraints are defined and enforced, there is no specific requirement for a particular
delete/update action when enforcing a constraint (e.g., ANSI SQL RESTRICT, CASCADE, NO ACTION, are
all acceptable).
3.1.4 Test sponsors may choose any scale factor from the defined series. No other scale factors may be used for a
TPC-DS result.
3.1.5 Results at the different scale factors are not comparable, due to the substantially different computational
challenges found at different data volumes.
Table Name | Avg. Row Size (bytes) | SF=1 | SF=1000 | SF=3000 | SF=10000 | SF=30000 | SF=100000
call_center | 305 | 6 | 42 | 48 | 54 | 60 | 60
catalog_page | 139 | 11,718 | 30,000 | 36,000 | 40,000 | 46,000 | 50,000
catalog_returns | 166 | 144,067 | 143,996,756 | 432,018,033 | 1,440,033,112 | 4,319,925,093 | 14,400,175,879
catalog_sales | 226 | 1,441,548 | 1,439,980,416 | 4,320,078,880 | 14,399,964,710 | 43,200,404,822 | 143,999,334,399
customer | 132 | 100,000 | 12,000,000 | 30,000,000 | 65,000,000 | 80,000,000 | 100,000,000
customer_address | 110 | 50,000 | 6,000,000 | 15,000,000 | 32,500,000 | 40,000,000 | 50,000,000
customer_demographics | 42 | 1,920,800 | 1,920,800 | 1,920,800 | 1,920,800 | 1,920,800 | 1,920,800
date_dim | 141 | 73,049 | 73,049 | 73,049 | 73,049 | 73,049 | 73,049
household_demographics | 21 | 7,200 | 7,200 | 7,200 | 7,200 | 7,200 | 7,200
income_band | 16 | 20 | 20 | 20 | 20 | 20 | 20
inventory | 16 | 11,745,000 | 783,000,000 | 1,033,560,000 | 1,311,525,000 | 1,627,857,000 | 1,965,337,830
item | 281 | 18,000 | 300,000 | 360,000 | 402,000 | 462,000 | 502,000
promotion | 124 | 300 | 1,500 | 1,800 | 2,000 | 2,300 | 2,500
reason | 38 | 35 | 65 | 67 | 70 | 72 | 75
ship_mode | 56 | 20 | 20 | 20 | 20 | 20 | 20
store | 263 | 12 | 1,002 | 1,350 | 1,500 | 1,704 | 1,902
store_returns | 134 | 287,514 | 287,999,764 | 863,989,652 | 2,879,970,104 | 8,639,952,111 | 28,800,018,820
store_sales | 164 | 2,880,404 | 2,879,987,999 | 8,639,936,081 | 28,799,983,563 | 86,399,341,874 | 287,997,818,084
time_dim | 59 | 86,400 | 86,400 | 86,400 | 86,400 | 86,400 | 86,400
warehouse | 117 | 5 | 20 | 22 | 25 | 27 | 30
web_page | 96 | 60 | 3,000 | 3,600 | 4,002 | 4,602 | 5,004
web_returns | 162 | 71,763 | 71,997,522 | 216,003,761 | 720,020,485 | 2,160,007,345 | 7,199,904,459
web_sales | 226 | 719,384 | 720,000,376 | 2,159,968,881 | 7,199,963,324 | 21,600,036,511 | 71,999,670,164
web_site | 292 | 30 | 54 | 66 | 78 | 84 | 96
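A rough estimate of raw data volume can be obtained by multiplying each table's row count by its second column, which is read here as the average row size in bytes. This estimate ignores indexes, storage overhead and compression. The sketch below copies three rows from the table above and also shows that fact tables grow roughly linearly with the scale factor, while dimension tables grow sublinearly or not at all.

```python
# Average row size (bytes) and row counts copied from the table above for
# three tables; the remaining tables follow the same pattern.
ROW_SIZE = {"store_sales": 164, "customer": 132, "date_dim": 141}
ROWS = {
    "store_sales": {1: 2_880_404, 1000: 2_879_987_999},
    "customer": {1: 100_000, 1000: 12_000_000},
    "date_dim": {1: 73_049, 1000: 73_049},
}


def approx_bytes(table: str, sf: int) -> int:
    """Rough raw data volume: row count times average row size."""
    return ROWS[table][sf] * ROW_SIZE[table]


def growth(table: str, sf: int) -> float:
    """How many times larger the table is at sf than at SF=1."""
    return ROWS[table][sf] / ROWS[table][1]


for t in ROWS:
    print(f"{t:12s} SF=1: {approx_bytes(t, 1):>13,} B  x{growth(t, 1000):,.1f} at SF=1000")
```

store_sales grows by almost exactly 1000x between SF=1 and SF=1000, customer by 120x, and date_dim not at all.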
1. Table names - The table and view names found in the CREATE TABLE, CREATE VIEW, DROP
VIEW and FROM clause of each query may be modified to reflect the customary naming conventions
of the system under test.
2. Tablespace references - CREATE TABLE statements may be augmented with a tablespace reference
conforming to the requirements of Clause 3.
3. WITH() clause - Queries using the "with()" syntax, also known as common table expressions, can
be replaced with semantically equivalent derived tables or views.
b) Joins:
1. Outer Join - For outer join queries, vendor specific syntax may be used instead of the specified syntax.
For example, the join expression "CUSTOMER LEFT OUTER JOIN ORDERS ON C_CUSTKEY =
O_CUSTKEY" may be replaced by adding CUSTOMER and ORDERS to the FROM clause and adding
a specially-marked join predicate (e.g., C_CUSTKEY *= O_CUSTKEY).
2. Inner Join - For inner join queries, vendor specific syntax may be used instead of the specified syntax.
For example, the join expression "FROM CUSTOMER, ORDERS WHERE C_CUSTKEY =
O_CUSTKEY" may be modified to use a JOIN clause such as "FROM CUSTOMER JOIN ORDERS
ON C_CUSTKEY = O_CUSTKEY".
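The permitted join rewrites must not change query results. The sketch below checks this with Python's built-in sqlite3 module on two toy tables named after the CUSTOMER/ORDERS example above; the data is invented. SQLite does not accept the legacy "*=" outer-join marker, so only the ANSI forms are exercised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
    CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
    INSERT INTO customer VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Inner join: comma-separated FROM list with the predicate in WHERE ...
implicit = conn.execute(
    "SELECT c_custkey, o_orderkey FROM customer, orders "
    "WHERE c_custkey = o_custkey ORDER BY 1, 2").fetchall()
# ... versus the explicit JOIN ... ON form; both must return the same rows.
explicit = conn.execute(
    "SELECT c_custkey, o_orderkey FROM customer JOIN orders "
    "ON c_custkey = o_custkey ORDER BY 1, 2").fetchall()
# Outer join: customers without orders are kept, with NULL (None) order keys.
outer = conn.execute(
    "SELECT c_custkey, o_orderkey FROM customer LEFT OUTER JOIN orders "
    "ON c_custkey = o_custkey ORDER BY 1, 2").fetchall()
```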
c) Operators:
1. Select-list expression aliases - For queries that include the definition of an alias for a SELECT-list item
(e.g., "AS" clause), vendor-specific syntax may be used instead of the specified syntax. Examples of
acceptable implementations include "TITLE <string>", or "WITH HEADING <string>". Use of a
select-list expression alias is optional.
2. GROUP BY and ORDER BY - For queries that utilize a view, nested table-expression, or select-list
alias solely for the purposes of grouping or ordering on an expression, vendors may replace the view,
nested table-expression or select-list alias with a vendor-specific SQL extension to the GROUP BY or
ORDER BY clause. Examples of acceptable implementations include "GROUP BY <ordinal>",
"GROUP BY <expression>", "ORDER BY <ordinal>", and "ORDER BY <expression>".
3. Correlation names - Table-name aliases may be added to the executable query text. The keyword "AS"
before the table-name alias may be omitted.
4. Nested table-expression aliasing - For queries involving nested table-expressions, the keyword
"AS" before the nested table-expression alias may be omitted.
5. Column alias - Column aliases may be added for columns in any SELECT list of an executable
query text. These column aliases may be used to refer to the column in later portions of the query, such
as GROUP BY or ORDER BY clauses.
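The GROUP BY/ORDER BY rule can be checked the same way. In the sketch below, a nested table-expression that exists only to name a grouping expression is compared with the permitted rewrite that uses "GROUP BY <expression>" and "ORDER BY <ordinal>". SQLite, the sales table and its data are stand-ins chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2001-03-01", 5), ("2001-07-09", 7), ("2002-01-15", 3)])

# Specified form: the nested table-expression only gives the expression a name.
nested = conn.execute("""
    SELECT yr, SUM(amount)
    FROM (SELECT substr(sold_date, 1, 4) AS yr, amount FROM sales) AS t
    GROUP BY yr ORDER BY yr""").fetchall()

# Permitted rewrite: GROUP BY <expression> and ORDER BY <ordinal>.
direct = conn.execute("""
    SELECT substr(sold_date, 1, 4), SUM(amount)
    FROM sales
    GROUP BY substr(sold_date, 1, 4) ORDER BY 1""").fetchall()
```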
f) Expressions and functions:
1. Date expressions - For queries that include an expression involving manipulation of dates (e.g.,
adding/subtracting days/months/years, or extracting years from dates), vendor-specific syntax may be
used instead of the specified syntax. Examples of acceptable implementations include
"YEAR(<column>)" to extract the year from a date column or "DATE(<date>) + 3 MONTHS" to add 3
months to a date.
2. Output formatting functions - Scalar functions whose sole purpose is to affect output formatting (such
as treatment of null strings) or intermediate arithmetic result precision (such as COALESCE or CAST)
may be applied to items in the outermost SELECT list of the query.
3. Aggregate functions - At large scale factors, the aggregates may exceed the range of the values
supported by an integer. The aggregate functions AVG and COUNT may be replaced with equivalent
vendor-specific functions to handle the expanded range of values (e.g., AVG_BIG and COUNT_BIG).
4. Substring Scalar Functions - For queries which use the SUBSTRING() scalar function, vendor-specific
syntax may be used instead of the specified syntax. For example, "SUBSTRING(S_ZIP, 1, 5)".
Vendor-specific SQL syntax may be added to the SELECT statement of a query template to redirect the
rows returned to a file. For example, “Unload to file ‘outputfile’ Select c1, c2 …”
Vendor-specific control statements supported by a test sponsor’s interactive SQL interface may be used. For
example,
set output_file = ‘outputfile’
select c1, c2…;
unset output_file;
Control statements recognized by the implementation specific layer (see Clause 8.2.4) and used to invoke an
extraction tool or method.
4.2.5.2 If one of these alternative extract options is used, the output shall be formatted as delimited or fixed-width
ASCII text.
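Producing delimited ASCII output from query result rows can be sketched in Python as shown below. The pipe delimiter, the rendering of NULL as an empty field, and the lack of any quoting or escaping are all assumptions of this sketch. A value that itself contains the delimiter would need an escaping rule that this clause does not define.

```python
def to_delimited_ascii(rows, delimiter: str = "|") -> str:
    """Render query result rows as delimited ASCII text, one row per line.

    NULLs (None) become empty fields. Raises UnicodeEncodeError if any
    value cannot be represented in ASCII.
    """
    lines = [delimiter.join("" if v is None else str(v) for v in row) for row in rows]
    text = "".join(line + "\n" for line in lines)
    text.encode("ascii")  # validation only; the encoded bytes are discarded
    return text
```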
4.2.5.3 If one of these alternative extract options is used, they must meet the following conditions:
A test sponsor may select only one of the options in Clause 4.2.5.1. That method must be used consistently for all the
queries that are eligible as extract queries.
If the extraction syntax modifies the query SQL, in all other respects the query must satisfy the
requirements of Clause 4.1.2. The syntax added must deal solely with the extraction tool or method, and
must not make any additional explicit reference, for example, to tables, indices, or access paths.
The test sponsor must demonstrate that the file names used, and the extract facility itself, do not provide
hints or optimizations in the DBMS such that the query has additional performance gains beyond any
benefits from accelerating the extraction of rows.
The tool or method used must meet all ACID requirements for the queries used in combination with the tool or
method.
4.2.6 Query Variants
4.2.6.1 A Query Variant is an alternate query template, which has been created to allow a vendor to overcome specific
functional barriers or product deficiencies that could not be addressed by minor query modifications.
4.2.6.2 Approval of any new query variant is required prior to using such variant to produce compliant TPC-DS results.
The approval process is defined in Clause 4.2.7.
Flat File Name | Approx. Size at SF=1¹ (bytes) | Approx. Number of Rows at SF=1¹ | Source Schema Table Name
s_catalog_order.dat | 116505 | 682 | s_catalog_order
s_catalog_order_lineitem.dat | 592735 | 6138 | s_catalog_order_lineitem
s_catalog_returns.dat | 112182 | 578 | s_catalog_returns
s_inventory.dat | 26764259 | 540000 | s_inventory
s_purchase.dat | 142552 | 1022 | s_purchase
s_purchase_lineitem.dat | 1312480 | 12264 | s_purchase_lineitem
s_store_returns.dat | 159306 | 1235 | s_store_returns
s_web_order.dat | 43458 | 256 | s_web_order
s_web_order_lineitem.dat | 324160 | 3072 | s_web_order_lineitem
s_web_returns.dat | 42165 | 295 | s_web_returns
inventory_delete | 66 | 3 | inventory_delete
delete | 66 | 3 | delete
¹ The row counts are correct to within 0.001%. However, the number of bytes can vary from refresh set to
refresh set due to NULL values.
5.2.2 The number of rows present in each refresh set at scale factor 1 for each of the flat files is summarized in Table
5-1.
5.2.3 The refresh data set of each data maintenance function must be generated using dsdgen. The execution of
dsdgen is not timed. The output of dsdgen is a text file. The storage to hold the refresh data sets must be part
of the priced configuration.
5.2.4 The refresh data set produced by dsdgen can be modified in the following way: The output file for each table of
the refresh data set can be split into n files where each file contains approximately 1/n of the total number of
rows of the original output file. The order of the rows in the original output file must be preserved, such that the
concatenation of all n files is identical to the original file.
5.2.5 Reading the refresh data is a timed part of the data maintenance process. The data set for a specific refresh run
must be loaded and timed as part of the execution of the refresh run. The loading of data must be performed via
generic processes inherent to the data processing system or by the loader utility the database software provides
and supports for general data loading. It is explicitly prohibited to use a loader tool that has been specifically
developed for TPC-DS.
Data Maintenance Function ID | Data Maintenance Function | Type of Operation | View Name | Target Table | Source Schema Table
1 | LF_CR (Clause 5.3.11.6) | Method 1 | crv | catalog_returns | s_catalog_returns
2 | LF_CS (Clause 5.3.11.5) | Method 1 | csv | catalog_sales | s_catalog_order_lineitem
3 | LF_I (Clause 5.3.11.7) | Method 1 | iv | inventory | s_inventory
4 | LF_SR (Clause 5.3.11.2) | Method 1 | srv | store_returns | s_store_returns
5 | LF_SS (Clause 5.3.11.1) | Method 1 | ssv | store_sales | s_purchase_lineitem
6 | LF_WR (Clause 5.3.11.4) | Method 1 | wrv | web_returns | s_web_returns
7 | LF_WS (Clause 5.3.11.3) | Method 1 | wsv | web_sales | s_web_order_lineitem
8 | DF_CS (Clause 5.3.11.10) | Method 2 | - | catalog_sales [S], catalog_returns [R] | -
9 | DF_SS (Clause 5.3.11.9) | Method 2 | - | store_sales [S], store_returns [R] | -
10 | DF_WS (Clause 5.3.11.11) | Method 2 | - | web_sales [S], web_returns [R] | -
11 | DF_I (Clause 5.3.11.12) | Method 3 | - | inventory [I] | -
5.3.3 Method 1 of the data maintenance functions reads rows from a view V (see the View Name column of the table in Clause
5.3.2) and inserts rows into a data warehouse table T. Both V and T are defined as part of the data maintenance
function. T is created as part of the initial load of the data warehouse. V is a logical table that does not need to
be instantiated.
5.3.4 The primary key of V is defined in the data maintenance function. Each data maintenance function contains a
table with column mapping between its view V and its data warehouse table T. The primary key of V is
denoted in bold letters on the left side of this mapping table (e.g. Table 5-5).
5.3.5 Business keys are the primary keys from the source schema. Business keys are denoted in bold letters on the
right side of the mapping table for the data maintenance function (e.g. Table 5-5).
5.3.6 Generating a new primary key value for a dimension table is defined as generating the next largest value in the
dense sequence of the table’s primary key values. That is, assuming the largest current primary key value is x
then the next value is x+1.
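In SQL terms, the next key is MAX(primary key) + 1, as the following Python/sqlite3 sketch shows. The item table here is trimmed to two columns. Starting an empty table at 1 is an assumption, since the clause only covers the case where a largest value x exists. In a concurrent loader, reading the maximum and inserting the new row would need to happen in the same transaction.

```python
import sqlite3


def next_surrogate_key(conn: sqlite3.Connection, table: str, key_column: str) -> int:
    """Return x + 1, where x is the largest current primary key value.

    Table and column names are interpolated (SQL parameters cannot bind
    identifiers), so only pass trusted, hard-coded names.
    """
    (x,) = conn.execute(f"SELECT MAX({key_column}) FROM {table}").fetchone()
    return 1 if x is None else x + 1  # start value for an empty table is an assumption


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_item_id TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "AAA"), (2, "AAB"), (3, "AAC")])
```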
5.3.7 Method 1: Fact Table Load
for every row v in view V corresponding to fact table F
    get row v into local variable lv
    for every type 1 business key column bkc in v
        get row d from dimension table D corresponding to bkc
            where the business keys of v and d are equal
        update bkc of lv with surrogate key of d
    end for
    for every type 2 business key column bkc in v
        get row d from dimension table D corresponding to bkc
            where the business keys of v and d are equal and rec_end_date is NULL
        update bkc of lv with surrogate key of d
    end for
    insert lv into F
end for
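The Method 1 loop can be sketched in Python. Lists of dictionaries stand in for V, F and the dimension tables, and the "bk"/"sk" field names are hypothetical. The sketch assumes that the current row of a type 2 (historical) dimension is the one whose end date is NULL, which matches the i_rec_end_date IS NULL predicate in the LF_I view. Returning None for an unmatched business key is also an assumption; it mirrors the LEFT OUTER JOINs in that view. Linear scans are used only for readability; a real loader would use an index.

```python
from typing import Optional


def lookup_sk(dim_rows, business_key, current_only: bool) -> Optional[int]:
    """Surrogate key of the dimension row whose business key matches.

    For type 2 (historical) keys only the row with rec_end_date None counts.
    Returns None when nothing matches, like a LEFT OUTER JOIN would.
    """
    for d in dim_rows:
        if d["bk"] == business_key and (not current_only or d.get("rec_end_date") is None):
            return d["sk"]
    return None


def method1_fact_load(view_rows, type1_cols, type2_cols, dims, fact):
    """Load every row of view V into fact table F (here a Python list).

    type1_cols / type2_cols map a view column to the name of its dimension.
    """
    for v in view_rows:
        lv = dict(v)  # local copy of row v
        for col, dim in type1_cols.items():
            lv[col] = lookup_sk(dims[dim], v[col], current_only=False)
        for col, dim in type2_cols.items():
            lv[col] = lookup_sk(dims[dim], v[col], current_only=True)
        fact.append(lv)  # insert lv into F
```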
5.3.11.2 LF_SR
5.3.11.3 LF_WS
5.3.11.4 LF_WR
5.3.11.5 LF_CS
5.3.11.6 LF_CR
5.3.11.7 LF_I:
5.3.11.8
CREATE VIEW iv AS
SELECT d_date_sk        inv_date_sk,
       i_item_sk        inv_item_sk,
       w_warehouse_sk   inv_warehouse_sk,
       invn_qty_on_hand inv_quantity_on_hand
FROM s_inventory
-- source-schema business keys are replaced by data warehouse surrogate keys
LEFT OUTER JOIN warehouse ON (invn_warehouse_id = w_warehouse_id)
-- item is a historical dimension: join only its current row
LEFT OUTER JOIN item ON (invn_item_id = i_item_id AND i_rec_end_date IS NULL)
LEFT OUTER JOIN date_dim ON (d_date = invn_date);
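To see what the iv view produces, it can be run against tiny tables in SQLite through Python's sqlite3 module. The table definitions below are trimmed to the columns the view uses, and the data is invented; they are not the benchmark's DDL. The first row picks up only the current item version. The second row references an unknown warehouse and therefore gets a NULL warehouse surrogate key because of the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (w_warehouse_sk INTEGER, w_warehouse_id TEXT);
    CREATE TABLE item (i_item_sk INTEGER, i_item_id TEXT, i_rec_end_date TEXT);
    CREATE TABLE date_dim (d_date_sk INTEGER, d_date TEXT);
    CREATE TABLE s_inventory (invn_warehouse_id TEXT, invn_item_id TEXT,
                              invn_date TEXT, invn_qty_on_hand INTEGER);
    INSERT INTO warehouse VALUES (1, 'W1');
    -- two versions of item I1: a closed one and the current one
    INSERT INTO item VALUES (10, 'I1', '2001-12-31'), (11, 'I1', NULL);
    INSERT INTO date_dim VALUES (100, '2002-01-07');
    INSERT INTO s_inventory VALUES ('W1', 'I1', '2002-01-07', 42),
                                   ('W9', 'I1', '2002-01-07', 5);
    CREATE VIEW iv AS
    SELECT d_date_sk inv_date_sk, i_item_sk inv_item_sk,
           w_warehouse_sk inv_warehouse_sk, invn_qty_on_hand inv_quantity_on_hand
    FROM s_inventory
    LEFT OUTER JOIN warehouse ON (invn_warehouse_id = w_warehouse_id)
    LEFT OUTER JOIN item ON (invn_item_id = i_item_id AND i_rec_end_date IS NULL)
    LEFT OUTER JOIN date_dim ON (d_date = invn_date);
""")
rows = conn.execute("SELECT * FROM iv ORDER BY inv_quantity_on_hand DESC").fetchall()
```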
5.3.11.9 DF_SS:
S=store_sales
R=store_returns
Date1 as generated by dsdgen
Date2 as generated by dsdgen
5.3.11.10 DF_CS:
S=catalog_sales
R=catalog_returns
Date1 as generated by dsdgen
Date2 as generated by dsdgen
5.3.11.11 DF_WS:
S=web_sales
R=web_returns
Date1 as generated by dsdgen
Date2 as generated by dsdgen
5.3.11.12 DF_I:
I=Inventory
Date1 as generated by dsdgen
Date2 as generated by dsdgen
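The DF_* entries above list only their parameters, and Methods 2 and 3 themselves are not reproduced in this section. The sketch below assumes that, as the data maintenance task list says, these functions "delete fact records by date". It also assumes that returns are removed together with the sales they belong to. The delete is shown for store_sales and store_returns in SQLite. The trimmed table definitions and the linkage of a return to its sale through ticket number and item key are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (d_date_sk INTEGER PRIMARY KEY, d_date TEXT);
    CREATE TABLE store_sales (ss_sold_date_sk INTEGER, ss_ticket_number INTEGER,
                              ss_item_sk INTEGER);
    CREATE TABLE store_returns (sr_ticket_number INTEGER, sr_item_sk INTEGER);
    INSERT INTO date_dim VALUES (1, '2000-01-01'), (2, '2000-01-02'), (3, '2000-01-03');
    INSERT INTO store_sales VALUES (1, 100, 7), (2, 200, 7), (3, 300, 7);
    INSERT INTO store_returns VALUES (200, 7), (300, 7);
""")


def delete_sales_and_returns(conn, date1, date2):
    window = "SELECT d_date_sk FROM date_dim WHERE d_date BETWEEN ? AND ?"
    # R first, while the matching rows of S still exist to identify them.
    conn.execute(f"""
        DELETE FROM store_returns WHERE EXISTS (
            SELECT 1 FROM store_sales
            WHERE ss_ticket_number = sr_ticket_number AND ss_item_sk = sr_item_sk
              AND ss_sold_date_sk IN ({window}))""", (date1, date2))
    # Then S itself, restricted to the same date window [Date1, Date2].
    conn.execute(f"DELETE FROM store_sales WHERE ss_sold_date_sk IN ({window})",
                 (date1, date2))
    conn.commit()


delete_sales_and_returns(conn, "2000-01-02", "2000-01-02")
```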
7.1.18 Database Location is the location of loaded data that is directly accessible (read/write) by the test
database to query or apply DML operations on the TPC-DS tables defined in Clause 2 as required by the
Load Test, Power Test, Throughput Test, Data Maintenance Test and all tests required by the auditor.
The interaction between the driver and the SUT shall not have the purpose of indicating to the SUT or any
of its components an execution strategy or priority that is time-dependent or query-specific;
The interaction between the driver and the SUT shall not have the purpose of indicating to the SUT, or to
any of its components, the insertion of time delays;
The driver shall not insert time delays before, after, or between the submission of queries to the SUT;
The interaction between the driver and the SUT shall not have the purpose of modifying the behavior or
configuration of the SUT (i.e., data processing system or operating system settings) on a query-by-query
basis. These parameters shall not be altered during the execution of the performance test.
Comment: One intent of this clause is to prohibit the pacing of query submission by the driver.
7.2.8 Environmental Assumptions
7.2.8.1 The configuration and initialization of the SUT, the database, or the session, including any relevant parameter,
switch or option settings, shall be based only on externally documented capabilities of the system that can be
reasonably interpreted as useful for a decision support workload. This workload is characterized by:
Sequential scans of large amounts of data;
Aggregation of large amounts of data;
Multi-table joins;
Possibly extensive sorting.
7.2.8.2 While the configuration and initialization can reflect the general nature of this expected workload, it shall not
take special advantage of the limited functions actually exercised by the benchmark. The queries actually
chosen in the benchmark are merely examples of the types of queries that might be used in such an
environment, not necessarily actual user queries. Due to this limit in the scope of the queries and test
environment, TPC-DS has chosen to restrict the use of some database technologies (see Clause 2.5). In general,
the effect of the configuration on benchmark performance should be representative of its expected effect on the
performance of the class of applications modeled by the benchmark.
7.4.3.7.1 The elapsed time to prepare the Test Database for the execution of the performance test is called the Database
Load Time (TLOAD), and must be disclosed. It includes all of the elapsed time to create the tables defined in
Clause 2.1, load data, create and populate EADS, define and validate constraints, gather statistics for the test
database, configure the system under test to execute the performance test, and to ensure that the test database
meets the data accessibility requirements including syncing loaded data on RAID devices and the taking of a
backup of the data processing system, when necessary.
7.4.3.8 The Database Load Time, known as TLOAD, is the difference between Load Start Time and Load End Time.
7.4.3.8.1 There are six classes of operations which may be excluded from database load time:
a) Any operation that does not affect the state of the data processing system (e.g., data generation into flat
files, relocation of flat files to the SUT, permutation of data in flat files, operating-system-level disk
partitioning or configuration);
b) Any modification to the state of the data processing system that is not specific to the TPC-DS workload
(e.g. logical tablespace creation or database block formatting);
c) The time required to install or remove physical resources (e.g. CPU, memory or disk) on the SUT that are
not priced;
d) An optional backup of the test database performed at the test sponsor’s discretion. However, if a backup is
required to ensure that the data accessibility properties can be met, it must be included in the load time;
e) Operations that create RAID devices;
f) Tests required to fulfill the data validation test (see Clause 3.5).
7.4.3.8.2 There cannot be any manual intervention during the Database Load.
7.4.3.8.3 The SUT or any component of it must not be restarted after the start of the Load Test and before the start of the
Performance Test.
Comment: The intent of this Clause is that when the timing ends the system under test be capable of
executing the Performance Test without any further change. The database load may be decomposed into several
phases. Database load time is the sum of the elapsed times of all phases during which activity other than that
detailed in Clause 7.4.3.8.1 occurred on the SUT.
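The phase-based rule in the comment above reduces to a simple sum, shown in the Python sketch below. The phase names, timestamps and the excludable flags are hypothetical. Whether a phase really consists only of activity listed in Clause 7.4.3.8.1 is a judgment for the auditor, and the code does not model that judgment.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    start: float      # seconds since an arbitrary origin
    end: float
    excludable: bool  # True only if ALL activity in the phase is covered by 7.4.3.8.1 a) to f)


def database_load_time(phases: list[Phase]) -> float:
    """Sum the elapsed times of the phases that count toward the load time."""
    for p in phases:
        if p.end < p.start:
            raise ValueError(f"phase {p.name!r} ends before it starts")
    return sum(p.end - p.start for p in phases if not p.excludable)
```

For example, phases of 500 s (flat-file generation, excludable), 1800 s (table creation and load), 300 s (statistics gathering) and 1400 s (an optional backup, excludable) give a load time of 2100 s.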
7.4.6.1 The Throughput Tests measure the ability of the system to process the most queries in the least
amount of time with multiple users.
7.4.6.2 Throughput Test 1 immediately follows the Power Test. The sequencing of Throughput Tests and Data
Maintenance Tests is as follows:
Throughput Test 1 followed by Data Maintenance Test 1 followed by Throughput Test 2 followed by
Data Maintenance Test 2.
7.4.6.3 Any explicitly created aggregates, as defined in Clause 5.1.4, present and enabled during any portion
of Throughput Test 1 or 2 must be present and enabled at all times that queries are being processed.
7.4.6.4 Each query stream contains a distinct permutation of the query templates defined for TPC-DS. The
permutation of queries for the first 20 query streams is shown in Appendix D.
7.4.6.5 Only one query shall be active on any of the sessions at any point of time during a Throughput Test.
7.4.6.6 The Throughput Test shall execute queries submitted by the driver through a sponsor-selected number of query
streams (Sq). There must be one session per query stream on the SUT and each stream must execute queries
serially (i.e. one after another).
7.4.6.7 Each query stream is uniquely identified by a stream identification number s ranging from 1 to S, where S is
the number of query streams in the Throughput Tests (Throughput Test 1 plus Throughput Test 2).
7.4.6.8 Once a stream identification number has been generated and assigned to a given query stream, the same number
must be used for that query stream for the duration of the test.
7.4.6.9 The value of Sq is any even number larger than or equal to 4.
7.4.6.10 The same value of Sq shall be used for both Throughput Tests, and shall remain constant throughout each
Throughput Test.
7.4.6.11 The queries in each query stream shall be executed in the order assigned to the stream identification number and
defined in Appendix D.
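The constraints in Clauses 7.4.6.7 through 7.4.6.10 can be checked mechanically before a run. The Python sketch below validates a sponsor-chosen Sq and assigns stream identification numbers for both Throughput Tests. It is illustrative only: the helper name plan_streams is hypothetical, and the split (Throughput Test 1 uses streams 1..Sq, Throughput Test 2 uses streams Sq+1..2*Sq) is an assumption, not a rule stated in this clause.

```python
from typing import List, Tuple


def plan_streams(sq: int) -> Tuple[List[int], List[int]]:
    """Return (Throughput Test 1 ids, Throughput Test 2 ids) for a given Sq.

    Hypothetical helper: assumes Throughput Test 1 uses ids 1..Sq and
    Throughput Test 2 uses ids Sq+1..2*Sq.
    """
    # Clause 7.4.6.9: Sq must be an even number greater than or equal to 4.
    if sq < 4 or sq % 2 != 0:
        raise ValueError(f"Sq must be an even number >= 4, got {sq}")
    # Clause 7.4.6.10: the same Sq is used in both tests, so S = 2 * Sq.
    s_total = 2 * sq
    tt1 = list(range(1, sq + 1))
    tt2 = list(range(sq + 1, s_total + 1))
    return tt1, tt2


tt1_ids, tt2_ids = plan_streams(4)
print(tt1_ids, tt2_ids)  # [1, 2, 3, 4] [5, 6, 7, 8]
```

Once assigned, each identifier stays with its stream for the whole test (Clause 7.4.6.8), so a driver would typically compute this plan once and reuse it.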
7.4.7 Throughput Test Timing
7.4.7.1 For a given query template t, used to produce the ith query within query stream s, the query elapsed time,
QD(s, i, t), is the difference between:
a) The timestamp when the first character of the executable query text is submitted to the SUT by the driver;
b) The timestamp when the last character of the output is returned from the SUT to the driver and a success
message is sent to the driver.
Comment: All the operations that are part of the execution of a query (e.g., creation and deletion of a
temporary table or a view) must be included in the elapsed time of that query.
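A driver can capture QD(s, i, t) by bracketing each submission with two clock readings. The Python sketch below is illustrative only: measure_qd and the run_query callable are hypothetical names, and it assumes that a monotonic, high-resolution clock (time.perf_counter) is acceptable for measuring elapsed time. Any wall-clock timestamps required for disclosure would have to be recorded separately.

```python
import time
from typing import Callable, Sequence, Tuple


def measure_qd(run_query: Callable[[str], Sequence], query_text: str) -> Tuple[Sequence, float]:
    """Return (output, elapsed_seconds) for one query.

    `run_query` is a hypothetical callable that submits the executable query
    text and returns only after the last row of output has been received.
    """
    t_submit = time.perf_counter()   # taken just before the query text is submitted
    output = run_query(query_text)   # all work for the query (temp tables, views, ...) happens here
    t_done = time.perf_counter()     # taken after the last output has been returned
    return output, t_done - t_submit


# Usage with a stand-in query runner:
rows, qd_seconds = measure_qd(lambda text: [("ok",)], "select 'ok'")
```

Because the whole call to run_query sits between the two readings, any temporary tables or views the query creates and drops are included in its elapsed time, as the Comment above requires.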
7.4.7.2 The elapsed time of each query in each stream shall be disclosed for each Throughput Test and Power Test.
7.4.7.3 The elapsed time of Throughput Test 1, known as TTT1, is the difference between Throughput Test 1 Start Time
and Throughput Test 1 End Time.
7.4.7.4 Throughput Test 1 Start Time is the timestamp that must be taken before the first character of the
executable query text of the first query stream of Throughput Test 1 is submitted to the SUT by the driver.
7.4.7.5 Throughput Test 1 End Time is the timestamp that must be taken after the last character of output data
from the last query of the last query stream of Throughput Test 1 is received by the driver from the SUT.
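As a rough illustration of Clauses 7.4.7.3 through 7.4.7.5, the sketch below derives a Throughput Test's elapsed time in decimal hours from per-query timestamps. It assumes, as a simplification, that the earliest query submission across all streams approximates the Start Time and the latest output completion approximates the End Time; a compliant driver takes these two timestamps directly. The names StreamLog and throughput_elapsed_hours are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

# One list per query stream; each entry is (submit_time, output_complete_time).
StreamLog = List[Tuple[datetime, datetime]]


def throughput_elapsed_hours(streams: List[StreamLog]) -> float:
    """Elapsed time of one Throughput Test in decimal hours (sketch)."""
    # Start Time: no later than the earliest submission in any stream.
    start = min(submit for stream in streams for submit, _ in stream)
    # End Time: no earlier than the last output received from any stream.
    end = max(done for stream in streams for _, done in stream)
    return (end - start).total_seconds() / 3600.0
```

For example, if the first stream starts at 10:00:00 and the slowest stream returns its last output at 11:30:00, the result is 1.5 hours.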
7.6 Metrics
7.6.1 TPC-DS defines three primary metrics:
a) A Performance Metric, QphDS@SF, reflecting the TPC-DS query throughput (see Clause 7.6.3);
b) A Price-Performance metric, $/QphDS@SF (see Clause 7.6.4);
c) System availability date (see Clause 7.6.5).
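The Performance Metric, QphDS@SF, is defined as:
$$QphDS@SF = \left\lfloor \frac{SF \cdot Q}{\sqrt[4]{T_{PT} \cdot T_{TT} \cdot T_{DM} \cdot T_{LD}}} \right\rfloor$$
Where: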
SF is defined in Clause 3.1.3, and is based on the scale factor used in the benchmark
Q is the total number of weighted queries: Q=Sq*99, with Sq being the number of streams executed in a
Throughput Test
TPT=TPower*Sq, where TPower is the total elapsed time to complete the Power Test, as defined in Clause 7.4.4,
and Sq is the number of streams executed in a Throughput Test
TTT= TTT1+TTT2, where TTT1 is the total elapsed time of Throughput Test 1 and TTT2 is the total elapsed time
of Throughput Test 2, as defined in Clause 7.4.6.
TDM= TDM1+TDM2, where TDM1 is the total elapsed time of Data Maintenance Test 1 and TDM2 is the total
elapsed time of Data Maintenance Test 2, as defined in Clause 7.4.9.
TLD is the load factor, computed as TLD = 0.01*Sq*TLoad, where Sq is the number of streams executed in a
Throughput Test and TLoad is the time to finish the load, as defined in Clause 7.1.2.
TPT, TTT, TDM and TLD quantities are in units of decimal hours with a resolution of at least 1/3600th of an
hour (i.e., 1 second).
7.6.3.2
Comment: The floor symbol (⌊ ⌋) in the above equation truncates any fractional part.
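To make the arithmetic concrete, the Python sketch below computes the Performance Metric from its components, assuming the form QphDS@SF = floor(SF * Q / (TPT * TTT * TDM * TLD)^(1/4)) with Q = Sq*99, TPT = TPower*Sq, TTT = TTT1 + TTT2, TDM = TDM1 + TDM2 and TLD = 0.01*Sq*TLoad, all times in decimal hours. The function name qphds_at_sf is hypothetical, and the input numbers are made up so that the geometric mean comes out to 5.

```python
import math


def qphds_at_sf(sf: float, sq: int, t_power: float, ttt1: float, ttt2: float,
                tdm1: float, tdm2: float, t_load: float) -> int:
    """Sketch of QphDS@SF; every time argument is in decimal hours."""
    q = sq * 99                      # Q: total number of weighted queries
    t_pt = t_power * sq              # TPT = TPower * Sq
    t_tt = ttt1 + ttt2               # TTT = TTT1 + TTT2
    t_dm = tdm1 + tdm2               # TDM = TDM1 + TDM2
    t_ld = 0.01 * sq * t_load        # TLD = 0.01 * Sq * TLoad
    geo_mean = (t_pt * t_tt * t_dm * t_ld) ** 0.25   # fourth root of the product
    return math.floor(sf * q / geo_mean)             # floor truncates the fraction


# Made-up inputs: TPT = TTT = TDM = TLD = 5 hours, so the denominator is 5.
print(qphds_at_sf(3, 4, 1.25, 2.5, 2.5, 2.5, 2.5, 125))  # 237
```

Times measured in seconds would be converted with seconds / 3600 before being passed in, which keeps the 1-second resolution required above.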
7.6.4 The Price Performance Metric ($/QphDS@SF)
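7.6.4.1 The price-performance metric for the benchmark is defined as:
$/QphDS@SF = P / QphDS@SF, where P is the total price of the priced system (see the TPC Pricing Specification).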
The size of the test database, expressed separately or as part of the metric's names (e.g., QphDS@10GB);
The TPC-DS Performance Metric, QphDS@Size;
The TPC-DS Price/Performance metric, $/QphDS@Size;
The Availability Date of the complete configuration (see the TPC Pricing Specification located on the TPC
website, http://www.tpc.org).
Following are two examples of compliant reporting of TPC-DS results:
Example 1: At 10GB the RALF/3000 Server has a TPC-DS Query-per-Hour metric of 3010 when run against a
10GB database, yielding a TPC-DS Price/Performance of $1,202 per query-per-hour, and will be available
1-Apr-06.
Example 2: The RALF/3000 Server, which will start shipping on 1-Apr-06, is rated 3,010 QphDS@10GB and
1202 $/QphDS@10GB.
[Figure: SUT configurations. Labels: DRIVER; Host Systems; Client(s) and Server(s); Query Execution; Network; Database Access; Commercially Available Products (e.g., OS, DBMS, ISQL); SUT.]
8.2.4 If present on the SUT, an implementation-specific layer shall be minimal and general purpose (i.e., not limited
to the TPC-DS queries). Its source code shall be disclosed. The functions performed by an implementation-specific
layer shall be strictly limited to the following:
a) Database transaction control operations before and after each query execution
b) Cursor control and manipulation operations around the executable query text
c) Definition of procedures and data structures required to process dynamic SQL, including the
communication of the executable query text to the commercially available layers of the SUT and the
reception of the query output data
d) Communication with the commercially available layers of the SUT
e) Buffering of the query output data
f) Communication with the driver.
The following are examples of functions that the implementation-specific layer shall not perform:
a) Any modification of the executable query text;
b) Any use of stored procedures to execute the queries;
c) Any sorting or translation of the query output data;
d) Any function prohibited by the requirements of Clause 7.2.8.1.
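The permitted functions a) through f) can be illustrated with a very small layer. The sketch below uses Python's standard sqlite3 module purely as a stand-in for a commercially available DBMS interface (an assumption; a real SUT would use its own client API), and the class name QueryLayer is hypothetical. It passes the executable query text through unchanged, buffers the output without sorting or translating it, and limits itself to transaction and cursor control.

```python
import sqlite3
from typing import List, Tuple


class QueryLayer:
    """Hypothetical minimal implementation-specific layer (sketch of Clause 8.2.4)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def run(self, query_text: str) -> List[Tuple]:
        cur = self.conn.cursor()          # b) cursor control around the query text
        try:
            cur.execute(query_text)       # c), d) pass the text through unmodified
            rows = cur.fetchall()         # e) buffer the output; no sorting or translation
            self.conn.commit()            # a) transaction control after the query
        except sqlite3.Error:
            self.conn.rollback()          # a) undo any partial work on failure
            raise
        finally:
            cur.close()
        return rows                       # f) hand the buffered output back to the driver


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (1,), (2,)])
conn.commit()
print(QueryLayer(conn).run("SELECT x FROM t ORDER BY x"))  # [(1,), (2,), (3,)]
```

Any ordering of the result comes from the query's own ORDER BY; the layer itself never reorders rows, in line with item c) of the prohibited list.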
[Figure: Example configuration diagram. Labels: Driver; SUT; Pricing Boundary; Durable Medium; Cluster of 4 Systems; 16 x I486DX; 1 GB of memory; 16 x SCSI-2 Channels; 1 Ethernet adapter; 6 Units.]
10.3.4.1 The query language used to implement the queries must be identified (e.g., "RALF/SQL-Plus").
10.3.4.2 The method of verification for the random number generation must be described unless the supplied dsdgen
and dsqgen were used.
10.3.4.3 The method used to generate values for substitution parameters must be disclosed. The version number (i.e., the
major revision number, the minor revision number, and third tier number) of dsqgen must be disclosed.
10.3.4.4 The executable query text used for query validation must be disclosed along with the corresponding output data
generated during the execution of the query text against the qualification database. If minor modifications have
been applied to any functional query definitions or approved variants in order to obtain executable query text,
these modifications must be disclosed and justified. The justification for a particular minor query modification
can apply collectively to all queries for which it has been used. The output data for the Power and Throughput
Tests must be made available electronically upon request.
Comment: For query output of more than 10 rows, only the first 10 need to be disclosed in the FDR. The
remaining rows must be made available upon request.
10.3.4.5 All the query substitution parameters used during the performance test must be disclosed in tabular format,
along with the seeds used to generate these parameters.
10.3.4.6 All query and refresh session initialization parameters, settings and commands must be disclosed (see Clauses
7.2.2 through 7.2.7).
10.3.4.7 The details of how the data maintenance functions were implemented must be disclosed (including source code
of any non-commercial program used).
10.3.4.8 Any object created in the staging area (see Clause 5.1.8 for definition and usage restrictions) used to implement
the data maintenance functions must be disclosed. Also, any disk storage used for the staging area must be
priced, and any mapping or virtualization of disk storage must be disclosed.
10.3.5 Clause 6 - Data Persistence Properties Related Items
10.3.5.1 The results of the data accessibility tests must be disclosed along with a description of how the data accessibility
requirements were met. This includes disclosure of the code written to implement the data accessibility query.
10.3.8.1 A detailed list of hardware and software used in the priced system must be reported. The rules for pricing are
included in the current revision of the TPC Pricing Specification located on the TPC website
(http://www.tpc.org).
10.3.8.2 The System Availability Date (see Clause 7.6.5) must be the single availability date reported on the first page of
the executive summary. The full disclosure report must report Availability Dates individually for at least each
of the categories for which a pricing subtotal must be provided. All Availability Dates required to be reported must be
disclosed to a precision of 1 day, but the precise format is left to the test sponsor.
Comment: A test sponsor may disclose additional detail on the availability of the system’s components in the
Notes section of the Executive Summary and may add a footnote reference to the System Availability Date.
10.3.8.3 Additional Clause 7 related items may be included in the full disclosure report for each country specific priced
configuration.
10.3.9 Clause 11 - Audit Related Items
10.3.9.1 The auditor's agency name, address, phone number, and attestation letter with a brief audit summary report
indicating compliance must be included in the full disclosure report. A statement should be included specifying
whom to contact in order to obtain further information regarding the audit process.
Total number of nodes used/total number of processors used with their types and speeds in GHz/ total
number of cores used/total number of threads used;
Main and cache memory sizes;
Network and I/O connectivity;
Disk quantity and geometry.
If the implementation used a two-tier architecture, front-end and back-end systems must be detailed separately.
10.4.2.1 The first section contains the results that were obtained from the reported runs of the Performance Test.
Title | Quantity | Precision | Units | Font
Data Processing System | Brand, Software Version of Data Processing System used | | | 9-12 pt. Times
Operating System | Brand, Software Version of OS used | | | 9-12 pt. Times
Other Software | Brand, Software Version of other software components | | | 9-12 pt. Times
System Availability Date | System Availability Date | 1 day | | 9-12 pt. Times
Clustered Or Not | Yes/No | | | 9-12 pt. Times
Comment: The Software Version must uniquely identify the orderable software product referenced in the
Priced Configuration (e.g., RALF/2000 4.2.1)
10.4.2.2 The middle portion of the page must contain two diagrams, which must be of equal size and fill out the width of
the entire space. The left diagram shows the benchmarked configuration and the right diagram shows a pie
chart with the percentages of the total time and the total times for the Load Test, Throughput Test 1 and
Throughput Test 2.
10.4.2.3 This section contains the database load and RAID information:
Title | Quantity | Precision | Units | Font
RAID | None / Base tables only / Explicit Auxiliary Data Structures / Everything | N/A | N/A | 9-12 pt. Times
10.4.2.4 The next section of the Implementation Overview shall contain a synopsis of the SUT’s major components,
including:
Node and/or processor count and speed in GHz;
Main and cache memory sizes;
Network and IO connectivity;
Disk quantity and geometry;
Total mass storage in the priced system.
If the implementation used a two-tier architecture, front-end and back-end systems should be defined
separately.
10.4.2.5 The final section of the Implementation Overview shall contain a note stating:
“Database Size includes only raw data (i.e., no temp, index, redundant storage space, etc.).”
This clause defines the audit requirements for TPC-DS. The auditor needs to ensure that the benchmark under
audit complies with the TPC-DS specification. Rules for auditing Pricing information are included in the TPC
Pricing Specification located at www.tpc.org. When the TPC-Energy optional reporting is selected by the test
sponsor, the rules for auditing of TPC-Energy related items are included in the TPC-Energy Specification
located at www.tpc.org.
[This auditor clause states a requirement that does not appear to be stated before (that no ADS can be created
during the test). If such a requirement exists it should be stated in clause 2.]
11.2.4.3 Clause 3 Related Items
11.2.4.4 Verify that the qualification database is properly scaled and populated.
11.2.4.5 Verify that the qualification and test databases were constructed in the same manner so that correct behavior on
the qualification database is indicative of correct behavior on the test database.
11.2.4.6 Note the method used to populate the database (i.e., dsdgen or modified version of dsdgen). Note the version
number (i.e., the major revision number, the minor revision number, and third tier number) of dsdgen, and the
names of the dsdgen files which have been modified. Verify that the version matches the benchmark
specification.
11.2.4.7 Verify that storage and processing elements that are not included in the priced configuration are physically
removed or made inaccessible during the performance test using a vendor supported method.
11.2.4.8 Verify that the validation data sets are proven consistent with the data loaded into the database according to
clause 3.5.
11.2.4.9 Verify referential integrity in the database after the initial load. Referential integrity is a data property that can
be verified by checking that every foreign key has a corresponding primary key.
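One way to carry out this check is an anti-join per foreign key: count child rows whose key finds no matching parent row. The Python sketch below runs that query through the standard sqlite3 module on a toy subset of the schema. The helper name orphan_count and the data are hypothetical, and the sketch assumes that NULL foreign key values are permitted and are therefore not counted as violations.

```python
import sqlite3


def orphan_count(conn: sqlite3.Connection, child: str, fk: str, parent: str, pk: str) -> int:
    """Count non-NULL values of child.fk that have no matching parent.pk.

    Table and column names are interpolated directly, so they must come
    from a trusted list (e.g. the schema definition), never from user input.
    """
    sql = (
        f"SELECT COUNT(*) FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]


# Example on a toy subset of the schema (hypothetical data):
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE store_sales (ss_item_sk INTEGER)")
conn.executemany("INSERT INTO item VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO store_sales VALUES (?)", [(1,), (2,), (None,), (3,)])
print(orphan_count(conn, "store_sales", "ss_item_sk", "item", "i_item_sk"))  # 1
```

A full audit would repeat the call for every foreign key relationship in the schema; a result of zero for each one indicates referential integrity.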
The following DDL statements define a detailed structure of the flat files, generated by dsdgen, that constitute
the refresh data set. The datatypes correspond to those in Clause 2.2.
Table A-1: Column definitions for s_zip_to_gmt
Column | Datatype | NULLs | Foreign Key
zipg_zip | char(5) | N |
zipg_gmt_offset | integer | N |
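A loader or validation script can check each refresh-file row against this column definition. The Python sketch below parses one s_zip_to_gmt row. It assumes, as an unverified convenience, that rows are '|'-delimited with an optional trailing delimiter; the names ZipToGmt and parse_zip_to_gmt and the sample row are hypothetical.

```python
from typing import NamedTuple


class ZipToGmt(NamedTuple):
    zipg_zip: str         # char(5), NULLs not allowed
    zipg_gmt_offset: int  # integer, NULLs not allowed


def parse_zip_to_gmt(line: str, delimiter: str = "|") -> ZipToGmt:
    fields = line.rstrip("\n").split(delimiter)
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field produced by a trailing delimiter
    if len(fields) != 2:
        raise ValueError(f"expected 2 columns, got {len(fields)}: {line!r}")
    zip_code, offset = fields
    if len(zip_code) != 5:
        raise ValueError(f"zipg_zip must be 5 characters: {zip_code!r}")
    # int("") raises ValueError, so an empty (NULL) offset is rejected too.
    return ZipToGmt(zip_code, int(offset))


print(parse_zip_to_gmt("35004|-6|\n"))  # ZipToGmt(zipg_zip='35004', zipg_gmt_offset=-6)
```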
B.1 query1.tpl
Find customers who have returned items more than 20% more often than the average customer returns for a
store in a given state for a given year.
Qualification Substitution Parameters:
YEAR.01=2000
STATE.01=TN
AGG_FIELD.01 = SR_RETURN_AMT
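The comparison in B.1 can be sketched outside SQL to make the two aggregation levels explicit. The Python sketch below is not the query template: it assumes the compared quantity is the per-customer, per-store total of AGG_FIELD (for example SR_RETURN_AMT), that rows are already restricted to the given state and year, and that "more than 20%" means strictly greater than 1.2 times the store average. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (customer_id, store_id, amount) rows, already restricted to one state and year.
ReturnRow = Tuple[str, str, float]


def customers_above_store_average(rows: Iterable[ReturnRow], factor: float = 1.2) -> List[str]:
    # Total the aggregated field per (customer, store).
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for cust, store, amount in rows:
        totals[(cust, store)] += amount
    # Average of those per-customer totals for each store.
    per_store: Dict[str, List[float]] = defaultdict(list)
    for (_, store), total in totals.items():
        per_store[store].append(total)
    store_avg = {store: sum(v) / len(v) for store, v in per_store.items()}
    # Keep customers whose total exceeds the store average by more than 20%.
    return sorted(cust for (cust, store), total in totals.items()
                  if total > factor * store_avg[store])
```

For instance, if three customers of one store return 100, 10 and 10, the store average is 40 and only the first customer exceeds the 48 threshold.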
B.2 query2.tpl
Report the increase of weekly web and catalog sales from one year to the next year for each week. That is,
compute the increase of Monday, Tuesday, ... Sunday sales from one year to the following.
Qualification Substitution Parameters:
YEAR.01=2001
B.3 query3.tpl
Report the total extended sales price per item brand of a specific manufacturer for all sales in a specific month
of the year.
Qualification Substitution Parameters:
MONTH.01=11
MANUFACT =128
AGGC = ss_ext_sales_price
B.4 query4.tpl
Find customers who spend more money via catalog than in stores. Identify preferred customers and their
country of origin.
Qualification Substitution Parameters:
YEAR.01=2001
SELECTONE.01= t_s_secyear.customer_preferred_cust_flag
B.5 query5.tpl
Report sales, profit, return amount, and net loss in the store, catalog, and web channels for a 14-day window.
Roll up results by sales channel and channel-specific sales method (store for store sales, catalog page for catalog
sales and web site for web sales).
Qualification Substitution Parameters:
SALES_DATE.01=2000-08-23
YEAR.01=2000
B.6 query6.tpl
List all the states with at least 10 customers who during a given month bought items with the price tag at least
20% higher than the average price of items in the same category.
Qualification Substitution Parameters:
MONTH.01=1
YEAR.01=2001
B.7 query7.tpl
Compute the average quantity, list price, discount, and sales price for promotional items sold in stores where the
promotion is not offered by mail or a special event. Restrict the results to a specific gender, marital and
educational status.
Qualification Substitution Parameters:
YEAR.01=2000
ES.01=College
MS.01=S
GEN.01=M
B.8 query8.tpl
Compute the net profit of stores located in 400 Metropolitan areas with more than 10 preferred customers.
Qualification Substitution Parameters:
B.9 query9.tpl
Categorize store sales transactions into 5 buckets according to the number of items sold. Each bucket contains
the average discount amount, sales price, list price, tax, net paid, paid price including tax, or net profit.
Qualification Substitution Parameters:
AGGCTHEN.01= ss_ext_discount_amt
AGGCELSE.01= ss_net_paid
RC.01=74129
RC.02=122840
RC.03=56580
RC.04=10097
RC.05=165306
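The parameter names suggest a per-bucket conditional: AGGCTHEN is averaged when a bucket holds more transactions than its RC threshold, and AGGCELSE otherwise. The Python sketch below illustrates that reading. Both the conditional and the quantity ranges 1-20, 21-40, 41-60, 61-80 and 81-100 are assumptions about the template rather than statements from this description, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (ss_quantity, value of the AGGCTHEN column, value of the AGGCELSE column)
Row = Tuple[int, float, float]

QUANTITY_RANGES = [(1, 20), (21, 40), (41, 60), (61, 80), (81, 100)]  # assumed bucket bounds


def query9_buckets(rows: Sequence[Row], rc: Sequence[int]) -> List[Optional[float]]:
    """Sketch of the assumed per-bucket CASE logic for query9."""
    result: List[Optional[float]] = []
    for (lo, hi), threshold in zip(QUANTITY_RANGES, rc):
        bucket = [r for r in rows if lo <= r[0] <= hi]
        if not bucket:
            result.append(None)  # an average over no rows is NULL in SQL
            continue
        # More than RC.n transactions in the bucket: average AGGCTHEN, else AGGCELSE.
        col = 1 if len(bucket) > threshold else 2
        result.append(sum(r[col] for r in bucket) / len(bucket))
    return result
```

With the qualification values, each of the five buckets would be compared against RC.01 through RC.05 respectively.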
B.10 query10.tpl
Count the customers with the same gender, marital status, education status, purchase estimate, credit rating,
dependent count, employed dependent count and college dependent count who live in certain counties and who
have purchased from both stores and another sales channel during a three month time period of a given year.
Qualification Substitution Parameters:
YEAR.01 = 2002
MONTH.01 = 1
COUNTY.01 = Rush County
COUNTY.02 = Toole County
COUNTY.03 = Jefferson County
COUNTY.04 = Dona Ana County
COUNTY.05 = La Porte County
B.11 query11.tpl
Find customers whose increase in spending was larger over the web than in stores this year compared to last
year.
Qualification Substitution Parameters:
YEAR.01 = 2001
SELECTONE = t_s_secyear.customer_preferred_cust_flag
B.12 query12.tpl
Compute the revenue ratios across item classes: for each item in a given list of categories sold through the web
channel during a 30-day period, compute the ratio of the sales of that item to the sum of the sales of all items
in that item's class.
Qualification Substitution Parameters:
CATEGORY.01 = Sports
CATEGORY.02 = Books
CATEGORY.03 = Home
SDATE.01 = 1999-02-22
B.13 query13.tpl
Calculate the average sales quantity, average sales price, average wholesale cost, total wholesale cost for store
sales of different customer types (e.g., based on marital status, education status) including their household
demographics, sales price and different combinations of state and sales profit for a given year.
Qualification Substitution Parameters:
STATE.01 = TX
STATE.02 = OH
STATE.03 = TX
STATE.04 = OR
STATE.05 = NM
STATE.06 = KY
STATE.07 = VA
STATE.08 = TX
STATE.09 = MS
ES.01 = Advanced Degree
ES.02 = College
ES.03 = 2 yr Degree
MS.01 = M
MS.02 = S
MS.03 = W
B.14 query14.tpl
DAY.01 = 11
YEAR.01 = 1999
B.15 query15.tpl
Report the total catalog sales for customers in selected geographical regions or who made large purchases for a
given year and quarter.
Qualification Substitution Parameters:
QOY.01 = 2
YEAR.01 = 2001
B.16 query16.tpl
Report the number of orders, total shipping costs and profits from catalog sales of particular counties and states for
a given 60 day period for non-returned sales filled from an alternate warehouse.
Qualification Substitution Parameters:
B.17 query17.tpl
Analyze, for each state, all items that were sold in stores in a particular quarter and returned in the next three
quarters and then re-purchased by the customer through the catalog channel in the three following quarters.
Qualification Substitution Parameters:
YEAR.01 = 2001
B.18 query18.tpl
Compute, for each county, the average quantity, list price, coupon amount, sales price, net profit, age, and
number of dependents for all items purchased through catalog sales in a given year by customers who were born
in a given list of six months and living in a given list of seven states and who also belong to a given gender and
education demographic.
Qualification Substitution Parameters:
MONTH.01 = 1
MONTH.02 = 6
MONTH.03 = 8
MONTH.04 = 9
MONTH.05 = 12
MONTH.06 = 2
STATE.01 = MS
STATE.02 = IN
STATE.03 = ND
STATE.04 = OK
STATE.05 = NM
STATE.06 = VA
STATE.07 = MS
ES.01 = Unknown
GEN.01 = F
YEAR.01 = 1998
B.19 query19.tpl
Select the top revenue generating products bought by out of zip code customers for a given year, month and
manager.
Qualification Substitution Parameters:
MANAGER.01 = 8
B.20 query20.tpl
Compute the total revenue and the ratio of total revenue to revenue by item class for specified item categories
and time periods.
Qualification Substitution Parameters:
CATEGORY.01 = Sports
CATEGORY.02 = Books
CATEGORY.03 = Home
SDATE.01 = 1999-02-22
YEAR.01 = 1999
B.21 query21.tpl
For all items whose price was changed on a given date, compute the percentage change in inventory between
the 30-day period BEFORE the price change and the 30-day period AFTER the change. Group this information
by warehouse.
Qualification Substitution Parameters:
SALES_DATE.01 = 2000-03-11
YEAR.01 = 2000
B.22 query22.tpl
For each product name, brand, class, category, calculate the average quantity on hand. Rollup data by product
name, brand, class and category.
Qualification Substitution Parameters:
DMS.01 = 1200
B.23 query23.tpl
MONTH.01 = 2
YEAR.01 = 2000
TOPPERCENT=50
B.24 query24.tpl
MARKET = 8
COLOR.1 = peach
COLOR.2 = saddle
AMOUNTONE = ss_net_paid
B.25 query25.tpl
YEAR.01 = 2001
AGG.01 = sum
B.26 query26.tpl
Compute the average quantity, list price, discount, sales price for promotional items sold through the catalog
channel where the promotion was not offered by mail or in an event for given gender, marital status and
educational status.
Qualification Substitution Parameters:
YEAR.01 = 2000
ES.01 = College
MS.01 = S
GEN.01 = M
B.27 query27.tpl
For all items sold in stores located in six states during a given year, find the average quantity, average list price,
average list sales price, average coupon amount for a given gender, marital status, education and customer
demographic.
Qualification Substitution Parameters:
STATE_F.01 = TN
B.28 query28.tpl
Calculate the average list price, number of non-empty (non-null) list prices and number of distinct list prices of six
different sales buckets of the store sales channel. Each bucket is defined by a range of distinct items and
information about list price, coupon amount and wholesale cost.
Qualification Substitution Parameters:
WHOLESALECOST.01=57
WHOLESALECOST.02=31
WHOLESALECOST.03=79
WHOLESALECOST.04=38
WHOLESALECOST.05=17
WHOLESALECOST.06=7
COUPONAMT.01=459
COUPONAMT.02=2323
COUPONAMT.03=12214
COUPONAMT.04=6071
COUPONAMT.05=836
COUPONAMT.06=7326
LISTPRICE.01=8
LISTPRICE.02=90
LISTPRICE.03=142
LISTPRICE.04=135
LISTPRICE.05=122
LISTPRICE.06=154
B.29 query29.tpl
Get all items that were sold in stores in a specific month and year and which were returned in the next six
months of the same year and re-purchased by the returning customer afterwards through the catalog sales
channel in the following three years.
For these items, compute the total quantity sold through the store, the quantity returned and the quantity
purchased through the catalog. Group this information by item and store.
Qualification Substitution Parameters:
MONTH.01 = 9
YEAR.01 = 1999
AGG.01 = sum
B.30 query30.tpl
Find customers and their detailed customer data who have returned items, which they bought on the web, for an
amount that is 20% higher than the average amount a customer returns in a given state in a given time period
across all items. Order the output by customer data.
Qualification Substitution Parameters:
YEAR.01 = 2002
STATE.01 = GA
B.31 query31.tpl
List counties where the percentage growth in web sales is consistently higher compared to the percentage
growth in store sales in the first three consecutive quarters for a given year.
Qualification Substitution Parameters:
YEAR.01 = 2000
AGG.01 = ss1.ca_county
B.32 query32.tpl
Compute the total discounted amount for a particular manufacturer in a particular 90-day period for catalog
sales whose discounts exceeded the average discount by at least 30%.
Qualification Substitution Parameters:
CSDATE.01 = 2000-01-27
YEAR.01 = 2000
IMID.01 = 977
B.33 query33.tpl
What is the monthly sales figure based on extended price for a specific month in a specific year, for
manufacturers in a specific category in a given time zone. Group sales by manufacturer identifier and sort
output by sales amount, by channel, and give Total sales.
Qualification Substitution Parameters:
CATEGORY.01 = Electronics
GMT.01 = -5
MONTH.01 = 5
YEAR.01 = 1998
B.34 query34.tpl
Display all customers with specific buy potentials and whose dependent count to vehicle count ratio is larger
than 1.2, who in three consecutive years made purchases with between 15 and 20 items in the beginning or the
end of each month in stores located in 8 counties.
Qualification Substitution Parameters:
B.35 query35.tpl
For the groups of customers living in the same state, having the same gender and marital status who have
purchased from stores and from either the catalog or the web during a given year, display the following:
• state, gender, marital status, count of customers
• min, max, avg, count distinct of the customer’s dependent count
• min, max, avg, count distinct of the customer’s employed dependent count
• min, max, avg, count distinct of the customer’s dependents in college count
Display / calculate the “count of customers” multiple times to emulate a potential reporting tool scenario.
Qualification Substitution Parameters:
YEAR.01 = 2002
AGGONE = min
AGGTWO = max
AGGTHREE = avg
B.36 query36.tpl
Compute store sales gross profit margin ranking for items in a given year for a given list of states.
Qualification Substitution Parameters:
STATE_H.01 = TN
STATE_G.01 = TN
STATE_F.01 = TN
STATE_E.01 = TN
STATE_D.01 = TN
STATE_C.01 = TN
STATE_B.01 = TN
STATE_A.01 = TN
YEAR.01 = 2001
B.37 query37.tpl
List all items and current prices sold through the catalog channel from certain manufacturers in a given $30
price range that consistently had a quantity between 100 and 500 on hand in a 60-day period.
Qualification Substitution Parameters:
PRICE.01 = 68
MANUFACT_ID.01 = 677
MANUFACT_ID.02 = 940
MANUFACT_ID.03 = 694
MANUFACT_ID.04 = 808
INVDATE.01 = 2000-02-01
B.38 query38.tpl
Display the count of customers with purchases from all three channels in a given year.
Qualification Substitution Parameters:
DMS.01 = 1200
B.39 query39.tpl
YEAR.01 = 2001
MONTH.01 = 1
B.40 query40.tpl
Compute the impact of an item price change on the sales by computing the total sales for items in a 30-day
period before and after the price change. Group the items by location of warehouse where they were delivered
from.
Qualification Substitution Parameters:
SALES_DATE.01 = 2000-03-11
YEAR.01 = 2000
B.41 query41.tpl
How many items do we carry with specific combinations of color, units, size and category?
Qualification Substitution Parameters:
MANUFACT.01 = 738
SIZE.01 = medium
SIZE.02 = extra large
SIZE.03 = N/A
SIZE.04 = small
SIZE.05 = petite
SIZE.06 = large
UNIT.01 = Ounce
UNIT.02 = Oz
UNIT.03 = Bunch
UNIT.04 = Ton
UNIT.05 = N/A
UNIT.06 = Dozen
UNIT.07 = Box
UNIT.08 = Pound
UNIT.09 = Pallet
UNIT.10 = Gross
UNIT.11 = Cup
UNIT.12 = Dram
B.42 query42.tpl
For each item and a specific year and month calculate the sum of the extended sales price of store transactions.
Qualification Substitution Parameters:
MONTH.01 = 11
YEAR.01 = 2000
B.43 query43.tpl
Report the sum of all sales from Sunday to Saturday for stores in a given date range, by store.
Qualification Substitution Parameters:
YEAR.01 = 2000
GMT.01 = -5
B.44 query44.tpl
List the best and worst performing products measured by net profit.
Qualification Substitution Parameters:
NULLCOLSS.01 = ss_addr_sk
STORE.01 = 4
B.45 query45.tpl
Report the total web sales for customers in specific zip codes, cities, counties or states, or specific items for a
given year and quarter.
Qualification Substitution Parameters:
QOY.01 = 2
YEAR.01 = 2001
B.46 query46.tpl
Compute the per-customer coupon amount and net profit of all "out of town" customers buying from stores
located in 5 cities on weekends in three consecutive years. The customers need to fit the profile of having a
specific dependent count and vehicle count. For all these customers print the city they lived in at the time of
purchase, the city in which the store is located, the coupon amount and net profit.
Qualification Substitution Parameters:
CITY_E.01 = Fairview
CITY_D.01 = Fairview
CITY_C.01 = Fairview
CITY_B.01 = Midway
CITY_A.01 = Fairview
VEHCNT.01 = 3
YEAR.01 = 1999
DEPCNT.01 = 4
B.47 query47.tpl
Find the item brands and categories for each store and company, the monthly sales figures for a specified year,
where the monthly sales figure deviated more than 10% of the average monthly sales for the year, sorted by
deviation and store. Report deviation of sales from the previous and the following monthly sales.
Qualification Substitution Parameters:
YEAR.01 = 1999
SELECTONE = v1.i_category, v1.i_brand, v1.s_store_name, v1.s_company_name
SELECTTWO = ,v1.d_year, v1.d_moy
B.48 query48.tpl
Calculate the total sales by different types of customers (e.g., based on marital status, education status), sales
price and different combinations of state and sales profit.
Qualification Substitution Parameters:
MS.01=M
MS.02=D
MS.03=S
ES.01=4 yr Degree
ES.02=2 yr Degree
ES.03=College
STATE.01=CO
STATE.02=OH
STATE.03=TX
STATE.04=OR
STATE.05=MN
STATE.06=KY
STATE.07=VA
STATE.08=CA
STATE.09=MS
YEAR.01=2000
B.49 query49.tpl
Report the worst return ratios (sales to returns) of all items for each channel by quantity and currency sorted by
ratio. Quantity ratio is defined as total number of sales to total number of returns. Currency ratio is defined as
sum of return amount to sum of net paid.
Qualification Substitution Parameters:
MONTH.01 = 12
YEAR.01 = 2001
B.50 query50.tpl
For each store count the number of items in a specified month that were returned after 30, 60, 90, 120 and more
than 120 days from the day of purchase.
Qualification Substitution Parameters:
MONTH.01 = 8
YEAR.01 = 2001
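The day-range bucketing used by B.50 can be sketched as follows. The inclusive boundaries (at most 30, 31-60, 61-90, 91-120 and more than 120 days) are an assumption modeled on the wording of B.62, and the names lag_bucket and count_by_store are hypothetical; this illustrates the counting, not the query template.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

BUCKETS = ("<=30 days", "31-60 days", "61-90 days", "91-120 days", ">120 days")


def lag_bucket(sold: date, returned: date) -> str:
    """Label the number of days between purchase and return."""
    days = (returned - sold).days
    for limit, label in zip((30, 60, 90, 120), BUCKETS):
        if days <= limit:
            return label
    return BUCKETS[-1]  # more than 120 days


def count_by_store(rows: Iterable[Tuple[str, date, date]]) -> Counter:
    """rows: (store_id, sold_date, returned_date); counts per (store, bucket)."""
    counts: Counter = Counter()
    for store, sold, returned in rows:
        counts[(store, lag_bucket(sold, returned))] += 1
    return counts
```

The same helper would apply to B.62 with ship dates in place of return dates.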
B.51 query51.tpl
Compute the count of store sales resulting from promotions, the count of all store sales and their ratio for
specific categories in a particular time zone and for a given year and month.
Qualification Substitution Parameters:
DMS.01 = 1200
B.52 query52.tpl
Report the total of extended sales price for all items of a specific brand in a specific year and month.
Qualification Substitution Parameters:
MONTH.01=11
YEAR.01=2000
B.53 query53.tpl
Find the ID, quarterly sales and yearly sales of those manufacturers who produce items with specific
characteristics and whose average monthly sales are larger than 10% of their monthly sales.
Qualification Substitution Parameters:
DMS.01 = 1200
B.54 query54.tpl
Find all customers who purchased items of a given category and class on the web or through catalog in a given
month and year that was followed by an in-store purchase at a store near their residence in the three consecutive
months. Calculate a histogram of the revenue by these customers in $50 segments showing the number of
customers in each of these revenue generated segments.
Qualification Substitution Parameters:
CLASS.01 = maternity
CATEGORY.01 = Women
MONTH.01 = 12
B.55 query55.tpl
For a given year, month and store manager, calculate the total store sales of any combination of all brands.
Qualification Substitution Parameters:
MANAGER.01 = 28
MONTH.01 = 11
YEAR.01 = 1999
B.56 query56.tpl
Compute the monthly sales amount for a specific month in a specific year, for items with three specific colors
across all sales channels. Only consider sales of customers residing in a specific time zone. Group sales by
item and sort output by sales amount.
Qualification Substitution Parameters:
COLOR.01 = slate
COLOR.02 = blanched
COLOR.03 = burnished
GMT.01 = -5
MONTH.01 = 2
YEAR.01 = 2001
B.57 query57.tpl
Find the item brands and categories for each call center and their monthly sales figures for a specified year,
where the monthly sales figure deviated more than 10% of the average monthly sales for the year, sorted by
deviation and call center. Report the sales deviation from the previous and following month.
Qualification Substitution Parameters:
YEAR.01 = 1999
SELECTONE = v1.i_category, v1.i_brand, v1.cc_name
SELECTTWO = ,v1.d_year, v1.d_moy
B.58 query58.tpl
Retrieve the items generating the highest revenue and which had a revenue that was approximately equivalent
across all of store, catalog and web within the week ending a given date.
Qualification Substitution Parameters:
SALES_DATE.01 = 2000-01-03
B.59 query59.tpl
Report the increase of weekly store sales from one year to the next year for each store and day of the week.
Qualification Substitution Parameters:
DMS.01 = 1212
B.60 query60.tpl
What is the monthly sales amount for a specific month in a specific year, for items in a specific category,
purchased by customers residing in a specific time zone. Group sales by item and sort output by sales amount.
Qualification Substitution Parameters:
CATEGORY.01 = Music
GMT.01 = -5
MONTH.01 = 9
YEAR=1998
B.61 query61.tpl
Find the ratio of items sold with and without promotions in a given month and year. Only items in certain
categories sold to customers living in a specific time zone are considered.
Qualification Substitution Parameters:
GMT.01 = -5
CATEGORY.01 = Jewelry
MONTH.01 = 11
YEAR.01 = 1998
B.62 query62.tpl
For web sales, create a report showing the counts of orders shipped within 30 days, from 31 to 60 days, from 61
to 90 days, from 91 to 120 days and over 120 days within a given year, grouped by warehouse, shipping mode
and web site.
Qualification Substitution Parameters:
DMS.01 = 1200
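The delay buckets above can be sketched in Python as follows. This is a minimal illustration, assuming each order is supplied as a tuple whose shipping delay (ship date minus sold date) has already been computed in whole days; the tuple layout, bucket labels and function names are hypothetical. Query99 describes the same buckets for catalog sales, grouped by call center instead of web site.

```python
from collections import Counter, defaultdict

# Bucket labels in report order; boundaries follow the description of query62.
BUCKETS = ("<=30 days", "31-60 days", "61-90 days", "91-120 days", ">120 days")


def ship_delay_bucket(days: int) -> str:
    """Map a shipping delay in days to its report bucket."""
    if days < 0:
        raise ValueError("ship date precedes sold date")
    if days <= 30:
        return BUCKETS[0]
    if days <= 60:
        return BUCKETS[1]
    if days <= 90:
        return BUCKETS[2]
    if days <= 120:
        return BUCKETS[3]
    return BUCKETS[4]


def delay_report(orders):
    """orders: iterable of (warehouse, ship_mode, web_site, delay_days) tuples.
    Returns {(warehouse, ship_mode, web_site): Counter(bucket -> order count)}."""
    report = defaultdict(Counter)
    for warehouse, ship_mode, web_site, delay in orders:
        report[(warehouse, ship_mode, web_site)][ship_delay_bucket(delay)] += 1
    return report


report = delay_report([("W1", "UPS", "site_1", 10), ("W1", "UPS", "site_1", 45)])
print(dict(report[("W1", "UPS", "site_1")]))  # {'<=30 days': 1, '31-60 days': 1}
```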
B.63 query63.tpl
For a given year calculate the monthly sales of items of specific categories, classes and brands that were sold in
stores and group the results by store manager. Additionally, for every month and manager print the yearly
average sales of those items.
Qualification Substitution Parameters:
DMS.01 = 1200
B.64 query64.tpl
Find those stores that sold more cross-sale items from one year to the next. Cross-sale items are items that are
sold over the Internet, by catalog and in stores.
Qualification Substitution Parameters:
YEAR.01 = 1999
PRICE.01 = 64
COLOR.01 = purple
COLOR.02 = burlywood
COLOR.03 = indian
COLOR.04 = spring
COLOR.05 = floral
COLOR.06 = medium
B.65 query65.tpl
In a given period, for each store, report the list of items with revenue less than 10% of the average revenue for
all the items in that store.
Qualification Substitution Parameters:
DMS.01 = 1176
B.66 query66.tpl
Compute web and catalog sales and profits by warehouse. Report results by month for a given year during a
given 8-hour period.
Qualification Substitution Parameters:
SALESTWO.01 = cs_sales_price
SALESONE.01 = ws_ext_sales_price
NETTWO.01 = cs_net_paid_inc_tax
NETONE.01 = ws_net_paid
SMC.01 = DHL
SMC.02 = BARIAN
TIMEONE.01 = 30838
YEAR.01 = 2001
B.67 query67.tpl
Find top stores for each category based on store sales in a specific year.
Qualification Substitution Parameters:
DMS.01 = 1200
B.68 query68.tpl
Compute the per customer extended sales price, extended list price and extended tax for "out of town" shoppers
buying from stores located in two cities in the first two days of each month of three consecutive years. Only
consider customers with specific dependent and vehicle counts.
Qualification Substitution Parameters:
CITY_B.01 = Midway
CITY_A.01 = Fairview
VEHCNT.01 = 3
YEAR.01 = 1999
DEPCNT.01 = 4
B.69 query69.tpl
Count the customers with the same gender, marital status, education status, purchase estimate and credit rating
who live in certain states and who have purchased from stores but neither from the catalog nor from the web
during a two-month time period of a given year.
Qualification Substitution Parameters:
STATE.01 = KY
STATE.02 = GA
STATE.03 = NM
B.70 query70.tpl
Compute store sales net profit ranking by state and county for a given year and determine the five most
profitable states.
Qualification Substitution Parameters:
DMS.01 = 1200
B.71 query71.tpl
Select the top revenue-generating products managed by a given manager that were sold during breakfast or
dinner time in one month across all three sales channels.
Qualification Substitution Parameters:
MANAGER.01 = 1
MONTH.01 = 11
YEAR.01 = 1999
B.72 query72.tpl
For each item, warehouse and week combination count the number of sales with and without promotion.
Qualification Substitution Parameters:
BP.01 = >10000
MS.01 = D
YEAR.01 = 1999
Comment: The adding of the scalar number 5 to d1.d_date in the predicate “d3.d_date > d1.d_date + 5”
means that 5 days are added to d1.d_date.
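In Python terms, the comment corresponds to adding a `timedelta` of 5 days rather than adding a raw integer to a date. The sketch below is illustrative only; the function name `later_than_plus_days` is hypothetical and the dates are arbitrary.

```python
from datetime import date, timedelta


def later_than_plus_days(d3: date, d1: date, days: int = 5) -> bool:
    """Evaluate `d3.d_date > d1.d_date + 5`, reading the scalar as a number of days."""
    return d3 > d1 + timedelta(days=days)


print(date(1999, 1, 30) + timedelta(days=5))                       # 1999-02-04 (crosses a month boundary)
print(later_than_plus_days(date(1999, 2, 4), date(1999, 1, 30)))   # False: equal, not greater
print(later_than_plus_days(date(1999, 2, 5), date(1999, 1, 30)))   # True
```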
B.73 query73.tpl
Count the number of customers with specific buy potentials, whose dependent count to vehicle count ratio is
larger than 1, and who in three consecutive years bought between 1 and 5 items in one purchase in stores located
in 4 counties. Only purchases in the first 2 days of each month are considered.
Qualification Substitution Parameters:
B.74 query74.tpl
Display customers with both store and web sales in consecutive years for whom the increase in web sales
exceeds the increase in store sales for a specified year.
Qualification Substitution Parameters:
YEAR.01 = 2001
AGGONE.01 = sum
ORDERC.01 = 1
ORDERC.02 = 1
ORDERC.03 = 1
B.75 query75.tpl
For two consecutive years track the sales of items by brand, class and category.
Qualification Substitution Parameters:
CATEGORY.01 = Books
YEAR.01 = 2002
B.76 query76.tpl
Compute the average quantity, list price, discount and sales price for promotional items sold through the web
channel where the promotion is not offered by mail or in an event, for a given gender, marital status and
educational status.
Qualification Substitution Parameters:
NULLCOLCS.01 = cs_ship_addr_sk
NULLCOLWS.01 = ws_ship_customer_sk
NULLCOLSS.01 = ss_store_sk
B.77 query77.tpl
Report the total sales, returns and profit for all three sales channels for a given 30 day period. Roll up the
results by channel and a unique channel location identifier.
Qualification Substitution Parameters:
SALES_DATE.01 = 2000-08-23
B.78 query78.tpl
Report the top customer / item combinations having the highest ratio of store channel sales to all other channel
sales (minimum 2 to 1 ratio), for combinations with at least one store sale and one other channel sale. Order the
output by highest ratio.
Qualification Substitution Parameters:
YEAR.01 = 2000
SELECTONE.01 = ss_sold_year, ss_item_sk, ss_customer_sk
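A minimal Python sketch of the qualification and ordering rule, assuming per-channel sales for each customer/item pair have already been aggregated. The description does not say whether "sales" means quantity or amount, so the sketch treats it as a generic number; the dictionary layout, identifiers and the function name `store_ratio` are hypothetical.

```python
def store_ratio(store_sales: float, web_sales: float, catalog_sales: float,
                min_ratio: float = 2.0):
    """Ratio of store sales to all other channel sales for one customer/item pair,
    or None if the pair does not qualify for the report."""
    other = web_sales + catalog_sales
    # Require at least one store sale and at least one other-channel sale.
    if store_sales <= 0 or other <= 0:
        return None
    ratio = store_sales / other
    return ratio if ratio >= min_ratio else None


# (store, web, catalog) sales per hypothetical customer/item pair.
pairs = {
    ("cust_1", "item_a"): (10, 3, 2),   # 10 / 5 = 2.0 -> qualifies
    ("cust_2", "item_b"): (10, 6, 0),   # 10 / 6 < 2   -> excluded
    ("cust_3", "item_c"): (8, 0, 1),    # 8 / 1 = 8.0  -> qualifies
    ("cust_4", "item_d"): (5, 0, 0),    # no other-channel sale -> excluded
}

ranked = []
for key, (store, web, catalog) in pairs.items():
    ratio = store_ratio(store, web, catalog)
    if ratio is not None:
        ranked.append((key, ratio))
ranked.sort(key=lambda kv: kv[1], reverse=True)  # highest ratio first
print(ranked)  # [(('cust_3', 'item_c'), 8.0), (('cust_1', 'item_a'), 2.0)]
```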
B.79 query79.tpl
Compute the per customer coupon amount and net profit of Monday shoppers. Only purchases of three
consecutive years made on Mondays in large stores by customers with a certain dependent count and with a
large vehicle count are considered.
Qualification Substitution Parameters:
VEHCNT.01 = 2
YEAR.01 = 1999
DEPCNT.01 = 6
B.80 query80.tpl
Report extended sales, extended net profit and returns in the store, catalog, and web channels for a 30 day
window for items with prices larger than $50 not promoted on television. Roll up the results by sales channel and
channel-specific sales means (store for store sales, catalog page for catalog sales and web site for web sales).
Qualification Substitution Parameters:
SALES_DATE.01 = 2000-08-23
B.81 query81.tpl
Find customers, with their detailed customer data, whose returns of items bought from the catalog exceed by
more than 20 percent the average customer returns for customers in a given state in a given time period. Order
output by customer data.
Qualification Substitution Parameters:
YEAR.01 = 2000
STATE.01 = GA
B.82 query82.tpl
Find customers who tend to spend more money (net-paid) on-line than in stores.
Qualification Substitution Parameters:
MANUFACT_ID.01 = 129
MANUFACT_ID.02 = 270
MANUFACT_ID.03 = 821
MANUFACT_ID.04 = 423
INVDATE.01 = 2000-05-25
PRICE.01 = 62
B.83 query83.tpl
Retrieve the items with the highest number of returns where the number of returns was approximately
equivalent across all store, catalog and web channels (within a tolerance of +/- 10%), within the week ending a
given date.
Qualification Substitution Parameters:
RETURNED_DATE_THREE.01 = 2000-11-17
RETURNED_DATE_TWO.01 = 2000-09-27
RETURNED_DATE_ONE.01 = 2000-06-30
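The description does not state what the +/- 10% tolerance is measured against. The Python sketch below assumes one plausible reading, namely that each channel's return count must lie within 10% of the three-channel average; the function name and sample counts are hypothetical.

```python
def approximately_equal(counts, tolerance: float = 0.10) -> bool:
    """True if every channel's return count lies within +/- tolerance of the
    average across channels (an assumed reading of the tolerance)."""
    avg = sum(counts) / len(counts)
    if avg == 0:
        return False
    return all(abs(c - avg) <= tolerance * avg for c in counts)


# (store, catalog, web) return counts for one item
print(approximately_equal((100, 95, 105)))   # True: each within 10% of 100
print(approximately_equal((100, 80, 120)))   # False: 80 and 120 are 20% away
```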
B.84 query84.tpl
List all customers living in a specified city, with an income between 2 values.
Qualification Substitution Parameters:
INCOME.01 = 38128
CITY.01 = Edgewood
B.85 query85.tpl
For all web return reasons, calculate the average sales, average refunded cash and average return fee by different
combinations of customer and sales types (e.g., based on marital status, education status, state and sales profit).
Qualification Substitution Parameters:
YEAR.01 = 2000
STATE.01 = IN
STATE.02 = OH
STATE.03 = NJ
STATE.04 = WI
STATE.05 = CT
STATE.06 = KY
STATE.07 = LA
STATE.08 = IA
STATE.09 = AR
ES.01 = Advanced Degree
ES.02 = College
ES.03 = 2 yr Degree
MS.01 = M
MS.02 = S
MS.03 = W
B.86 query86.tpl
Roll up the web sales for a given year by category and class, and rank the sales among peers within the parent.
For each group, compute the sum of sales, the location within the hierarchy and the rank within the group.
Qualification Substitution Parameters:
DMS.01 = 1200
B.87 query87.tpl
Count how many customers have, on the same day, ordered items on the web and from the catalog and bought
items in a store.
Qualification Substitution Parameters:
DMS.01 = 1200
B.88 query88.tpl
How many items do we sell in certain stores during specific half-hour periods of the day to customers with one
dependent and 2 or fewer vehicles registered, 2 dependents and 4 or fewer vehicles registered, or 3 dependents
and 5 or fewer vehicles registered? In one row, break the counts into sales from 8:30 to 9, 9 to 9:30, 9:30 to 10 ...
12 to 12:30.
Qualification Substitution Parameters:
STORE.01=Unknown
HOUR.01=4
HOUR.02=2
HOUR.03=0
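The household filter and half-hour bucketing in the description of query88 can be sketched in Python as below. The sketch follows the wording of the description rather than the query template itself; the sample rows, slot labels and function names are hypothetical.

```python
from collections import Counter
from datetime import time

DAY_START = 8 * 60 + 30   # 8:30, start of the first reported slot (minutes after midnight)
DAY_END = 12 * 60 + 30    # 12:30, end of the last reported slot


def eligible_household(dependents: int, vehicles: int) -> bool:
    """Household filter as worded in the description."""
    return ((dependents == 1 and vehicles <= 2)
            or (dependents == 2 and vehicles <= 4)
            or (dependents == 3 and vehicles <= 5))


def half_hour_slot(t: time):
    """Return a label such as '8:30-9:00' for times in [8:30, 12:30), else None."""
    minutes = t.hour * 60 + t.minute
    if not DAY_START <= minutes < DAY_END:
        return None
    start = minutes - minutes % 30   # round down to the half hour
    end = start + 30
    return f"{start // 60}:{start % 60:02d}-{end // 60}:{end % 60:02d}"


# (sale time, dependents, vehicles) for hypothetical store sales
sales = [(time(8, 45), 1, 2), (time(8, 50), 3, 6), (time(12, 10), 2, 4), (time(13, 0), 1, 0)]
counts = Counter(
    half_hour_slot(t) for t, dep, veh in sales
    if eligible_household(dep, veh) and half_hour_slot(t) is not None
)
print(dict(counts))  # {'8:30-9:00': 1, '12:00-12:30': 1}
```

Each slot label would become one column of the single-row report.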
B.89 query89.tpl
Within a year, list all months and combinations of item categories, classes and brands that had monthly sales
larger than 0.1 percent of the total yearly sales.
Qualification Substitution Parameters:
CLASS_F.01 = dresses
CAT_F.01 = Women
CLASS_E.01 = birdal
CAT_E.01 = Jewelry
CLASS_D.01 = shirts
CAT_D.01 = Men
CLASS_C.01 = football
CAT_C.01 = Sports
CLASS_B.01 = stereo
CAT_B.01 = Electronics
CLASS_A.01 = computers
CAT_A.01 = Books
YEAR.01 = 1999
B.90 query90.tpl
What is the ratio of the number of items sold over the internet in the morning (8 to 9am) to the number of
items sold in the evening (7 to 8pm) to customers with a specified number of dependents? Consider only
websites with a high amount of content.
Qualification Substitution Parameters:
HOUR_PM.01 = 19
HOUR_AM.01 = 8
DEPCNT.01 = 6
B.91 query91.tpl
Display total returns of catalog sales by call center and manager in a particular month for male customers of
unknown education or female customers with advanced degrees with a specified buy potential and from a
particular time zone.
Qualification Substitution Parameters:
YEAR.01 = 1998
MONTH.01 = 11
B.92 query92.tpl
Compute the total discount on web sales of items from a given manufacturer over a particular 90 day period for
sales whose discount exceeded the average discount of items from that manufacturer in that period of time by
more than 30%.
Qualification Substitution Parameters:
IMID.01 = 350
WSDATE.01 = 2000-01-27
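Reading "exceeded the average discount by more than 30%" as discount > 1.3 × average, the filter can be sketched in Python as below. The list input and the function name are hypothetical, and the sketch omits the manufacturer and date-range filtering that precede the comparison.

```python
def excess_discount_total(discounts, factor: float = 1.3) -> float:
    """Sum the discounts that exceed the period average by more than 30%
    (i.e. discount > 1.3 * average)."""
    if not discounts:
        return 0.0
    threshold = factor * sum(discounts) / len(discounts)
    return sum(d for d in discounts if d > threshold)


# Made-up discounts on one manufacturer's web sales in the 90 day window.
print(excess_discount_total([10.0, 10.0, 10.0, 30.0]))  # average 15, threshold 19.5 -> 30.0
```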
B.93 query93.tpl
For a given merchandise return reason, report on customers’ total cost of purchases minus the cost of returned
items.
Qualification Substitution Parameters:
REASON.01 = reason 28
B.94 query94.tpl
Produce a count of web sales and total shipping cost and net profit in a given 60 day period to customers in a
given state from a named web site for non-returned orders shipped from more than one warehouse.
Qualification Substitution Parameters:
YEAR.01 = 1999
MONTH.01 = 2
STATE.01 = IL
B.95 query95.tpl
Produce a count of web sales and total shipping cost and net profit in a given 60 day period to customers in a
given state from a named web site for returned orders shipped from more than one warehouse.
Qualification Substitution Parameters:
STATE.01=IL
MONTH.01=2
YEAR.01=1999
B.96 query96.tpl
Compute a count of sales from a named store to customers with a given number of dependents made in a
specified half hour period of the day.
Qualification Substitution Parameters:
HOUR.01 = 20
DEPCNT.01 = 7
B.97 query97.tpl
Generate counts of promotional sales and total sales, and their ratio from the web channel for a particular item
category and month to customers in a given time zone.
Qualification Substitution Parameters:
DMS.01 = 1200
B.98 query98.tpl
Report on items sold in a given 30 day period, belonging to the specified categories.
Qualification Substitution Parameters:
YEAR.01 = 1999
SDATE.01 = 1999-02-22
CATEGORY.01 = Sports
CATEGORY.02 = Books
CATEGORY.03 = Home
B.99 query99.tpl
For catalog sales, create a report showing the counts of orders shipped within 30 days, from 31 to 60 days, from
61 to 90 days, from 91 to 120 days and over 120 days within a given year, grouped by warehouse, call center
and shipping mode.
Qualification Substitution Parameters:
DMS.01 = 1200
Query10.tpl Query10a.tpl
Query18.tpl Query18a.tpl
Query27.tpl Query27a.tpl
Query35.tpl Query35a.tpl
Query36.tpl Query36a.tpl
Query51.tpl Query51a.tpl
Query70.tpl Query70a.tpl
Query77.tpl Query77a.tpl
Query80.tpl Query80a.tpl
Query86.tpl Query86a.tpl
In addition to this document, TPC-DS relies on material that is only available electronically. While not included
in the printed version of the specification, this “soft appendix” is integral to the submission of a compliant
TPC-DS benchmark result.
F.2 Availability
F.3 Compatibility
This material is maintained, versioned and revised independently of the specification itself. It is the benchmark
sponsor’s responsibility to ensure that any benchmark submission relies on a revision of the soft appendix that is
compliant with the revision of the TPC-DS specification against which the result is being submitted.
The soft appendix includes a version number similar to that used in the specification, with a major version
number, a minor version number and a third tier level, each separated by a decimal point. The major and minor
revision numbers are tied to those of the TPC-DS specification with which the soft appendix is compliant. The
third tier level of the soft appendix is incremented whenever the appendix itself is updated, and is independent
of revision changes or updates to the specification.
A revision of the soft appendix may be used to submit a TPC-DS benchmark result provided that the major
revision number of the soft appendix matches that of a specification revision that is eligible for benchmark
submission;
Comment: The intent of this clause is to allow for the possibly lengthy tuning and preparation cycle that
precedes a benchmark submission, during which a third tier revision could be released.
Benchmark sponsors are encouraged to use the most recent patch level of a given soft appendix version, as it
will contain the latest clarifications and bug fixes, but any third tier level may be used to produce a compliant
benchmark submission as long as the prior conditions are met.
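The version rule can be sketched in Python as follows. It checks only the major-number condition stated in this clause and ignores the third tier level, which may take any value; the function names and the list of eligible specification revisions passed in are hypothetical inputs, not something the TPC publishes in this form.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    tier: int  # third tier level: incremented whenever the soft appendix itself changes


def parse_version(text: str) -> Version:
    """Parse 'major.minor.tier', e.g. '2.5.0'."""
    major, minor, tier = (int(part) for part in text.split("."))
    return Version(major, minor, tier)


def appendix_usable(appendix_version: str, eligible_spec_versions) -> bool:
    """True if the soft appendix's major revision matches that of some specification
    revision eligible for benchmark submission; any third tier level is accepted."""
    major = parse_version(appendix_version).major
    return any(parse_version(spec).major == major for spec in eligible_spec_versions)


print(appendix_usable("2.5.3", ["2.5.0"]))  # True
print(appendix_usable("1.4.0", ["2.5.0"]))  # False
```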
The schema of the ES.xml document is defined by the XML schema document tpcds-es.xsd, available on the TPC
website (http://www.tpc.org). The ES.xml file must conform to tpcds-es.xsd, as established by XML schema
validation.
An XML document conforming to the tpcds-es.xsd schema contains a single element named tpcdsResult of type
RootType. The main complex types are explained in the sections below. The other types not included here can
be found in tpcds-es.xsd.
SpecVersion SpecVersionType
PricingSpecVersion SpecVersionType
ReportDate date
RevisionDate date
AuditorName AuditorType The name of the Auditor who certified the result.
DBName string
DBMiscInfo string
OSName string
OSVersion string
OSMiscInfo string
ProcessorName string
ProcessorCount positiveInteger Database Server
CoreCount positiveInteger Database Server
ThreadCount positiveInteger Database Server
Memory decimal Database Server
PerNodeHardware PerNodeHardwareType
RaidLevel string
SpindleTechnology string
SpindleCount positiveInteger
SpindleRPM positiveInteger
StorageSwitchDescription string
StorageSwitchCount positiveInteger
StorageSwitchTechnology string
LoadTimeIncludesBackup YesNoType
RunTiming RunTimingDataType
RunTiming RunTimingDataType
PowerQuery PowerQueryDataType
RunTiming RunTimingDataType
RefreshFunction RefreshDataType
RunTiming RunTimingDataType
Query QueryDataType
QueryNumber positiveInteger
RT RTType
QueryNumber positiveInteger
RTMin RTType
RTMax RTType
RTMedian RTType
RT25th RTType
RT75th RTType
RefreshFunctionName RefreshFunctionNameDataType
Table 5-4
RT RTType