
DuckDB Documentation

DuckDB version 0.10.0‑dev


Generated on 2024‑03‑04 at 09:16 UTC
Contents

Contents i

Summary 1

Documentation 3

Connect 5
Connect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Concurrency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Data Import 7
Importing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
CSV Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
CSV Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
CSV Auto Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Reading Faulty CSV Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
CSV Import Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
JSON Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
JSON Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Multiple Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Reading Multiple Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Combining Schemas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Parquet Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Reading and Writing Parquet Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Querying Parquet Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Parquet Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Parquet Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Hive Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Partitioned Writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Appender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
INSERT Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Client APIs 39
Client APIs Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Startup & Shutdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Data Chunks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Prepared Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Appender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Table Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96


Replacement Scans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109


Complete API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
C++ API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
CLI API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Command Line Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Dot Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Output Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
Editing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Autocomplete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Syntax Highlighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Go . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Java JDBC API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Julia Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Node.js . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Node.js API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Node.js API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Python API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Data Ingestion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Result Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Python DB API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Relational API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Python Function API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
Types API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
Expression API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Spark API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
Python Client API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Known Python Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
R API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Rust API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
Swift API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Wasm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
DuckDB Wasm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Instantiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Data Ingestion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
ADBC API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
ODBC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
ODBC API ‑ Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
ODBC API ‑ Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
ODBC API ‑ Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
ODBC API ‑ macOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

Configuration 301
Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
Pragmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Secrets Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

SQL 313
SQL Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
Statements Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320


ALTER TABLE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320


ALTER VIEW Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
ATTACH/DETACH Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
CALL Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
CHECKPOINT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
COMMENT ON Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
COPY Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
CREATE MACRO Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
CREATE SCHEMA Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
CREATE SECRET Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
CREATE SEQUENCE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
CREATE TABLE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
CREATE VIEW Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
CREATE TYPE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
DELETE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
DROP Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
EXPORT/IMPORT DATABASE Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
INSERT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
PIVOT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Profiling Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
SELECT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
SET/RESET Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
Transaction Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
UNPIVOT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
UPDATE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
USE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
VACUUM Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
Query Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
SELECT Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
FROM & JOIN Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
WHERE Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
GROUP BY Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
GROUPING SETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
HAVING Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
ORDER BY Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
LIMIT Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
SAMPLE Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Unnesting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
WITH Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
WINDOW Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
QUALIFY Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
VALUES Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
FILTER Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Set Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Prepared Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Array Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Bitstring Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Blob Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Boolean Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
Date Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
Enum Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399


Interval Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403


List Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Literal Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Map Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
NULL Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
Numeric Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
Struct Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
Text Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
Time Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
Timestamp Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
Time Zone Reference List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
Union Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
Typecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
CASE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
Casting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
Collations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
IN Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Star Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
Subqueries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
Bitstring Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
Blob Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Date Format Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Date Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
Date Part Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
Enum Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
Interval Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
Lambda Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
Nested Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Numeric Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
Text Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
Time Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
Timestamp Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
Timestamp with Time Zone Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
Utility Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
Aggregate Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
Information Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
DuckDB_% Metadata Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
Keywords and Identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
Window Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527

Extensions 533
Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
Official Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535


Working with Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536


Versioning of Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
Arrow Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
AutoComplete Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
AWS Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
Azure Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
Excel Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
Full‑Text Search Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
httpfs (HTTP and S3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
httpfs Extension for HTTP and S3 Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
HTTP(S) Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
S3 API Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
Legacy Authentication Scheme for S3 API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Iceberg Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
ICU Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
inet Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
jemalloc Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
JSON Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
MySQL Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
PostgreSQL Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
Spatial Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
SQLite Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
Substrait Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
TPC‑DS Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
TPC‑H Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595

Guides 599

Data Import & Export 601


Data Import Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
CSV Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
CSV Export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
Parquet Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
Parquet Export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
Querying Parquet Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
HTTP Parquet Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
S3 Parquet Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
S3 Parquet Export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
S3 Iceberg Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
S3 Express One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
GCS Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
Cloudflare R2 Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
JSON Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
JSON Export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
Excel Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
Excel Export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
MySQL Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
PostgreSQL Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
SQLite Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
Directly Reading Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612

Performance 615
Performance Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615


Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
Tuning Workloads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
My Workload Is Slow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623
Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623

Meta Queries 625


Describe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
EXPLAIN: Inspect Query Plans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
EXPLAIN ANALYZE: Profile Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
List Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
Summarize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
DuckDB Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632

ODBC 635
ODBC 101: A Duck Themed Guide to ODBC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635

Python 643
Installing the Python Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
Executing SQL in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
Jupyter Notebooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
SQL on Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
Import from Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
Export to Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
SQL on Apache Arrow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
Import from Apache Arrow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
Export to Apache Arrow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
Relational API on Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
Multiple Python Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
Integration with Ibis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
Integration with Polars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663
Using fsspec Filesystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664

SQL Features 667


Friendly SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
AsOf Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
Full‑Text Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671

SQL Editors 673


DBeaver SQL IDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673

Data Viewers 675


Tableau ‑ A Data Visualization Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
CLI Charting with YouPlot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678

Under the Hood 681

Internals 683
Overview of DuckDB Internals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
Execution Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687


Developer Guides 691


Building DuckDB from Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
sqllogictest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
sqllogictest ‑ Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
sqllogictest ‑ Result Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
sqllogictest ‑ Persistent Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
sqllogictest ‑ Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
sqllogictest ‑ Multiple Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706
Catch C/C++ Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707

Acknowledgments 709

Summary

This document contains DuckDB's official documentation and guides in a single-file, easy-to-search form. If you find any issues, please
report them as a GitHub issue. Contributions are very welcome in the form of pull requests. If you are considering submitting a contribution
to the documentation, please consult our contributor guide.

Code repositories:

• DuckDB source code: github.com/duckdb/duckdb

• DuckDB documentation source code: github.com/duckdb/duckdb‑web


Documentation

Connect

Connect

Connect or Create a Database

To use DuckDB, you must first create a connection to a database. The exact syntax varies between the client APIs but it typically involves
passing an argument to configure persistence.

Persistence

DuckDB can operate in both persistent mode, where the data is saved to disk, and in in‑memory mode, where the entire data set is stored
in the main memory.

Persistent Database To create or open a persistent database, set the path of the database file, e.g., my_database.duckdb, when
creating the connection. This path can point to an existing database or to a file that does not yet exist and DuckDB will open or create a
database at that location as needed. The file may have an arbitrary extension, but .db or .duckdb are two common choices.

Note. Tip: Running on a persistent database allows spilling to disk, thus facilitating larger-than-memory workloads (i.e., out-of-core
processing).
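
For example, from a client session that is already running (say, an in-memory session in the CLI), a persistent database file can also be opened or created with the ATTACH statement, which is described later in this document. The file name below is only an illustration:

ATTACH 'my_database.duckdb' AS my_database;
USE my_database;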

Starting with v0.10, DuckDB's storage format is backwards-compatible, i.e., DuckDB is able to read database files produced by older
versions of DuckDB. For example, DuckDB v0.10 can read and operate on files created by the previous DuckDB version, v0.9. For more
details on DuckDB's storage format, see the storage page.

In-Memory Database DuckDB can operate in in-memory mode. In most clients, this can be activated by passing the special value
:memory: as the database file or by omitting the database file argument. In in-memory mode, no data is persisted to disk; therefore, all
data is lost when the process finishes.

Concurrency

Handling Concurrency

DuckDB has two configurable options for concurrency:

1. One process can both read and write to the database.
2. Multiple processes can read from the database, but no processes can write (access_mode = 'READ_ONLY').

When using option 1, DuckDB supports multiple writer threads using a combination of MVCC (Multi-Version Concurrency Control) and
optimistic concurrency control (see Concurrency within a Single Process), but all within that single writer process. The reason for this
concurrency model is to allow for the caching of data in RAM for faster analytical queries, rather than going back and forth to disk during
each query. It also allows the caching of function pointers, the database catalog, and other items so that subsequent queries on the same
connection are faster.


Note. DuckDB is optimized for bulk operations, so executing many small transactions is not a primary design goal.

Concurrency within a Single Process

DuckDB supports concurrency within a single process according to the following rules. As long as there are no write conflicts, multiple
concurrent writes will succeed. Appends will never conflict, even on the same table. Multiple threads can also simultaneously update
separate tables or separate subsets of the same table. Optimistic concurrency control comes into play when two threads attempt to edit
(update or delete) the same row at the same time. In that situation, the second thread to attempt the edit will fail with a conflict error.
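
As a minimal sketch of this behavior (the accounts table and its values are hypothetical, and the two transactions are assumed to run on separate connections within the same process):

-- connection 1
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- connection 2, before connection 1 commits
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
-- the second attempt to edit the same row fails with a transaction conflict error;
-- appends to the same table, or updates to different rows, would both succeed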

Writing to DuckDB from Multiple Processes

Writing to DuckDB from multiple processes is not supported automatically and is not a primary design goal (see Handling Concurrency).

If multiple processes must write to the same file, several design patterns are possible, but would need to be implemented in application
logic. For example, each process could acquire a cross‑process mutex lock, then open the database in read/write mode and close it when the
query is complete. Instead of using a mutex lock, each process could retry the connection if another process is already connected to
the database (being sure to close the connection upon query completion). Another alternative would be to do multi-process transactions
on a MySQL, PostgreSQL, or SQLite database, and use DuckDB's MySQL, PostgreSQL, or SQLite extensions to execute analytical queries on
that data periodically.

Additional options include writing data to Parquet files and using DuckDB's ability to read multiple Parquet files, taking a similar approach
with CSV files, or creating a web server to receive requests and manage reads and writes to DuckDB.
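
To sketch the Parquet-based pattern mentioned above (the file, directory, and table names are illustrative only), each writing process can export its data to its own Parquet file, and any process can then query all of the files together:

-- in each writer process: export that process's data to its own file
COPY (SELECT * FROM local_results) TO 'results/writer_1.parquet' (FORMAT PARQUET);

-- in a reader process: query all exported files at once using a glob
SELECT * FROM read_parquet('results/*.parquet');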

Data Import

Importing Data

The first step to using a database system is to insert data into that system. DuckDB provides several data ingestion methods that allow you
to easily and efficiently fill up the database. In this section, we provide an overview of these methods so you can select the one that best
fits your use case.

Insert Statements

Insert statements are the standard way of loading data into a database system. They are suitable for quick prototyping, but should be
avoided for bulk loading as they have significant per‑row overhead.

INSERT INTO people VALUES (1, 'Mark');

For a more detailed description, see the page on the INSERT statement.

CSV Loading

Data can be efficiently loaded from CSV files using the read_csv function or the COPY statement.

SELECT * FROM read_csv('test.csv');

You can also load data from compressed (e.g., compressed with gzip) CSV files, for example:

SELECT * FROM read_csv('test.csv.gz');
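
The same file can also be loaded into an existing table with the COPY statement, assuming a table with a matching schema (such as the people table from the INSERT example above) already exists:

COPY people FROM 'test.csv';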

For more details, see the page on CSV loading.

Parquet Loading

Parquet files can be efficiently loaded and queried using the read_parquet function.

SELECT * FROM read_parquet('test.parquet');

For more details, see the page on Parquet loading.

JSON Loading

JSON files can be efficiently loaded and queried using the read_json_auto function.

SELECT * FROM read_json_auto('test.json');

For more details, see the page on JSON loading.

Appender

In several APIs (C, C++, Go, Java, and Rust), the Appender can be used as an alternative for bulk data loading. This class can be used to
efficiently add rows to the database system without using SQL statements.


CSV Files

CSV Import

Examples

The following examples use the flights.csv file.

-- read a CSV file from disk, auto-infer options
SELECT * FROM 'flights.csv';

-- read_csv with custom options
SELECT * FROM read_csv('flights.csv',
    delim = '|',
    header = true,
    columns = {
        'FlightDate': 'DATE',
        'UniqueCarrier': 'VARCHAR',
        'OriginCityName': 'VARCHAR',
        'DestCityName': 'VARCHAR'
    });

# read a CSV from stdin, auto-infer options
cat flights.csv | duckdb -c "SELECT * FROM read_csv('/dev/stdin')"

-- read a CSV file into a table
CREATE TABLE ontime (
    FlightDate DATE,
    UniqueCarrier VARCHAR,
    OriginCityName VARCHAR,
    DestCityName VARCHAR
);
COPY ontime FROM 'flights.csv';

-- alternatively, create a table without specifying the schema manually
CREATE TABLE ontime AS SELECT * FROM 'flights.csv';
-- we can use the FROM-first syntax to omit 'SELECT *'
CREATE TABLE ontime AS FROM 'flights.csv';

-- write the result of a query to a CSV file
COPY (SELECT * FROM ontime) TO 'flights.csv' WITH (HEADER true, DELIMITER '|');
-- if we serialize the entire table, we can simply refer to it with its name
COPY ontime TO 'flights.csv' WITH (HEADER true, DELIMITER '|');

CSV Loading

CSV loading, i.e., importing CSV files to the database, is a very common, and yet surprisingly tricky, task. While CSVs seem simple on the
surface, there are a lot of inconsistencies found within CSV files that can make loading them a challenge. CSV files come in many different
varieties, are often corrupt, and do not have a schema. The CSV reader needs to cope with all of these different situations.

The DuckDB CSV reader can automatically infer which configuration flags to use by analyzing the CSV file using the CSV sniffer. This will
work correctly in most situations and should be the first option attempted. In rare situations where the CSV reader cannot figure out the
correct configuration, it is possible to manually configure the CSV reader to correctly parse the CSV file. See the auto detection page for
more information.

Parameters

Below are parameters that can be passed to the CSV reader. These parameters are accepted by both the COPY statement and the read_csv function.


all_varchar (BOOL, default false): Option to skip type detection for CSV parsing and assume all columns to be of type VARCHAR.
allow_quoted_nulls (BOOL, default true): Option to allow the conversion of quoted values to NULL values.
auto_detect (BOOL, default true): Enables auto detection of CSV parameters.
auto_type_candidates (TYPE[], default ['SQLNULL', 'BOOLEAN', 'BIGINT', 'DOUBLE', 'TIME', 'DATE', 'TIMESTAMP', 'VARCHAR']): This option allows you to specify the types that the sniffer will use when detecting CSV column types, e.g., SELECT * FROM read_csv('csv_file.csv', auto_type_candidates = ['BIGINT', 'DATE']). The VARCHAR type is always included in the detected types (as a fallback option).
columns (STRUCT, default empty): A struct that specifies the column names and column types contained within the CSV file (e.g., {'col1': 'INTEGER', 'col2': 'VARCHAR'}). Using this option implies that auto detection is not used.
compression (VARCHAR, default auto): The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.csv.gz will use gzip, t.csv will use none). Options are none, gzip, zstd.
dateformat (VARCHAR, default empty): Specifies the date format to use when parsing dates. See Date Format.
decimal_separator (VARCHAR, default .): The decimal separator of numbers.
delim or sep (VARCHAR, default ,): Specifies the string that separates columns within each row (line) of the file.
escape (VARCHAR, default "): Specifies the string that should appear before a data character sequence that matches the quote value.
filename (BOOL, default false): Whether or not an extra filename column should be included in the result.
force_not_null (VARCHAR[], default []): Do not match the specified columns' values against the NULL string. In the default case where the NULL string is empty, this means that empty values will be read as zero-length strings rather than NULLs.
header (BOOL, default false): Specifies that the file contains a header line with the names of each column in the file.
hive_partitioning (BOOL, default false): Whether or not to interpret the path as a Hive partitioned path.
ignore_errors (BOOL, default false): Option to ignore any parsing errors encountered and instead ignore rows with errors.
max_line_size (BIGINT, default 2097152): The maximum line size in bytes.
names (VARCHAR[], default empty): The column names as a list, see example.
new_line (VARCHAR, default empty): Set the new line character(s) in the file. Options are '\r', '\n', or '\r\n'.
normalize_names (BOOL, default false): Boolean value that specifies whether or not column names should be normalized, removing any non-alphanumeric characters from them.
null_padding (BOOL, default false): If this option is enabled, when a row lacks columns, it will pad the remaining columns on the right with null values.
nullstr (VARCHAR, default empty): Specifies the string that represents a NULL value.
parallel (BOOL, default true): Whether or not the parallel CSV reader is used.
quote (VARCHAR, default "): Specifies the quoting string to be used when a data value is quoted.
sample_size (BIGINT, default 20480): The number of sample rows for auto detection of parameters.
skip (BIGINT, default 0): The number of lines at the top of the file to skip.
timestampformat (VARCHAR, default empty): Specifies the date format to use when parsing timestamps. See Date Format.
types or dtypes (VARCHAR[] or STRUCT, default empty): The column types as either a list (by position) or a struct (by name). Example here.
union_by_name (BOOL, default false): Whether the columns of multiple schemas should be unified by name, rather than by position.
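
As an illustration of how a few of these parameters can be combined, the following sketch reads a semicolon-separated file with a header, a custom NULL string, and a smaller detection sample (the file name and option values are placeholders, not taken from the examples above):

SELECT * FROM read_csv('example.csv',
    delim = ';',
    header = true,
    nullstr = 'NA',
    sample_size = 1000);

-- the equivalent options can also be passed to COPY, assuming a matching table exists
COPY ontime FROM 'example.csv' (DELIMITER ';', HEADER);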

CSV Functions

Note. Deprecated DuckDB v0.10.0 introduced breaking changes to the read_csv function. Namely, the read_csv function now
attempts to auto-detect the CSV parameters, making its behavior identical to the old read_csv_auto function. If you would like to
use read_csv with its old behavior, turn off the auto-detection manually by using read_csv(..., auto_detect = false).
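
For instance, a fully manual invocation equivalent to the old behavior could look as follows; this sketch reuses the pipe-delimited flights.csv example from above:

SELECT * FROM read_csv('flights.csv',
    auto_detect = false,
    delim = '|',
    header = true,
    columns = {
        'FlightDate': 'DATE',
        'UniqueCarrier': 'VARCHAR',
        'OriginCityName': 'VARCHAR',
        'DestCityName': 'VARCHAR'
    });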

The read_csv function automatically attempts to figure out the correct configuration of the CSV reader using the CSV sniffer. It also
automatically deduces the types of the columns. If the CSV file has a header, it will use the names found in that header to name the columns.
Otherwise, the columns will be named column0, column1, column2, .... An example with the flights.csv file:

SELECT * FROM read_csv('flights.csv');

FlightDate UniqueCarrier OriginCityName DestCityName

1988-01-01 AA New York, NY Los Angeles, CA
1988-01-02 AA New York, NY Los Angeles, CA
1988-01-03 AA New York, NY Los Angeles, CA

The path can either be a relative path (relative to the current working directory) or an absolute path.

We can use read_csv to create a persistent table as well:

CREATE TABLE ontime AS SELECT * FROM read_csv('flights.csv');
DESCRIBE ontime;

Field Type Null Key Default Extra

FlightDate DATE YES NULL NULL NULL
UniqueCarrier VARCHAR YES NULL NULL NULL
OriginCityName VARCHAR YES NULL NULL NULL
DestCityName VARCHAR YES NULL NULL NULL

The sample_size parameter can be set explicitly to control how many rows are sampled during auto detection:

SELECT * FROM read_csv('flights.csv', sample_size = 20000);

If we set delim/sep, quote, escape, or header explicitly, we can bypass the automatic detection of this particular parameter:

SELECT * FROM read_csv('flights.csv', header = true);

Multiple files can be read at once by providing a glob or a list of files. Refer to the multiple files section for more information.
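
For example (the paths are placeholders), a glob pattern or an explicit list of paths can be used in place of a single file name:

SELECT * FROM read_csv('data/*.csv');
SELECT * FROM read_csv(['flights_2023.csv', 'flights_2024.csv']);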

Loading Using the COPY Statement

The COPY statement can be used to load data from a CSV file into a table. This statement has the same syntax as the one used in PostgreSQL.
To load the data using the COPY statement, we must first create a table with the correct schema (which matches the order of the columns
in the CSV file and uses types that fit the values in the CSV file). COPY detects the CSV's configuration options automatically.

CREATE TABLE ontime (
    flightdate DATE,
    uniquecarrier VARCHAR,
    origincityname VARCHAR,
    destcityname VARCHAR
);
COPY ontime FROM 'flights.csv';
SELECT * FROM ontime;

flightdate uniquecarrier origincityname destcityname

1988-01-01 AA New York, NY Los Angeles, CA
1988-01-02 AA New York, NY Los Angeles, CA
1988-01-03 AA New York, NY Los Angeles, CA

If we want to manually specify the CSV format, we can do so using the configuration options of COPY.

CREATE TABLE ontime (flightdate DATE, uniquecarrier VARCHAR, origincityname VARCHAR, destcityname VARCHAR);
COPY ontime FROM 'flights.csv' (DELIMITER '|', HEADER);
SELECT * FROM ontime;

Reading Faulty CSV Files

DuckDB supports reading erroneous CSV files. For details, see the Reading Faulty CSV Files page.

Limitations

The CSV reader only supports input files using UTF‑8 character encoding. For CSV files using different encodings, use e.g. the iconv
command‑line tool to convert them to UTF‑8.


CSV Auto Detection

When using read_csv, the system tries to automatically infer how to read the CSV file using the CSV sniffer. This step is necessary because
CSV files are not self‑describing and come in many different dialects. The auto‑detection works roughly as follows:

• Detect the dialect of the CSV file (delimiter, quoting rule, escape)
• Detect the types of each of the columns
• Detect whether or not the file has a header row

By default the system will try to auto‑detect all options. However, options can be individually overridden by the user. This can be useful in
case the system makes a mistake. For example, if the delimiter is chosen incorrectly, we can override it by calling the read_csv with an
explicit delimiter (e.g., read_csv('file.csv', delim = '|')).

The detection works by operating on a sample of the file. The size of the sample can be modified by setting the sample_size parameter.
The default sample size is 20480 rows. Setting the sample_size parameter to -1 means the entire file is read for sampling. The way
sampling is performed depends on the type of file. If we are reading from a regular file on disk, we will jump into the file and try to sample
from different locations in the file. If we are reading from a file in which we cannot jump ‑ such as a .gz compressed CSV file or stdin ‑
samples are taken only from the beginning of the file.
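
For example, the entire flights.csv file could be read for sampling before detection (which can be slow for large files):

SELECT * FROM read_csv('flights.csv', sample_size = -1);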

sniff_csv Function

It is possible to run the CSV sniffer as a separate step using the sniff_csv(filename) function, which returns the detected CSV prop‑
erties as a table with a single row. The sniff_csv function accepts an optional sample_size parameter to configure the number of
rows sampled.

FROM sniff_csv('my_file.csv');
FROM sniff_csv('my_file.csv', sample_size = 1000);

Column name        Description                                 Example

Delimiter          delimiter                                   ,
Quote              quote character                             "
Escape             escape                                      \
NewLineDelimiter   new‑line delimiter                          \r\n
SkipRow            number of rows skipped                      1
HasHeader          whether the CSV has a header                true
Columns            column types encoded as a LIST of STRUCTs   ({'name': 'VARCHAR', 'age': 'BIGINT'})
DateFormat         date format                                 %d/%m/%Y
TimestampFormat    timestamp format                            %Y-%m-%dT%H:%M:%S.%f
UserArguments      arguments used to invoke sniff_csv          sample_size = 1000
Prompt             prompt ready to be used to read the CSV     FROM read_csv('my_file.csv', auto_detect=false, delim=',', ...)

Prompt The Prompt column contains a SQL command with the configurations detected by the sniffer.

-- use line mode in CLI to get the full command


.mode line
SELECT Prompt FROM sniff_csv('my_file.csv');

Prompt = FROM read_csv('my_file.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0, header=true, columns={...});


Detection Steps

Dialect Detection Dialect detection works by attempting to parse the samples using the set of considered values. The detected dialect
is the dialect that has (1) a consistent number of columns for each row, and (2) the highest number of columns for each row.

The following dialects are considered for automatic dialect detection.

Parameters Considered values

delim , | ; \t
quote " ' (empty)
escape " ' \ (empty)

Consider the example file flights.csv:

FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA

In this file, the dialect detection works as follows:

• If we split by a | every row is split into 4 columns
• If we split by a , rows 2‑4 are split into 3 columns, while the first row is split into 1 column
• If we split by ;, every row is split into 1 column
• If we split by \t, every row is split into 1 column

In this example, the system selects the | as the delimiter. All rows are split into the same number of columns, and there is more than one
column per row, meaning the delimiter was actually found in the CSV file.

Type Detection After detecting the dialect, the system will attempt to figure out the types of each of the columns. Note that this step
is only performed if we are calling read_csv. In case of the COPY statement the types of the table that we are copying into will be used
instead.

The type detection works by attempting to convert the values in each column to the candidate types. If the conversion is unsuccessful, the
candidate type is removed from the set of candidate types for that column. After all samples have been handled ‑ the remaining candidate
type with the highest priority is chosen. The set of considered candidate types in order of priority is given below:

Types

BOOLEAN
BIGINT
DOUBLE
TIME
DATE
TIMESTAMP
VARCHAR

Note that everything can be cast to VARCHAR. This type has the lowest priority, i.e., columns are converted to VARCHAR if they cannot be cast
to anything else. In flights.csv the FlightDate column will be cast to a DATE, while the other columns will be cast to VARCHAR.


The detected types can be individually overridden using the types option. This option either takes a list of types (e.g., types = [INT,
VARCHAR, DATE]) that overrides the types of the columns in order of occurrence in the CSV file, or a name -> type map that overrides
the types of individual columns (e.g., types = {'quarter': INT}).

The type detection can be entirely disabled by using the all_varchar option. If this is set all columns will remain as VARCHAR (as they
originally occur in the CSV file).
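
For illustration, type detection could be disabled for the flights.csv file as follows, keeping every column as VARCHAR:

SELECT * FROM read_csv('flights.csv', all_varchar = true);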

Header Detection

Header detection works by checking if the candidate header row deviates from the other rows in the file in terms of types. For example,
in flights.csv, we can see that the header row consists of only VARCHAR columns ‑ whereas the values contain a DATE value for the
FlightDate column. As such ‑ the system defines the first row as the header row and extracts the column names from the header row.

In files that do not have a header row, the column names are generated as column0, column1, etc.

Note that headers cannot be detected correctly if all columns are of type VARCHAR ‑ as in this case the system cannot distinguish the header
row from the other rows in the file. In this case the system assumes the file has no header. This can be overridden using the header
option.
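
For example (reusing the flights.csv file), the first row can be forced to be treated as a header:

SELECT * FROM read_csv('flights.csv', header = true);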

Dates and Timestamps DuckDB supports the ISO 8601 format by default for timestamps, dates and times. Unfortunately, not all
dates and times are formatted using this standard. For that reason, the CSV reader also supports the dateformat and timestampformat
options. Using these options, the user can specify a format string that specifies how the date or timestamp should be read.

As part of the auto‑detection, the system tries to figure out if dates and times are stored in a different representation. This is not always
possible ‑ as there are ambiguities in the representation. For example, the date 01-02-2000 can be parsed as either January 2nd or
February 1st. Often these ambiguities can be resolved. For example, if we later encounter the date 21-02-2000 then we know that the
format must have been DD-MM-YYYY. MM-DD-YYYY is no longer possible as there is no 21st month.

If the ambiguities cannot be resolved by looking at the data, the system has a list of preferences for which date format to use. If the system
chooses incorrectly, the user can specify the dateformat and timestampformat options manually.
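
For example, dates stored as day‑month‑year could be parsed with an explicit format string (a sketch; the format must match the actual data):

SELECT * FROM read_csv('flights.csv', dateformat = '%d-%m-%Y');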

The system considers the following formats for dates (dateformat). Higher entries are chosen over lower entries in case of ambiguities
(i.e., ISO 8601 is preferred over MM-DD-YYYY).

dateformat

ISO 8601
%y-%m-%d
%Y-%m-%d
%d-%m-%y
%d-%m-%Y
%m-%d-%y
%m-%d-%Y

The system considers the following formats for timestamps (timestampformat). Higher entries are chosen over lower entries in case of
ambiguities.

timestampformat

ISO 8601
%y-%m-%d %H:%M:%S
%Y-%m-%d %H:%M:%S
%d-%m-%y %H:%M:%S
%d-%m-%Y %H:%M:%S
%m-%d-%y %I:%M:%S %p
%m-%d-%Y %I:%M:%S %p
%Y-%m-%d %H:%M:%S.%f

Reading Faulty CSV Files

Reading erroneous CSV files is possible by utilizing the ignore_errors option. With that option set, rows containing data that would
otherwise cause the CSV Parser to generate an error will be ignored.

Using the ignore_errors Option

For example, consider the following CSV file, faulty.csv:

Pedro,31
Oogie Boogie, three

If you read the CSV file, specifying that the first column is a VARCHAR and the second column is an INTEGER, loading the file would fail, as
the string three cannot be converted to an INTEGER.

For example, the following query will throw a casting error.

FROM read_csv('faulty.csv', columns = {'name': 'VARCHAR', 'age': 'INTEGER'});

However, with ignore_errors set, the second row of the file is skipped, outputting only the complete first row. For example:

FROM read_csv(
'faulty.csv',
columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
ignore_errors = true
);

Outputs:

name age

Pedro 31

One should note that the CSV Parser is affected by the projection pushdown optimization. Hence, if we were to select only the name column,
both rows would be considered valid, as the casting error on the age would never occur. For example:

SELECT name
FROM read_csv('faulty.csv', columns = {'name': 'VARCHAR', 'age': 'INTEGER'});

Outputs:

name

Pedro
Oogie Boogie


Retrieving Faulty CSV Lines

Being able to read faulty CSV files is important, but for many data cleaning operations, it is also necessary to know exactly which lines are
corrupted and what errors the parser discovered on them. For scenarios like these, it is possible to use DuckDB's CSV Rejects Table feature.
It is important to note that the Rejects Table can only be used when ignore_errors is set, and that it currently only stores casting errors and
does not save errors when the number of columns differs.

The CSV Rejects Table returns the following information:

Column name        Description                                                         Type

file               File path.                                                          VARCHAR
line               Line number, from the CSV file, where the error occurred.           INTEGER
column             Column number, from the CSV file, where the error occurred.         INTEGER
column_name        Column name, from the CSV file, where the error occurred.           VARCHAR
parsed_value       The value, where the casting error happened, in a string format.    VARCHAR
recovery_columns   An optional primary key of the CSV file.                            STRUCT {NAME: VALUE}
error              Exact error encountered by the parser.                              VARCHAR

Parameters

The parameters listed below are used in the read_csv function to configure the CSV Rejects Table.

Name                       Description                                                        Type        Default

rejects_table              Name of a temporary table where the information of the faulty     VARCHAR     (empty)
                           lines of a CSV file is stored.
rejects_limit              Upper limit on the number of faulty records from a CSV file       BIGINT      0
                           that will be recorded in the rejects table. 0 is used when no
                           limit should be applied.
rejects_recovery_columns   Column values that serve as a primary key to the CSV file.        VARCHAR[]   (empty)
                           They are stored in the CSV Rejects Table to help identify the
                           faulty tuples.

To store the information of the faulty CSV lines in a rejects table, the user must simply provide the rejects table name in the rejects_
table option. For example:

FROM read_csv(
'faulty.csv',
columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
rejects_table = 'rejects_table',
ignore_errors = true
);

You can then query the rejects_table table, to retrieve information about the rejected tuples. For example:

FROM rejects_table;

Outputs:


file line column column_name parsed_value error

faulty.csv 2 1 age three Could not convert string ' three' to 'INTEGER'

Additionally, the name column could also be provided as a primary key via the rejects_recovery_columns option to provide more
information over the faulty lines. For example:

FROM read_csv(
'faulty.csv',
columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
rejects_table = 'rejects_table',
rejects_recovery_columns = '[name]',
ignore_errors = true
);

Reading from the rejects_table will return:

file         line   column   column_name   parsed_value   recovery_columns           error

faulty.csv   2      1        age           three          {'name': 'Oogie Boogie'}   Could not convert string ' three' to 'INTEGER'

CSV Import Tips

Below is a collection of tips to help when attempting to import complex CSV files. In the examples, we use the flights.csv file.

Override the Header Flag if the Header Is Not Correctly Detected If a file contains only string columns the header auto‑detection
might fail. Provide the header option to override this behavior.

SELECT * FROM read_csv('flights.csv', header = true);

Provide Names if the File Does Not Contain a Header If the file does not contain a header, names will be auto‑generated by default.
You can provide your own names with the names option.

SELECT * FROM read_csv('flights.csv', names = ['DateOfFlight', 'CarrierName']);

Override the Types of Specific Columns The types flag can be used to override types of only certain columns by providing a struct of
name -> type mappings.

SELECT * FROM read_csv('flights.csv', types = {'FlightDate': 'DATE'});

Use COPY When Loading Data into a Table The COPY statement copies data directly into a table. The CSV reader uses the schema of
the table instead of auto‑detecting types from the file. This speeds up the auto‑detection, and prevents mistakes from being made during
auto‑detection.

COPY tbl FROM 'test.csv';

Use union_by_name When Loading Files with Different Schemas The union_by_name option can be used to unify the schema of
files that have different or missing columns. For files that do not have certain columns, NULL values are filled in.

SELECT * FROM read_csv('flights*.csv', union_by_name = true);


JSON Files

JSON Loading

Examples

-- read a JSON file from disk, auto-infer options


SELECT * FROM 'todos.json';
-- read_json with custom options
SELECT *
FROM read_json('todos.json',
format = 'array',
columns = {userId: 'UBIGINT',
id: 'UBIGINT',
title: 'VARCHAR',
completed: 'BOOLEAN'});

-- read a JSON file from stdin, auto-infer options


cat data/json/todos.json | duckdb -c "SELECT * FROM read_json_auto('/dev/stdin')"

-- read a JSON file into a table


CREATE TABLE todos (userId UBIGINT, id UBIGINT, title VARCHAR, completed BOOLEAN);
COPY todos FROM 'todos.json';
-- alternatively, create a table without specifying the schema manually
CREATE TABLE todos AS SELECT * FROM 'todos.json';

-- write the result of a query to a JSON file


COPY (SELECT * FROM todos) TO 'todos.json';

JSON Loading

JSON is an open standard file format and data interchange format that uses human‑readable text to store and transmit data objects con‑
sisting of attribute–value pairs and arrays (or other serializable values). While it is not a very efficient format for tabular data, it is very
commonly used, especially as a data interchange format.

The DuckDB JSON reader can automatically infer which configuration flags to use by analyzing the JSON file. This will work correctly in most
situations, and should be the first option attempted. In rare situations where the JSON reader cannot figure out the correct configuration,
it is possible to manually configure the JSON reader to correctly parse the JSON file.

Below are parameters that can be passed in to the JSON reader.

Parameters

Name                          Description                                                        Type       Default

auto_detect                   Whether to auto‑detect the names of the keys and data types of    BOOL       false
                              the values automatically
columns                       A struct that specifies the key names and value types contained   STRUCT     (empty)
                              within the JSON file (e.g., {key1: 'INTEGER', key2:
                              'VARCHAR'}). If auto_detect is enabled these will be inferred
compression                   The compression type for the file. By default this will be        VARCHAR    'auto'
                              detected automatically from the file extension (e.g., t.json.gz
                              will use gzip, t.json will use none). Options are 'none',
                              'gzip', 'zstd', and 'auto'.
convert_strings_to_integers   Whether strings representing integer values should be converted   BOOL       false
                              to a numerical type.
dateformat                    Specifies the date format to use when parsing dates. See Date     VARCHAR    'iso'
                              Format
filename                      Whether or not an extra filename column should be included in     BOOL       false
                              the result.
format                        Can be one of ['auto', 'unstructured', 'newline_delimited',       VARCHAR    'array'
                              'array']
hive_partitioning             Whether or not to interpret the path as a Hive partitioned        BOOL       false
                              path.
ignore_errors                 Whether to ignore parse errors (only possible when format is      BOOL       false
                              'newline_delimited')
maximum_depth                 Maximum nesting depth to which the automatic schema detection     BIGINT     -1
                              detects types. Set to -1 to fully detect nested JSON types
maximum_object_size           The maximum size of a JSON object (in bytes)                      UINTEGER   16777216
records                       Can be one of ['auto', 'true', 'false']                           VARCHAR    'records'
sample_size                   Option to define number of sample objects for automatic JSON      UBIGINT    20480
                              type detection. Set to -1 to scan the entire input file
timestampformat               Specifies the date format to use when parsing timestamps. See     VARCHAR    'iso'
                              Date Format
union_by_name                 Whether the schemas of multiple JSON files should be unified.     BOOL       false

When using read_json_auto, every parameter that supports auto‑detection is enabled.

Examples of Format Settings

The JSON extension can attempt to determine the format of a JSON file when setting format to auto. Here are some example JSON files
and the corresponding format settings that should be used.

In each of the below cases, the format setting was not needed, as DuckDB was able to infer it correctly, but it is included for illustrative
purposes. A query of this shape would work in each case:

SELECT *
FROM filename.json;

Format: newline_delimited With format = 'newline_delimited', newline‑delimited JSON can be parsed. Each line contains a
JSON object.

{"key1":"value1", "key2": "value1"}


{"key1":"value2", "key2": "value2"}
{"key1":"value3", "key2": "value3"}


SELECT *
FROM read_json_auto('records.json', format = 'newline_delimited');

key1 key2

value1 value1
value2 value2
value3 value3

Format: array If the JSON file contains a JSON array of objects (pretty‑printed or not), format = 'array' may be used.

[
{"key1":"value1", "key2": "value1"},
{"key1":"value2", "key2": "value2"},
{"key1":"value3", "key2": "value3"}
]

SELECT *
FROM read_json_auto('array.json', format = 'array');

key1 key2

value1 value1
value2 value2
value3 value3

Format: unstructured If the JSON file contains JSON that is not newline‑delimited or an array, unstructured may be used.

{
"key1":"value1",
"key2":"value1"
}
{
"key1":"value2",
"key2":"value2"
}
{
"key1":"value3",
"key2":"value3"
}

SELECT *
FROM read_json_auto('unstructured.json', format = 'unstructured');

key1 key2

value1 value1
value2 value2
value3 value3


Examples of Records Settings

The JSON extension can attempt to determine whether a JSON file contains records when setting records = auto. When records =
true, the JSON extension expects JSON objects, and will unpack the fields of JSON objects into individual columns.

Continuing with the same example file from before:

{"key1":"value1", "key2": "value1"}


{"key1":"value2", "key2": "value2"}
{"key1":"value3", "key2": "value3"}

SELECT *
FROM read_json_auto('records.json', records = true);

key1 key2

value1 value1
value2 value2
value3 value3

When records = false, the JSON extension will not unpack the top‑level objects, and create STRUCTs instead:

SELECT *
FROM read_json_auto('records.json', records = false);

json

{'key1': value1, 'key2': value1}
{'key1': value2, 'key2': value2}
{'key1': value3, 'key2': value3}

This is especially useful if we have non‑object JSON, for example:

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]

SELECT *
FROM read_json_auto('arrays.json', records = false);

json

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]

Writing

The contents of tables or the result of queries can be written directly to a JSON file using the COPY statement. See the COPY documentation
for more information.
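
For example, the todos table created above could be exported as follows (a minimal sketch; the output file name is illustrative, and the format is inferred from the .json extension):

COPY todos TO 'todos_export.json';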


read_json_auto Function

The read_json_auto is the simplest method of loading JSON files: it automatically attempts to figure out the correct configuration of
the JSON reader. It also automatically deduces types of columns.

SELECT *
FROM read_json_auto('todos.json')
LIMIT 5;

userId   id   title                                                             completed

1        1    delectus aut autem                                                false
1        2    quis ut nam facilis et officia qui                                false
1        3    fugiat veniam minus                                               false
1        4    et porro tempora                                                  true
1        5    laboriosam mollitia et enim quasi adipisci quia provident illum   false

The path can either be a relative path (relative to the current working directory) or an absolute path.

We can use read_json_auto to create a persistent table as well:

CREATE TABLE todos AS


SELECT *
FROM read_json_auto('todos.json');
DESCRIBE todos;

column_name   column_type   null   key   default   extra

userId        UBIGINT       YES
id            UBIGINT       YES
title         VARCHAR       YES
completed     BOOLEAN       YES

If we specify the columns, we can bypass the automatic detection. Note that not all columns need to be specified:

SELECT *
FROM read_json_auto('todos.json',
columns = {userId: 'UBIGINT',
completed: 'BOOLEAN'});

Multiple files can be read at once by providing a glob or a list of files. Refer to the multiple files section for more information.

COPY Statement

The COPY statement can be used to load data from a JSON file into a table. For the COPY statement, we must first create a table with the
correct schema to load the data into. We then specify the JSON file to load from plus any configuration options separately.

CREATE TABLE todos (userId UBIGINT, id UBIGINT, title VARCHAR, completed BOOLEAN);
COPY todos FROM 'todos.json';
SELECT * FROM todos LIMIT 5;


userId   id   title                                                             completed

1        1    delectus aut autem                                                false
1        2    quis ut nam facilis et officia qui                                false
1        3    fugiat veniam minus                                               false
1        4    et porro tempora                                                  true
1        5    laboriosam mollitia et enim quasi adipisci quia provident illum   false

For more details, see the page on the COPY statement.

Multiple Files

Reading Multiple Files

DuckDB can read multiple files of different types (CSV, Parquet, JSON files) at the same time using either the glob syntax, or by providing a
list of files to read. See the combining schemas page for tips on reading files with different schemas.

CSV

-- read all files with a name ending in ".csv" in the folder "dir"
SELECT * FROM 'dir/*.csv';
-- read all files with a name ending in ".csv", two directories deep
SELECT * FROM '*/*/*.csv';
-- read all files with a name ending in ".csv", at any depth in the folder "dir"
SELECT * FROM 'dir/**/*.csv';
-- read the CSV files 'flights1.csv' and 'flights2.csv'
SELECT * FROM read_csv(['flights1.csv', 'flights2.csv']);
-- read the CSV files 'flights1.csv' and 'flights2.csv', unifying schemas by name and outputting a filename column
SELECT * FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);

Parquet

-- read all files that match the glob pattern


SELECT * FROM 'test/*.parquet';
-- read 3 Parquet files and treat them as a single table
SELECT * FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
-- Read all Parquet files from 2 specific folders
SELECT * FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
-- read all Parquet files that match the glob pattern at any depth
SELECT * FROM read_parquet('dir/**/*.parquet');

Multi‑File Reads and Globs

DuckDB can also read a series of Parquet files and treat them as if they were a single table. Note that this only works if the Parquet files
have the same schema. You can specify which Parquet files you want to read using a list parameter, glob pattern matching syntax, or a
combination of both.


List Parameter The read_parquet function can accept a list of filenames as the input parameter.

-- read 3 Parquet files and treat them as a single table


SELECT * FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);

Glob Syntax Any file name input to the read_parquet function can either be an exact filename, or use a glob syntax to read multiple files
that match a pattern.

Wildcard   Description

*          matches any number of any characters (including none)
**         matches any number of subdirectories (including none)
?          matches any single character
[abc]      matches one character given in the bracket
[a-z]      matches one character from the range given in the bracket

Note that the ? wildcard in globs is not supported for reads over S3 due to HTTP encoding issues.

Here is an example that reads all the files that end with .parquet located in the test folder:

-- read all files that match the glob pattern


SELECT * FROM read_parquet('test/*.parquet');

List of Globs The glob syntax and the list input parameter can be combined to scan files that meet one of multiple patterns.

-- Read all Parquet files from 2 specific folders


SELECT * FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);

DuckDB can read multiple CSV files at the same time using either the glob syntax, or by providing a list of files to read.

Filename

The filename argument can be used to add an extra filename column to the result that indicates which row came from which file. For
example:

SELECT * FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);

FlightDate   OriginCityName   DestCityName      UniqueCarrier   filename

1988‑01‑01   New York, NY     Los Angeles, CA   NULL            flights1.csv
1988‑01‑02   New York, NY     Los Angeles, CA   NULL            flights1.csv
1988‑01‑03   New York, NY     Los Angeles, CA   AA              flights2.csv

Glob Function to Find Filenames

The glob pattern matching syntax can also be used to search for filenames using the glob table function. It accepts one parameter: the
path to search (which may include glob patterns).

-- Search the current directory for all files


SELECT * FROM glob('*');


file

duckdb.exe
test.csv
test.json
test.parquet
test2.csv
test2.parquet
todos.json

Combining Schemas

Examples

-- read a set of CSV files combining columns by position


SELECT * FROM read_csv('flights*.csv');
-- read a set of CSV files combining columns by name
SELECT * FROM read_csv('flights*.csv', union_by_name = true);

Combining Schemas

When reading from multiple files, we have to combine schemas from those files. That is because each file has its own schema that can
differ from the other files. DuckDB offers two ways of unifying schemas of multiple files: by column position and by column name.

By default, DuckDB reads the schema of the first file provided, and then unifies columns in subsequent files by column position. This works
correctly as long as all files have the same schema. If the schema of the files differs, you might want to use the union_by_name option
to allow DuckDB to construct the schema by reading all of the names instead.

Below is an example of how both methods work.

Union by Position

By default, DuckDB unifies the columns of these different files by position. This means that the first column in each file is combined
together, as well as the second column in each file, etc. For example, consider the following two files.

flights1.csv:

FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA

flights2.csv:

FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-03|AA|New York, NY|Los Angeles, CA

Reading the two files at the same time will produce the following result set:

FlightDate   UniqueCarrier   OriginCityName   DestCityName

1988‑01‑01   AA              New York, NY     Los Angeles, CA
1988‑01‑02   AA              New York, NY     Los Angeles, CA
1988‑01‑03   AA              New York, NY     Los Angeles, CA

This is equivalent to the SQL construct UNION ALL.

Union by Name

If you are processing multiple files that have different schemas, perhaps because columns have been added or renamed, it might be de‑
sirable to unify the columns of different files by name instead. This can be done by providing the union_by_name option. For example,
consider the following two files, where flights4.csv has an extra column (UniqueCarrier).

flights3.csv:

FlightDate|OriginCityName|DestCityName
1988-01-01|New York, NY|Los Angeles, CA
1988-01-02|New York, NY|Los Angeles, CA

flights4.csv:

FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-03|AA|New York, NY|Los Angeles, CA

Reading these when unifying column names by position results in an error ‑ as the two files have a different number of columns. When
specifying the union_by_name option, the columns are correctly unified, and any missing values are set to NULL.

SELECT * FROM read_csv(['flights3.csv', 'flights4.csv'], union_by_name = true);

FlightDate   OriginCityName   DestCityName      UniqueCarrier

1988‑01‑01   New York, NY     Los Angeles, CA   NULL
1988‑01‑02   New York, NY     Los Angeles, CA   NULL
1988‑01‑03   New York, NY     Los Angeles, CA   AA

This is equivalent to the SQL construct UNION ALL BY NAME.

Parquet Files

Reading and Writing Parquet Files

Examples

-- read a single Parquet file


SELECT * FROM 'test.parquet';
-- figure out which columns/types are in a Parquet file
DESCRIBE SELECT * FROM 'test.parquet';
-- create a table from a Parquet file
CREATE TABLE test AS SELECT * FROM 'test.parquet';
-- if the file does not end in ".parquet", use the read_parquet function
SELECT * FROM read_parquet('test.parq');
-- use list parameter to read 3 Parquet files and treat them as a single table
SELECT * FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
-- read all files that match the glob pattern


SELECT * FROM 'test/*.parquet';


-- read all files that match the glob pattern, and include a "filename" column
-- that specifies which file each row came from
SELECT * FROM read_parquet('test/*.parquet', filename = true);
-- use a list of globs to read all Parquet files from 2 specific folders
SELECT * FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
-- read over https
SELECT * FROM read_parquet('https://some.url/some_file.parquet');
-- query the metadata of a Parquet file
SELECT * FROM parquet_metadata('test.parquet');
-- query the schema of a Parquet file
SELECT * FROM parquet_schema('test.parquet');

-- write the results of a query to a Parquet file using the default compression (Snappy)
COPY
(SELECT * FROM tbl)
TO 'result-snappy.parquet'
(FORMAT 'parquet');

-- write the results from a query to a Parquet file with specific compression and row group size
COPY
(FROM generate_series(100_000))
TO 'test.parquet'
(FORMAT 'parquet', COMPRESSION 'zstd', ROW_GROUP_SIZE 100_000);

-- export the table contents of the entire database as parquet


EXPORT DATABASE 'target_directory' (FORMAT PARQUET);

Parquet Files

Parquet files are compressed columnar files that are efficient to load and process. DuckDB provides support for both reading and writing
Parquet files in an efficient manner, as well as support for pushing filters and projections into the Parquet file scans.

Note. Parquet data sets differ based on the number of files, the size of individual files, the compression algorithm used, the row group
size, etc. These have a significant effect on performance. Please consult the Performance Guide for details.

read_parquet Function

Function                   Description              Example

read_parquet(path(s), *)   Read Parquet file(s)     SELECT * FROM read_parquet('test.parquet');
parquet_scan(path(s), *)   Alias for read_parquet   SELECT * FROM parquet_scan('test.parquet');

If your file ends in .parquet, the function syntax is optional. The system will automatically infer that you are reading a Parquet file.

SELECT * FROM 'test.parquet';

Multiple files can be read at once by providing a glob or a list of files. Refer to the multiple files section for more information.

Parameters There are a number of options exposed that can be passed to the read_parquet function or the COPY statement.


Name                Description                                                         Type     Default

binary_as_string    Parquet files generated by legacy writers do not correctly set     BOOL     false
                    the UTF8 flag for strings, causing string columns to be loaded
                    as BLOB instead. Set this to true to load binary columns as
                    strings.
encryption_config   Configuration for Parquet encryption.                              STRUCT   ‑
filename            Whether or not an extra filename column should be included in      BOOL     false
                    the result.
file_row_number     Whether or not to include the file_row_number column.              BOOL     false
hive_partitioning   Whether or not to interpret the path as a Hive partitioned path.   BOOL     false
union_by_name       Whether the columns of multiple schemas should be unified by       BOOL     false
                    name, rather than by position.

Partial Reading

DuckDB supports projection pushdown into the Parquet file itself. That is to say, when querying a Parquet file, only the columns required
for the query are read. This allows you to read only the part of the Parquet file that you are interested in. This will be done automatically
by DuckDB.

DuckDB also supports filter pushdown into the Parquet reader. When you apply a filter to a column that is scanned from a Parquet file,
the filter will be pushed down into the scan, and can even be used to skip parts of the file using the built‑in zonemaps. Note that this will
depend on whether or not your Parquet file contains zonemaps.

Filter and projection pushdown provide significant performance benefits. See our blog post on this for more information.
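
As an illustration (the column name col1 is hypothetical), the following query only reads the filtered column from the file, and row groups whose zonemaps exclude the predicate can be skipped:

SELECT col1
FROM read_parquet('test.parquet')
WHERE col1 > 100;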

Inserts and Views

You can also insert the data into a table or create a table from the Parquet file directly. This will load the data from the Parquet file and
insert it into the database.

-- insert the data from the Parquet file in the table


INSERT INTO people SELECT * FROM read_parquet('test.parquet');
-- create a table directly from a Parquet file
CREATE TABLE people AS SELECT * FROM read_parquet('test.parquet');

If you wish to keep the data stored inside the Parquet file, but want to query the Parquet file directly, you can create a view over the read_
parquet function. You can then query the Parquet file as if it were a built‑in table.

-- create a view over the Parquet file


CREATE VIEW people AS SELECT * FROM read_parquet('test.parquet');
-- query the Parquet file
SELECT * FROM people;

Writing to Parquet Files

DuckDB also has support for writing to Parquet files using the COPY statement syntax. See the COPY Statement page for details, including
all possible parameters for the COPY statement.

-- write a query to a snappy compressed Parquet file
COPY
(SELECT * FROM tbl)
TO 'result-snappy.parquet'
(FORMAT 'parquet')

-- write "tbl" to a zstd compressed Parquet file


COPY tbl
TO 'result-zstd.parquet'
(FORMAT 'parquet', CODEC 'zstd')

-- write a CSV file to an uncompressed Parquet file


COPY
'test.csv'
TO 'result-uncompressed.parquet'
(FORMAT 'parquet', CODEC 'uncompressed')

-- write a query to a Parquet file with ZSTD compression (same as CODEC) and row_group_size
COPY
(FROM generate_series(100_000))
TO 'row-groups-zstd.parquet'
(FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100_000);

DuckDB's EXPORT command can be used to export an entire database to a series of Parquet files. See the Export statement documentation
for more details.

-- export the table contents of the entire database as parquet


EXPORT DATABASE 'target_directory' (FORMAT PARQUET);

Encryption

DuckDB supports reading and writing encrypted Parquet files.

Installing and Loading the Parquet Extension

The support for Parquet files is enabled via extension. The parquet extension is bundled with almost all clients. However, if your client
does not bundle the parquet extension, the extension must be installed and loaded separately.

INSTALL parquet;
LOAD parquet;

Querying Parquet Metadata

Parquet Metadata

The parquet_metadata function can be used to query the metadata contained within a Parquet file, which reveals various internal
details of the Parquet file such as the statistics of the different columns. This can be useful for figuring out what kind of skipping is possible
in Parquet files, or even to obtain a quick overview of what the different columns contain.

SELECT *
FROM parquet_metadata('test.parquet');

Below is a table of the columns returned by parquet_metadata.

Field Type

file_name VARCHAR
row_group_id BIGINT
row_group_num_rows BIGINT
row_group_num_columns BIGINT
row_group_bytes BIGINT
column_id BIGINT
file_offset BIGINT
num_values BIGINT
path_in_schema VARCHAR
type VARCHAR
stats_min VARCHAR
stats_max VARCHAR
stats_null_count BIGINT
stats_distinct_count BIGINT
stats_min_value VARCHAR
stats_max_value VARCHAR
compression VARCHAR
encodings VARCHAR
index_page_offset BIGINT
dictionary_page_offset BIGINT
data_page_offset BIGINT
total_compressed_size BIGINT
total_uncompressed_size BIGINT
key_value_metadata MAP(BLOB, BLOB)

Parquet Schema

The parquet_schema function can be used to query the internal schema contained within a Parquet file. Note that this is the schema
as it is contained within the metadata of the Parquet file. If you want to figure out the column names and types contained within a Parquet
file it is easier to use DESCRIBE.

-- fetch the column names and column types


DESCRIBE SELECT * FROM 'test.parquet';
-- fetch the internal schema of a Parquet file
SELECT *
FROM parquet_schema('test.parquet');

Below is a table of the columns returned by parquet_schema.

Field Type

file_name VARCHAR
name VARCHAR
type VARCHAR
type_length VARCHAR
repetition_type VARCHAR
num_children BIGINT
converted_type VARCHAR
scale BIGINT
precision BIGINT
field_id BIGINT
logical_type VARCHAR

Parquet File Metadata

The parquet_file_metadata function can be used to query file‑level metadata such as the format version and the encryption algo‑
rithm used.

SELECT *
FROM parquet_file_metadata('test.parquet');

Below is a table of the columns returned by parquet_file_metadata.

Field Type

file_name VARCHAR
created_by VARCHAR
num_rows BIGINT
num_row_groups BIGINT
format_version BIGINT
encryption_algorithm VARCHAR
footer_signing_key_metadata VARCHAR

Parquet Key‑Value Metadata

The parquet_kv_metadata function can be used to query custom metadata defined as key‑value pairs.

SELECT *
FROM parquet_kv_metadata('test.parquet');

Below is a table of the columns returned by parquet_kv_metadata.

Field Type

file_name VARCHAR
key BLOB
value BLOB

Parquet Encryption

Starting with version 0.10.0, DuckDB supports reading and writing encrypted Parquet files. DuckDB broadly follows the Parquet Modular
Encryption specification with some limitations.


Reading and Writing Encrypted Files

Using the PRAGMA add_parquet_key function, named encryption keys of 128, 192, or 256 bits can be added to a session. These keys
are stored in‑memory.

PRAGMA add_parquet_key('key128', '0123456789112345');


PRAGMA add_parquet_key('key192', '012345678911234501234567');
PRAGMA add_parquet_key('key256', '01234567891123450123456789112345');

Writing Encrypted Parquet Files After specifying the key (e.g., key256), files can be encrypted as follows:

COPY tbl TO 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});

Reading Encrypted Parquet Files An encrypted Parquet file using a specific key (e.g., key256) can then be read as follows:

COPY tbl FROM 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});

Or:

SELECT *
FROM read_parquet('tbl.parquet', encryption_config = {footer_key: 'key256'});

Limitations

DuckDB's Parquet encryption currently has the following limitations.

1. It is not compatible with the encryption of, e.g., PyArrow, until the missing details are implemented.

2. DuckDB encrypts the footer and all columns using the footer_key. The Parquet specification allows encryption of individual
columns with different keys, e.g.:

COPY tbl TO 'tbl.parquet'


(ENCRYPTION_CONFIG {
footer_key: 'key256',
column_keys: {key256: ['col0', 'col1']}
});

However, this is unsupported at the moment and will cause an error to be thrown (for now):

Not implemented Error: Parquet encryption_config column_keys not yet implemented

Performance Implications

Note that encryption has some performance implications. Without encryption, reading/writing the lineitem table from TPC-H at SF1,
which is 6M rows and 15 columns, from/to a Parquet file takes 0.26 and 0.99 seconds, respectively. With encryption, this takes 0.64 and 2.21
seconds, both approximately 2.5× slower than the unencrypted version.

Parquet Tips

Below is a collection of tips to help when dealing with Parquet files.

Tips for Reading Parquet Files

Use union_by_name When Loading Files with Different Schemas The union_by_name option can be used to unify the schema of
files that have different or missing columns. For files that do not have certain columns, NULL values are filled in.

SELECT *
FROM read_parquet('flights*.parquet', union_by_name = true);


Tips for Writing Parquet Files

Enabling PER_THREAD_OUTPUT If the final number of Parquet files is not important, writing one file per thread can significantly im‑
prove performance. Using a glob pattern upon read or a Hive partitioning structure are good ways to transparently handle multiple files.

COPY
(FROM generate_series(10_000_000))
TO 'test.parquet'
(FORMAT PARQUET, PER_THREAD_OUTPUT true);

Selecting a ROW_GROUP_SIZE The ROW_GROUP_SIZE parameter specifies the minimum number of rows in a Parquet row group, with
a minimum value equal to DuckDB's vector size (currently 2048, but adjustable when compiling DuckDB), and a default of 122,880. A Parquet
row group is a partition of rows, consisting of a column chunk for each column in the dataset.

Compression algorithms are only applied per row group, so the larger the row group size, the more opportunities to compress the data.
DuckDB can read Parquet row groups in parallel even within the same file and uses predicate pushdown to only scan the row groups whose
metadata ranges match the WHERE clause of the query. However there is some overhead associated with reading the metadata in each
group. A good approach would be to ensure that within each file, the total number of row groups is at least as large as the number of CPU
threads used to query that file. More row groups beyond the thread count would improve the speed of highly selective queries, but slow
down queries that must scan the whole file like aggregations.

-- write a query to a Parquet file with a different row_group_size


COPY
(FROM generate_series(100_000))
TO 'row-groups.parquet'
(FORMAT PARQUET, ROW_GROUP_SIZE 100_000);

See the Performance Guide on file formats for more tips.

Partitioning

Hive Partitioning

Examples

-- read data from a Hive partitioned data set


SELECT * FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true);
-- write a table to a Hive partitioned data set
COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month));

Hive Partitioning

Hive partitioning is a partitioning strategy that is used to split a table into multiple files based on partition keys. The files are organized
into folders. Within each folder, the partition key has a value that is determined by the name of the folder.

Below is an example of a Hive partitioned file hierarchy. The files are partitioned on two keys (year and month).

orders
├── year=2021
│   ├── month=1
│   │   ├── file1.parquet
│   │   └── file2.parquet
│   └── month=2
│       └── file3.parquet
└── year=2022
    ├── month=11
    │   ├── file4.parquet
    │   └── file5.parquet
    └── month=12
        └── file6.parquet

Files stored in this hierarchy can be read using the hive_partitioning flag.

SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true);

When we specify the hive_partitioning flag, the values of the columns will be read from the directories.

Filter Pushdown Filters on the partition keys are automatically pushed down into the files. This way the system skips reading files that
are not necessary to answer a query. For example, consider the following query on the above dataset:

SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true)
WHERE year = 2022 AND month = 11;

When executing this query, only the following files will be read:

orders
└── year=2022
    └── month=11
        ├── file4.parquet
        └── file5.parquet

Autodetection By default, the system tries to infer whether the provided files are in a Hive partitioned hierarchy and, if so, the hive_
partitioning flag is enabled automatically. The autodetection looks at the names of the folders and searches for a 'key' =
'value' pattern. This behavior can be overridden by setting the hive_partitioning flag manually, as shown in the example below.
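
For illustration, the flag could be switched off explicitly for the orders data set above:

SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = false);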

Hive Types hive_types is a way to specify the logical types of the hive partitions in a struct:

SELECT *
FROM read_parquet(
'dir/**/*.parquet',
hive_partitioning = true,
hive_types = {'release': DATE, 'orders': BIGINT}
);

hive_types will be autodetected for the following types: DATE, TIMESTAMP and BIGINT. To switch off the autodetection, the flag
hive_types_autocast = 0 can be set.
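
A minimal sketch of switching the autodetection off, reusing the path from the example above:

SELECT *
FROM read_parquet(
    'dir/**/*.parquet',
    hive_partitioning = true,
    hive_types_autocast = 0
);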

Writing Partitioned Files See the Partitioned Writes section.

Partitioned Writes

Examples

-- write a table to a Hive partitioned data set of Parquet files


COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month));
-- write a table to a Hive partitioned data set of CSV files, allowing overwrites
COPY orders TO 'orders' (FORMAT CSV, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE 1);


Partitioned Writes

When the partition_by clause is specified for the COPY statement, the files are written in a Hive partitioned folder hierarchy. The target
is the name of the root directory (in the example above: orders). The files are written in‑order in the file hierarchy. Currently, one file is
written per thread to each directory.

orders
├── year=2021
│   ├── month=1
│   │   ├── data_1.parquet
│   │   └── data_2.parquet
│   └── month=2
│       └── data_1.parquet
└── year=2022
    ├── month=11
    │   ├── data_1.parquet
    │   └── data_2.parquet
    └── month=12
        └── data_1.parquet

The values of the partitions are automatically extracted from the data. Note that it can be very expensive to write many partitions as many
files will be created. The ideal partition count depends on how large your data set is.

Note. Best practice Writing data into many small partitions is expensive. It is generally recommended to have at least 100MB of
data per partition.

Overwriting By default the partitioned write will not allow overwriting existing directories. Use the OVERWRITE_OR_IGNORE option
to allow overwriting an existing directory.

Filename Pattern By default, files will be named data_0.parquet or data_0.csv. With the flag FILENAME_PATTERN a pattern
with {i} or {uuid} can be defined to create specific filenames:

• {i} will be replaced by an index
• {uuid} will be replaced by a 128‑bit UUID

-- write a table to a Hive partitioned data set of .parquet files, with an index in the filename
COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN "orders_{i}");
-- write a table to a Hive partitioned data set of .parquet files, with unique filenames
COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN "file_{uuid}");

Appender

The Appender can be used to load bulk data into a DuckDB database. It is currently available in the C, C++, Go, Java, and Rust APIs. The
Appender is tied to a connection, and will use the transaction context of that connection when appending. An Appender always appends
to a single table in the database file.

In the C++ API, the Appender works as follows:

DuckDB db;
Connection con(db);
// create the table
con.Query("CREATE TABLE people (id INTEGER, name VARCHAR)");
// initialize the appender
Appender appender(con, "people");


The AppendRow function is the easiest way of appending data. It uses recursive templates to allow you to put all the values of a single row
within one function call, as follows:

appender.AppendRow(1, "Mark");

Rows can also be individually constructed using the BeginRow, EndRow and Append methods. This is done internally by AppendRow,
and hence has the same performance characteristics.

appender.BeginRow();
appender.Append<int32_t>(2);
appender.Append<string>("Hannes");
appender.EndRow();

Any values added to the appender are cached prior to being inserted into the database system for performance reasons. That means that,
while appending, the rows might not be immediately visible in the system. The cache is automatically flushed when the appender goes
out of scope or when appender.Close() is called. The cache can also be manually flushed using the appender.Flush() method.
After either Flush or Close is called, all the data has been written to the database system.

Date, Time and Timestamps

While numbers and strings are rather self‑explanatory, dates, times and timestamps require some explanation. They can be directly ap‑
pended using the methods provided by duckdb::Date, duckdb::Time or duckdb::Timestamp. They can also be appended using
the internal duckdb::Value type, however, this adds some additional overheads and should be avoided if possible.

Below is a short example:

con.Query("CREATE TABLE dates (d DATE, t TIME, ts TIMESTAMP)");


Appender appender(con, "dates");

// construct the values using the Date/Time/Timestamp types


// (this is the most efficient approach)
appender.AppendRow(
Date::FromDate(1992, 1, 1),
Time::FromTime(1, 1, 1, 0),
Timestamp::FromDatetime(Date::FromDate(1992, 1, 1), Time::FromTime(1, 1, 1, 0))
);
// construct duckdb::Value objects
appender.AppendRow(
Value::DATE(1992, 1, 1),
Value::TIME(1, 1, 1, 0),
Value::TIMESTAMP(1992, 1, 1, 1, 1, 1, 0)
);

Handling Constraint Violations

If the appender encounters a PRIMARY KEY conflict or a UNIQUE constraint violation, it fails and returns the following error:

Constraint Error: PRIMARY KEY or UNIQUE constraint violated: duplicate key "..."

In this case, the entire append operation fails and no rows are inserted.

Appender Support in Other Clients

The appender is also available in the following client APIs:

• C
• Go
• JDBC (Java)
• Rust


INSERT Statements

INSERT statements are the standard way of loading data into a relational database. When using INSERT statements, the values are
supplied row‑by‑row. While simple, there is significant overhead involved in parsing and processing individual INSERT statements. This
makes lots of individual row‑by‑row insertions very inefficient for bulk insertion.

Note. Best practice As a rule of thumb, avoid using lots of individual row‑by‑row INSERT statements when inserting more than a
few rows (i.e., avoid using INSERT statements as part of a loop). When bulk inserting data, try to maximize the amount of data that
is inserted per statement.

If you must use INSERT statements to load data in a loop, avoid executing the statements in auto‑commit mode. After every commit,
the database is required to sync the changes made to disk to ensure no data is lost. In auto‑commit mode every single statement will be
wrapped in a separate transaction, meaning fsync will be called for every statement. This is typically unnecessary when bulk loading and
will significantly slow down your program.

Note. If you absolutely must use INSERT statements in a loop to load data, wrap them in calls to BEGIN TRANSACTION and
COMMIT, as shown in the sketch below.
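
A minimal sketch of this pattern, assuming the people table defined in the next section:

BEGIN TRANSACTION;
INSERT INTO people VALUES (1, 'Mark');
INSERT INTO people VALUES (2, 'Hannes');
-- ... further INSERT statements generated by the loop ...
COMMIT;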

Syntax

An example of using INSERT INTO to load data in a table is as follows:

CREATE TABLE people (id INTEGER, name VARCHAR);


INSERT INTO people VALUES (1, 'Mark'), (2, 'Hannes');

A more detailed description, together with a syntax diagram, can be found on the page on the INSERT statement.

Client APIs

Client APIs Overview

There are various client APIs for DuckDB:

• C
• C++
• Go by marcboeker
• Java
• Julia
• Node.js
• Python
• R
• Rust
• WebAssembly/Wasm
• ADBC API
• ODBC API

Additionally, there is a standalone Command Line Interface (CLI) client.

There are also contributed third‑party DuckDB wrappers, which currently do not have an official documentation page:

• C# by Giorgi
• Common Lisp by ak‑coram
• Crystal by amauryt
• Ruby by suketa
• Zig by karlseguin

Overview

DuckDB implements a custom C API modelled somewhat following the SQLite C API. The API is contained in the duckdb.h header. Con‑
tinue to Startup & Shutdown to get started, or check out the Full API overview.

We also provide a SQLite API wrapper which means that if your application is programmed against the SQLite C API, you can re‑link to
DuckDB and it should continue working. See the sqlite_api_wrapper folder in our source repository for more information.

Installation

The DuckDB C API can be installed as part of the libduckdb packages. Please see the installation page for details.

Startup & Shutdown

To use DuckDB, you must first initialize a duckdb_database handle using duckdb_open(). duckdb_open() takes as parameter the
database file to read and write from. The special value NULL (nullptr) can be used to create an in‑memory database. Note that for an
in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the process).


With the duckdb_database handle, you can create one or many duckdb_connection using duckdb_connect(). While individual
connections are thread‑safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection
to allow for the best parallel performance.

All duckdb_connections have to explicitly be disconnected with duckdb_disconnect() and the duckdb_database has to be
explicitly closed with duckdb_close() to avoid memory and file handle leaking.

Example

duckdb_database db;
duckdb_connection con;

if (duckdb_open(NULL, &db) == DuckDBError) {


// handle error
}
if (duckdb_connect(db, &con) == DuckDBError) {
// handle error
}

// run queries...

// cleanup
duckdb_disconnect(&con);
duckdb_close(&db);

API Reference

duckdb_state duckdb_open(const char *path, duckdb_database *out_database);


duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char
**out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
const char *duckdb_library_version();

duckdb_open Creates a new database or opens an existing database file stored at the given path. If no path is given a new in‑memory
database is created instead. The instantiated database should be closed with 'duckdb_close'.

Syntax

duckdb_state duckdb_open(
const char *path,
duckdb_database *out_database
);

Parameters

• path

Path to the database file on disk, or nullptr or :memory: to open an in‑memory database.

• out_database

The result database object.


• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_open_ext Extended version of duckdb_open. Creates a new database or opens an existing database file stored at the given
path. The instantiated database should be closed with 'duckdb_close'.

Syntax
duckdb_state duckdb_open_ext(
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);

Parameters

• path

Path to the database file on disk, or nullptr or :memory: to open an in‑memory database.

• out_database

The result database object.

• config

(Optional) configuration used to start up the database system.

• out_error

If set and the function returns DuckDBError, this will contain the reason why the start‑up failed. Note that the error must be freed using
duckdb_free.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_close Closes the specified database and de‑allocates all memory allocated for that database. This should be called after you
are done with any database allocated through duckdb_open or duckdb_open_ext. Note that failing to call duckdb_close (in case
of e.g., a program crash) will not cause data corruption. Still, it is recommended to always correctly close a database object after you are
done with it.

Syntax
void duckdb_close(
duckdb_database *database
);

Parameters

• database

The database object to shut down.

duckdb_connect Opens a connection to a database. Connections are required to query the database, and store transactional state
associated with the connection. The instantiated connection should be closed using 'duckdb_disconnect'.


Syntax

duckdb_state duckdb_connect(
duckdb_database database,
duckdb_connection *out_connection
);

Parameters

• database

The database file to connect to.

• out_connection

The result connection object.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_interrupt Interrupts the currently running query.

Syntax

void duckdb_interrupt(
duckdb_connection connection
);

Parameters

• connection

The connection to interrupt

duckdb_query_progress Gets the progress of the running query.

Syntax

duckdb_query_progress_type duckdb_query_progress(
duckdb_connection connection
);

Parameters

• connection

The working connection

• returns

-1 if no progress is available, otherwise the percentage of progress of the running query.

duckdb_disconnect Closes the specified connection and de‑allocates all memory allocated for that connection.


Syntax

void duckdb_disconnect(
duckdb_connection *connection
);

Parameters

• connection

The connection to close.

duckdb_library_version Returns the version of the linked DuckDB, with a version postfix for dev versions

Usually used for developing C extensions that must return this for a compatibility check.

Syntax

const char *duckdb_library_version(

);

Configuration

Configuration options can be provided to change different settings of the database system. Note that many of these settings can be changed
later on using PRAGMA statements as well. The configuration object should be created, filled with values and passed to duckdb_open_
ext.

Example

duckdb_database db;
duckdb_config config;

// create the configuration object
if (duckdb_create_config(&config) == DuckDBError) {
// handle error
}
// set some configuration options
duckdb_set_config(config, "access_mode", "READ_WRITE"); // or READ_ONLY
duckdb_set_config(config, "threads", "8");
duckdb_set_config(config, "max_memory", "8GB");
duckdb_set_config(config, "default_order", "DESC");

// open the database using the configuration
if (duckdb_open_ext(NULL, &db, config, NULL) == DuckDBError) {
// handle error
}
// cleanup the configuration object
duckdb_destroy_config(&config);

// run queries...

// cleanup
duckdb_close(&db);


API Reference

duckdb_state duckdb_create_config(duckdb_config *out_config);


size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);

duckdb_create_config Initializes an empty configuration object that can be used to provide start‑up options for the DuckDB instance through duckdb_open_ext. The duckdb_config must be destroyed using 'duckdb_destroy_config'.

This will always succeed unless there is a malloc failure.

Syntax
duckdb_state duckdb_create_config(
duckdb_config *out_config
);

Parameters

• out_config

The result configuration object.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_config_count This returns the total number of configuration options available for use with duckdb_get_config_flag.

This should not be called in a loop as it internally loops over all the options.

Syntax
size_t duckdb_config_count(

);

Parameters

• returns

The number of configuration options available.

duckdb_get_config_flag Obtains a human‑readable name and description of a specific configuration option. This can be used, e.g., to display configuration options. This will succeed unless index is out of range (i.e., >= duckdb_config_count).

The result name or description MUST NOT be freed.

Syntax
duckdb_state duckdb_get_config_flag(
size_t index,
const char **out_name,
const char **out_description
);


Parameters

• index

The index of the configuration option (between 0 and duckdb_config_count)

• out_name

A name of the configuration flag.

• out_description

A description of the configuration flag.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_set_config Sets the specified option for the specified configuration. The configuration option is indicated by name. To
obtain a list of config options, see duckdb_get_config_flag.

In the source code, configuration options are defined in config.cpp.

This can fail if either the name is invalid, or if the value provided for the option is invalid.

Syntax

duckdb_state duckdb_set_config(
duckdb_config config,
const char *name,
const char *option
);

Parameters

• config

The configuration object to set the option on.

• name

The name of the configuration flag to set.

• option

The value to set the configuration flag to.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_destroy_config Destroys the specified configuration object and de‑allocates all memory allocated for the object.

Syntax

void duckdb_destroy_config(
duckdb_config *config
);


Parameters

• config

The configuration object to destroy.

Query

The duckdb_query method allows SQL queries to be run in DuckDB from C. This method takes two parameters, a (null‑terminated) SQL
query string and a duckdb_result result pointer. The result pointer may be NULL if the application is not interested in the result set
or if the query produces no result. After the result is consumed, the duckdb_destroy_result method should be used to clean up the
result.

Elements can be extracted from the duckdb_result object using a variety of methods. The duckdb_column_count and duckdb_row_count methods can be used to extract the number of columns and the number of rows, respectively. duckdb_column_name and duckdb_column_type can be used to extract the names and types of individual columns.

Example

duckdb_state state;
duckdb_result result;

// create a table
state = duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
if (state == DuckDBError) {
// handle error
}
// insert three rows into the table
state = duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
if (state == DuckDBError) {
// handle error
}
// query rows again
state = duckdb_query(con, "SELECT * FROM integers", &result);
if (state == DuckDBError) {
// handle error
}
// handle the result
// ...

// destroy the result after we are done with it
duckdb_destroy_result(&result);

Value Extraction

Values can be extracted using either the duckdb_column_data/duckdb_nullmask_data functions, or using the duckdb_value
convenience functions. The duckdb_column_data/duckdb_nullmask_data functions directly hand you a pointer to the result
arrays in columnar format, and can therefore be very fast. The duckdb_value functions perform bounds‑ and type‑checking, and will
automatically cast values to the desired type. This makes them more convenient and easier to use, at the expense of being slower.

See the Types page for more information.

Note. For optimal performance, use duckdb_column_data and duckdb_nullmask_data to extract data from the query
result. The duckdb_value functions perform internal type‑checking, bounds‑checking and casting which makes them slower.


duckdb_value Below is an example that prints the above result to CSV format using the duckdb_value_varchar function. Note
that the function is generic: we do not need to know about the types of the individual result columns.

// print the above result to CSV format using `duckdb_value_varchar`
idx_t row_count = duckdb_row_count(&result);
idx_t column_count = duckdb_column_count(&result);
for(idx_t row = 0; row < row_count; row++) {
for(idx_t col = 0; col < column_count; col++) {
if (col > 0) printf(",");
char *str_val = duckdb_value_varchar(&result, col, row);
printf("%s", str_val);
duckdb_free(str_val);
}
printf("\n");
}

duckdb_column_data Below is an example that prints the above result to CSV format using the duckdb_column_data function.
Note that the function is NOT generic: we do need to know exactly what the types of the result columns are.

int32_t *i_data = (int32_t *) duckdb_column_data(&result, 0);
int32_t *j_data = (int32_t *) duckdb_column_data(&result, 1);
bool *i_mask = duckdb_nullmask_data(&result, 0);
bool *j_mask = duckdb_nullmask_data(&result, 1);
idx_t row_count = duckdb_row_count(&result);
for(idx_t row = 0; row < row_count; row++) {
if (i_mask[row]) {
printf("NULL");
} else {
printf("%d", i_data[row]);
}
printf(",");
if (j_mask[row]) {
printf("NULL");
} else {
printf("%d", j_data[row]);
}
printf("\n");
}

Note. Warning When using duckdb_column_data, be careful that the type matches exactly what you expect it to be. As the code
directly accesses an internal array, there is no type‑checking. Accessing a DUCKDB_TYPE_INTEGER column as if it was a DUCKDB_
TYPE_BIGINT column will provide unpredictable results!

API Reference

duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);


void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);


duckdb_query Executes a SQL query within a connection and stores the full (materialized) result in the out_result pointer. If the query
fails to execute, DuckDBError is returned and the error message can be retrieved by calling duckdb_result_error.

Note that after running duckdb_query, duckdb_destroy_result must be called on the result object even if the query fails, otherwise the error stored within the result will not be freed correctly.

Syntax

duckdb_state duckdb_query(
duckdb_connection connection,
const char *query,
duckdb_result *out_result
);

Parameters

• connection

The connection to perform the query in.

• query

The SQL query to run.

• out_result

The query result.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_destroy_result Closes the result and de‑allocates all memory allocated for that result.

Syntax

void duckdb_destroy_result(
duckdb_result *result
);

Parameters

• result

The result to destroy.

duckdb_column_name Returns the column name of the specified column. The result should not need to be freed; the column names
will automatically be destroyed when the result is destroyed.

Returns NULL if the column is out of range.

Syntax

const char *duckdb_column_name(
duckdb_result *result,
idx_t col
);


Parameters

• result

The result object to fetch the column name from.

• col

The column index.

• returns

The column name of the specified column.

duckdb_column_type Returns the column type of the specified column.

Returns DUCKDB_TYPE_INVALID if the column is out of range.

Syntax
duckdb_type duckdb_column_type(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the column type from.

• col

The column index.

• returns

The column type of the specified column.

duckdb_result_statement_type Returns the statement type of the statement that was executed

Syntax
duckdb_statement_type duckdb_result_statement_type(
duckdb_result result
);

Parameters

• result

The result object to fetch the statement type from.

• returns

duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID

duckdb_column_logical_type Returns the logical column type of the specified column.

The return type of this call should be destroyed with duckdb_destroy_logical_type.

Returns NULL if the column is out of range.


Syntax
duckdb_logical_type duckdb_column_logical_type(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the column type from.

• col

The column index.

• returns

The logical column type of the specified column.

duckdb_column_count Returns the number of columns present in the result object.

Syntax
idx_t duckdb_column_count(
duckdb_result *result
);

Parameters

• result

The result object.

• returns

The number of columns present in the result object.

duckdb_row_count Returns the number of rows present in the result object.

Syntax
idx_t duckdb_row_count(
duckdb_result *result
);

Parameters

• result

The result object.

• returns

The number of rows present in the result object.

duckdb_rows_changed Returns the number of rows changed by the query stored in the result. This is relevant only for INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.


Syntax
idx_t duckdb_rows_changed(
duckdb_result *result
);

Parameters

• result

The result object.

• returns

The number of rows changed.

duckdb_column_data DEPRECATED: Prefer using duckdb_result_get_chunk instead.

Returns the data of a specific column of a result in columnar format.

The function returns a dense array which contains the result data. The exact type stored in the array depends on the corresponding duckdb_type (as provided by duckdb_column_type). For the exact type by which the data should be accessed, see the comments in the types section or the DUCKDB_TYPE enum.

For example, for a column of type DUCKDB_TYPE_INTEGER, rows can be accessed in the following manner:

int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
printf("Data for row %d: %d\n", row, data[row]);

Syntax
void *duckdb_column_data(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the column data from.

• col

The column index.

• returns

The column data of the specified column.

duckdb_nullmask_data DEPRECATED: Prefer using duckdb_result_get_chunk instead.

Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row whether or not the corresponding row is NULL. If a row is NULL, the values present in the array provided by duckdb_column_data are undefined.

int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
bool *nullmask = duckdb_nullmask_data(&result, 0);
if (nullmask[row]) {
printf("Data for row %d: NULL\n", row);
} else {
printf("Data for row %d: %d\n", row, data[row]);
}


Syntax

bool *duckdb_nullmask_data(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the nullmask from.

• col

The column index.

• returns

The nullmask of the specified column.

duckdb_result_error Returns the error message contained within the result. The error is only set if duckdb_query returns
DuckDBError.

The result of this function must not be freed. It will be cleaned up when duckdb_destroy_result is called.

Syntax

const char *duckdb_result_error(
duckdb_result *result
);

Parameters

• result

The result object to fetch the error from.

• returns

The error of the result.

Data Chunks

Data chunks represent a horizontal slice of a table. They hold a number of vectors, each of which can hold up to VECTOR_SIZE rows. The vector size can be obtained through the duckdb_vector_size function and is configurable, but is usually set to 2048.

Data chunks and vectors are what DuckDB uses natively to store and represent data. For this reason, the data chunk interface is the most efficient way of interfacing with DuckDB. Be aware, however, that correctly interfacing with DuckDB using the data chunk API does require knowledge of DuckDB's internal vector format.

The primary manner of interfacing with data chunks is by obtaining the internal vectors of the data chunk using the duckdb_data_chunk_get_vector method, and subsequently using the duckdb_vector_get_data and duckdb_vector_get_validity methods to read the internal data and the validity mask of the vector. For composite types (list and struct vectors), duckdb_list_vector_get_child and duckdb_struct_vector_get_child should be used to read child vectors.
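
As a minimal sketch of this interface, the following builds a one-column INTEGER data chunk and writes three values into its vector directly, using only the functions documented in this section (error handling omitted):

// create a data chunk with a single INTEGER column
duckdb_logical_type types[1];
types[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 1);
duckdb_destroy_logical_type(&types[0]);

// fill the INTEGER vector with the values 0, 1, 2
duckdb_vector vector = duckdb_data_chunk_get_vector(chunk, 0);
int32_t *data = (int32_t *) duckdb_vector_get_data(vector);
for (idx_t row = 0; row < 3; row++) {
    data[row] = (int32_t) row;
}
duckdb_data_chunk_set_size(chunk, 3);

// cleanup
duckdb_destroy_data_chunk(&chunk);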


API Reference

duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);


void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);

Vector Interface

duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);


void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
duckdb_vector duckdb_array_vector_get_child(duckdb_vector vector);

Validity Mask Functions

bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);


void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);

duckdb_create_data_chunk Creates an empty DataChunk with the specified set of types.

Note that the result must be destroyed with duckdb_destroy_data_chunk.

Syntax

duckdb_data_chunk duckdb_create_data_chunk(
duckdb_logical_type *types,
idx_t column_count
);

Parameters

• types

An array of types of the data chunk.

• column_count

The number of columns.

• returns

The data chunk.


duckdb_destroy_data_chunk Destroys the data chunk and de‑allocates all memory allocated for that chunk.

Syntax
void duckdb_destroy_data_chunk(
duckdb_data_chunk *chunk
);

Parameters

• chunk

The data chunk to destroy.

duckdb_data_chunk_reset Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.

Syntax
void duckdb_data_chunk_reset(
duckdb_data_chunk chunk
);

Parameters

• chunk

The data chunk to reset.

duckdb_data_chunk_get_column_count Retrieves the number of columns in a data chunk.

Syntax
idx_t duckdb_data_chunk_get_column_count(
duckdb_data_chunk chunk
);

Parameters

• chunk

The data chunk to get the data from

• returns

The number of columns in the data chunk

duckdb_data_chunk_get_vector Retrieves the vector at the specified column index in the data chunk.

The pointer to the vector is valid for as long as the chunk is alive. It does NOT need to be destroyed.

Syntax
duckdb_vector duckdb_data_chunk_get_vector(
duckdb_data_chunk chunk,
idx_t col_idx
);


Parameters

• chunk

The data chunk to get the data from

• col_idx

The column index of the vector to fetch

• returns

The vector

duckdb_data_chunk_get_size Retrieves the current number of tuples in a data chunk.

Syntax

idx_t duckdb_data_chunk_get_size(
duckdb_data_chunk chunk
);

Parameters

• chunk

The data chunk to get the data from

• returns

The number of tuples in the data chunk

duckdb_data_chunk_set_size Sets the current number of tuples in a data chunk.

Syntax

void duckdb_data_chunk_set_size(
duckdb_data_chunk chunk,
idx_t size
);

Parameters

• chunk

The data chunk to set the size in

• size

The number of tuples in the data chunk

duckdb_vector_get_column_type Retrieves the column type of the specified vector.

The result must be destroyed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_vector_get_column_type(
duckdb_vector vector
);


Parameters

• vector

The vector to get the data from

• returns

The type of the vector

duckdb_vector_get_data Retrieves the data pointer of the vector.

The data pointer can be used to read or write values from the vector. How to read or write values depends on the type of the vector.

Syntax

void *duckdb_vector_get_data(
duckdb_vector vector
);

Parameters

• vector

The vector to get the data from

• returns

The data pointer

duckdb_vector_get_validity Retrieves the validity mask pointer of the specified vector.

If all values are valid, this function MIGHT return NULL!

The validity mask is a bitset that signifies null‑ness within the data chunk. It is a series of uint64_t values, where each uint64_t value contains
validity for 64 tuples. The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).

Validity of a specific value can be obtained like this:

idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & ((uint64_t)1 << idx_in_entry);

Alternatively, the (slower) duckdb_validity_row_is_valid function can be used.

Syntax

uint64_t *duckdb_vector_get_validity(
duckdb_vector vector
);

Parameters

• vector

The vector to get the data from

• returns

The pointer to the validity mask, or NULL if no validity mask is present


duckdb_vector_ensure_validity_writable Ensures the validity mask is writable by allocating it.

After this function is called, duckdb_vector_get_validity will ALWAYS return non‑NULL. This allows null values to be written to the
vector, regardless of whether a validity mask was present before.

Syntax

void duckdb_vector_ensure_validity_writable(
duckdb_vector vector
);

Parameters

• vector

The vector to alter
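
For instance, a brief sketch of writing a NULL into row 0 of an existing duckdb_vector (here called vector, assumed to come from a data chunk as shown above), combined with the validity functions documented further below:

// make sure the vector has a validity mask we can write to
duckdb_vector_ensure_validity_writable(vector);
// after the call above, this is guaranteed to be non-NULL
uint64_t *validity = duckdb_vector_get_validity(vector);
// mark row 0 as NULL
duckdb_validity_set_row_invalid(validity, 0);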

duckdb_vector_assign_string_element Assigns a string element in the vector at the specified location.

Syntax

void duckdb_vector_assign_string_element(
duckdb_vector vector,
idx_t index,
const char *str
);

Parameters

• vector

The vector to alter

• index

The row position in the vector to assign the string to

• str

The null‑terminated string

duckdb_vector_assign_string_element_len Assigns a string element in the vector at the specified location. You may also
use this function to assign BLOBs.

Syntax

void duckdb_vector_assign_string_element_len(
duckdb_vector vector,
idx_t index,
const char *str,
idx_t str_len
);


Parameters

• vector

The vector to alter

• index

The row position in the vector to assign the string to

• str

The string

• str_len

The length of the string (in bytes)

duckdb_list_vector_get_child Retrieves the child vector of a list vector.

The resulting vector is valid as long as the parent vector is valid.

Syntax

duckdb_vector duckdb_list_vector_get_child(
duckdb_vector vector
);

Parameters

• vector

The vector

• returns

The child vector

duckdb_list_vector_get_size Returns the size of the child vector of the list.

Syntax

idx_t duckdb_list_vector_get_size(
duckdb_vector vector
);

Parameters

• vector

The vector

• returns

The size of the child list

duckdb_list_vector_set_size Sets the total size of the underlying child‑vector of a list vector.


Syntax

duckdb_state duckdb_list_vector_set_size(
duckdb_vector vector,
idx_t size
);

Parameters

• vector

The list vector.

• size

The size of the child list.

• returns

The duckdb state. Returns DuckDBError if the vector is nullptr.

duckdb_list_vector_reserve Sets the total capacity of the underlying child‑vector of a list.

Syntax

duckdb_state duckdb_list_vector_reserve(
duckdb_vector vector,
idx_t required_capacity
);

Parameters

• vector

The list vector.

• required_capacity

the total capacity to reserve.

• returns

The duckdb state. Returns DuckDBError if the vector is nullptr.

duckdb_struct_vector_get_child Retrieves the child vector of a struct vector.

The resulting vector is valid as long as the parent vector is valid.

Syntax

duckdb_vector duckdb_struct_vector_get_child(
duckdb_vector vector,
idx_t index
);


Parameters

• vector

The vector

• index

The child index

• returns

The child vector

duckdb_array_vector_get_child Retrieves the child vector of an array vector.

The resulting vector is valid as long as the parent vector is valid. The resulting vector has the size of the parent vector multiplied by the
array size.

Syntax

duckdb_vector duckdb_array_vector_get_child(
duckdb_vector vector
);

Parameters

• vector

The vector

• returns

The child vector

duckdb_validity_row_is_valid Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.

Syntax

bool duckdb_validity_row_is_valid(
uint64_t *validity,
idx_t row
);

Parameters

• validity

The validity mask, as obtained through duckdb_vector_get_validity

• row

The row index

• returns

true if the row is valid, false otherwise


duckdb_validity_set_row_validity In a validity mask, sets a specific row to either valid or invalid.

Note that duckdb_vector_ensure_validity_writable should be called before calling duckdb_vector_get_validity, to ensure that there is a validity mask to write to.

Syntax
void duckdb_validity_set_row_validity(
uint64_t *validity,
idx_t row,
bool valid
);

Parameters

• validity

The validity mask, as obtained through duckdb_vector_get_validity.

• row

The row index

• valid

Whether to set the row to valid (true) or invalid (false)

duckdb_validity_set_row_invalid In a validity mask, sets a specific row to invalid.

Equivalent to duckdb_validity_set_row_validity with valid set to false.

Syntax
void duckdb_validity_set_row_invalid(
uint64_t *validity,
idx_t row
);

Parameters

• validity

The validity mask

• row

The row index

duckdb_validity_set_row_valid In a validity mask, sets a specific row to valid.

Equivalent to duckdb_validity_set_row_validity with valid set to true.

Syntax
void duckdb_validity_set_row_valid(
uint64_t *validity,
idx_t row
);


Parameters

• validity

The validity mask

• row

The row index

Values

The value class represents a single value of any type.
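
As a short sketch of the lifecycle of a value, using only the functions listed below:

// create a VARCHAR value, read it back as a string, then clean up
duckdb_value val = duckdb_create_varchar("hello world");
char *str = duckdb_get_varchar(val);
printf("%s\n", str); // hello world
duckdb_free(str);
duckdb_destroy_value(&val);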

API Reference

void duckdb_destroy_value(duckdb_value *value);


duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
char *duckdb_get_varchar(duckdb_value value);
int64_t duckdb_get_int64(duckdb_value value);

duckdb_destroy_value Destroys the value and de‑allocates all memory allocated for that type.

Syntax

void duckdb_destroy_value(
duckdb_value *value
);

Parameters

• value

The value to destroy.

duckdb_create_varchar Creates a value from a null‑terminated string

Syntax

duckdb_value duckdb_create_varchar(
const char *text
);


Parameters

• text

The null‑terminated string

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_varchar_length Creates a value from a string

Syntax
duckdb_value duckdb_create_varchar_length(
const char *text,
idx_t length
);

Parameters

• text

The text

• length

The length of the text

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_int64 Creates a value from an int64

Syntax
duckdb_value duckdb_create_int64(
int64_t val
);

Parameters

• val

The bigint value

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_struct_value Creates a struct value from a type and an array of values

Syntax
duckdb_value duckdb_create_struct_value(
duckdb_logical_type type,
duckdb_value *values
);


Parameters

• type

The type of the struct

• values

The values for the struct fields

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_list_value Creates a list value from a type and an array of values of length value_count

Syntax

duckdb_value duckdb_create_list_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);

Parameters

• type

The type of the list

• values

The values for the list

• value_count

The number of values in the list

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_array_value Creates an array value from a type and an array of values of length value_count

Syntax

duckdb_value duckdb_create_array_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);

Parameters

• type

The type of the array

• values


The values for the array

• value_count

The number of values in the array

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_get_varchar Obtains a string representation of the given value. The result must be destroyed with duckdb_free.

Syntax
char *duckdb_get_varchar(
duckdb_value value
);

Parameters

• value

The value

• returns

The string value. This must be destroyed with duckdb_free.

duckdb_get_int64 Obtains an int64 of the given value.

Syntax
int64_t duckdb_get_int64(
duckdb_value value
);

Parameters

• value

The value

• returns

The int64 value, or 0 if no conversion is possible

Types

DuckDB is a strongly typed database system. As such, every column has a single type specified. This type is constant over the entire column.
That is to say, a column that is labeled as an INTEGER column will only contain INTEGER values.

DuckDB also supports columns of composite types. For example, it is possible to define an array of integers (INT[]). It is also possible to
define types as arbitrary structs (ROW(i INTEGER, j VARCHAR)). For that reason, native DuckDB type objects are not mere enums,
but a class that can potentially be nested.

Types in the C API are modeled using an enum (duckdb_type) and a complex class (duckdb_logical_type). For most primitive
types, e.g., integers or varchars, the enum is sufficient. For more complex types, such as lists, structs or decimals, the logical type must be
used.


typedef enum DUCKDB_TYPE {
DUCKDB_TYPE_INVALID,
DUCKDB_TYPE_BOOLEAN,
DUCKDB_TYPE_TINYINT,
DUCKDB_TYPE_SMALLINT,
DUCKDB_TYPE_INTEGER,
DUCKDB_TYPE_BIGINT,
DUCKDB_TYPE_UTINYINT,
DUCKDB_TYPE_USMALLINT,
DUCKDB_TYPE_UINTEGER,
DUCKDB_TYPE_UBIGINT,
DUCKDB_TYPE_FLOAT,
DUCKDB_TYPE_DOUBLE,
DUCKDB_TYPE_TIMESTAMP,
DUCKDB_TYPE_DATE,
DUCKDB_TYPE_TIME,
DUCKDB_TYPE_INTERVAL,
DUCKDB_TYPE_HUGEINT,
DUCKDB_TYPE_VARCHAR,
DUCKDB_TYPE_BLOB,
DUCKDB_TYPE_DECIMAL,
DUCKDB_TYPE_TIMESTAMP_S,
DUCKDB_TYPE_TIMESTAMP_MS,
DUCKDB_TYPE_TIMESTAMP_NS,
DUCKDB_TYPE_ENUM,
DUCKDB_TYPE_LIST,
DUCKDB_TYPE_STRUCT,
DUCKDB_TYPE_MAP,
DUCKDB_TYPE_UUID,
DUCKDB_TYPE_UNION,
DUCKDB_TYPE_BIT,
} duckdb_type;

Functions

The enum type of a column in the result can be obtained using the duckdb_column_type function. The logical type of a column can be
obtained using the duckdb_column_logical_type function.
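
For example, a small sketch of inspecting the first column of a query result (result is assumed to come from duckdb_query):

// the enum type is sufficient for primitive columns
duckdb_type type_id = duckdb_column_type(&result, 0);
// for complex types such as DECIMAL, the logical type carries the details
if (type_id == DUCKDB_TYPE_DECIMAL) {
    duckdb_logical_type logical_type = duckdb_column_logical_type(&result, 0);
    printf("decimal(%d, %d)\n", duckdb_decimal_width(logical_type), duckdb_decimal_scale(logical_type));
    duckdb_destroy_logical_type(&logical_type);
}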

duckdb_value The duckdb_value functions will auto‑cast values as required. For example, it is no problem to use duckdb_value_double on a column of type DUCKDB_TYPE_INTEGER. The value will be auto‑cast and returned as a double. Note that in certain
cases the cast may fail. For example, this can happen if we request a duckdb_value_int8 and the value does not fit within an int8
value. In this case, a default value will be returned (usually 0 or nullptr). The same default value will also be returned if the corresponding
value is NULL.

The duckdb_value_is_null function can be used to check if a specific value is NULL or not.

The exception to the auto‑cast rule is the duckdb_value_varchar_internal function. This function does not auto‑cast and only
works for VARCHAR columns. The reason this function exists is that the result does not need to be freed.
Note. duckdb_value_varchar and duckdb_value_blob require the result to be de‑allocated using duckdb_free.

duckdb_result_get_chunk The duckdb_result_get_chunk function can be used to read data chunks from a DuckDB result
set, and is the most efficient way of reading data from a DuckDB result using the C API. It is also the only way of reading data of certain types
from a DuckDB result. For example, the duckdb_value functions do not support structural reading of composite types (lists or structs)
or more complex types like enums and decimals.

For more information about data chunks, see the documentation on data chunks.
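
As a minimal sketch, the following iterates over all chunks of a materialized result, assuming the first (and only) column is of type DUCKDB_TYPE_INTEGER:

idx_t chunk_count = duckdb_result_chunk_count(result);
for (idx_t chunk_idx = 0; chunk_idx < chunk_count; chunk_idx++) {
    duckdb_data_chunk chunk = duckdb_result_get_chunk(result, chunk_idx);
    idx_t row_count = duckdb_data_chunk_get_size(chunk);
    duckdb_vector vector = duckdb_data_chunk_get_vector(chunk, 0);
    int32_t *data = (int32_t *) duckdb_vector_get_data(vector);
    uint64_t *validity = duckdb_vector_get_validity(vector);
    for (idx_t row = 0; row < row_count; row++) {
        if (validity && !duckdb_validity_row_is_valid(validity, row)) {
            printf("NULL\n");
        } else {
            printf("%d\n", data[row]);
        }
    }
    duckdb_destroy_data_chunk(&chunk);
}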


API Reference

duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);


bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);

Date/Time/Timestamp Helpers

duckdb_date_struct duckdb_from_date(duckdb_date date);


duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);

Hugeint Helpers

double duckdb_hugeint_to_double(duckdb_hugeint val);


duckdb_hugeint duckdb_double_to_hugeint(double val);

Decimal Helpers

duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);


double duckdb_decimal_to_double(duckdb_decimal val);

Logical Type Interface

duckdb_logical_type duckdb_create_logical_type(duckdb_type type);


char *duckdb_logical_type_get_alias(duckdb_logical_type type);
duckdb_logical_type duckdb_create_list_type(duckdb_logical_type type);
duckdb_logical_type duckdb_create_array_type(duckdb_logical_type type, idx_t array_size);
duckdb_logical_type duckdb_create_map_type(duckdb_logical_type key_type, duckdb_logical_type value_type);
duckdb_logical_type duckdb_create_union_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_struct_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_enum_type(const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_decimal_type(uint8_t width, uint8_t scale);
duckdb_type duckdb_get_type_id(duckdb_logical_type type);
uint8_t duckdb_decimal_width(duckdb_logical_type type);
uint8_t duckdb_decimal_scale(duckdb_logical_type type);
duckdb_type duckdb_decimal_internal_type(duckdb_logical_type type);
duckdb_type duckdb_enum_internal_type(duckdb_logical_type type);
uint32_t duckdb_enum_dictionary_size(duckdb_logical_type type);
char *duckdb_enum_dictionary_value(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_list_type_child_type(duckdb_logical_type type);
duckdb_logical_type duckdb_array_type_child_type(duckdb_logical_type type);
idx_t duckdb_array_type_array_size(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_key_type(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_value_type(duckdb_logical_type type);
idx_t duckdb_struct_type_child_count(duckdb_logical_type type);


char *duckdb_struct_type_child_name(duckdb_logical_type type, idx_t index);


duckdb_logical_type duckdb_struct_type_child_type(duckdb_logical_type type, idx_t index);
idx_t duckdb_union_type_member_count(duckdb_logical_type type);
char *duckdb_union_type_member_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_union_type_member_type(duckdb_logical_type type, idx_t index);
void duckdb_destroy_logical_type(duckdb_logical_type *type);

duckdb_result_get_chunk Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result
is exhausted.

The result must be destroyed with duckdb_destroy_data_chunk.

This function supersedes all duckdb_value functions, as well as the duckdb_column_data and duckdb_nullmask_data functions. It results in significantly better performance, and should be preferred in newer code bases.

If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be mixed with the legacy
result functions).

Use duckdb_result_chunk_count to figure out how many chunks there are in the result.

Syntax

duckdb_data_chunk duckdb_result_get_chunk(
duckdb_result result,
idx_t chunk_index
);

Parameters

• result

The result object to fetch the data chunk from.

• chunk_index

The chunk index to fetch from.

• returns

The resulting data chunk. Returns NULL if the chunk index is out of bounds.

duckdb_result_is_streaming Checks if the type of the internal result is StreamQueryResult.

Syntax

bool duckdb_result_is_streaming(
duckdb_result result
);

Parameters

• result

The result object to check.

• returns

Whether or not the result object is of the type StreamQueryResult


duckdb_result_chunk_count Returns the number of data chunks present in the result.

Syntax

idx_t duckdb_result_chunk_count(
duckdb_result result
);

Parameters

• result

The result object

• returns

Number of data chunks present in the result.

duckdb_result_return_type Returns the return_type of the given result, or DUCKDB_RETURN_TYPE_INVALID on error

Syntax

duckdb_result_type duckdb_result_return_type(
duckdb_result result
);

Parameters

• result

The result object

• returns

The return_type

duckdb_from_date Decompose a duckdb_date object into year, month and day (stored as duckdb_date_struct).

Syntax

duckdb_date_struct duckdb_from_date(
duckdb_date date
);

Parameters

• date

The date object, as obtained from a DUCKDB_TYPE_DATE column.

• returns

The duckdb_date_struct with the decomposed elements.

duckdb_to_date Re‑compose a duckdb_date from year, month and day (duckdb_date_struct).


Syntax

duckdb_date duckdb_to_date(
duckdb_date_struct date
);

Parameters

• date

The year, month and day stored in a duckdb_date_struct.

• returns

The duckdb_date element.
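
A small sketch of composing and decomposing a date; the year, month and day fields of duckdb_date_struct are assumed here:

// compose a DATE from its components and decompose it again
duckdb_date_struct d;
d.year = 1992;
d.month = 9;
d.day = 20;
duckdb_date date = duckdb_to_date(d);
duckdb_date_struct decomposed = duckdb_from_date(date);
printf("%d-%d-%d\n", decomposed.year, decomposed.month, decomposed.day);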

duckdb_is_finite_date Test a duckdb_date to see if it is a finite value.

Syntax

bool duckdb_is_finite_date(
duckdb_date date
);

Parameters

• date

The date object, as obtained from a DUCKDB_TYPE_DATE column.

• returns

True if the date is finite, false if it is ±infinity.

duckdb_from_time Decompose a duckdb_time object into hour, minute, second and microsecond (stored as duckdb_time_struct).

Syntax

duckdb_time_struct duckdb_from_time(
duckdb_time time
);

Parameters

• time

The time object, as obtained from a DUCKDB_TYPE_TIME column.

• returns

The duckdb_time_struct with the decomposed elements.

duckdb_create_time_tz Create a duckdb_time_tz object from micros and a timezone offset.


Syntax
duckdb_time_tz duckdb_create_time_tz(
int64_t micros,
int32_t offset
);

Parameters

• micros

The microsecond component of the time.

• offset

The timezone offset component of the time.

• returns

The duckdb_time_tz element.

duckdb_from_time_tz Decompose a TIME_TZ object into micros and a timezone offset.

Use duckdb_from_time to further decompose the micros into hour, minute, second and microsecond.

Syntax
duckdb_time_tz_struct duckdb_from_time_tz(
duckdb_time_tz micros
);

Parameters

• micros

The time object, as obtained from a DUCKDB_TYPE_TIME_TZ column.

• returns

The duckdb_time_tz_struct with the decomposed microsecond and timezone offset components.

duckdb_to_time Re‑compose a duckdb_time from hour, minute, second and microsecond (duckdb_time_struct).

Syntax
duckdb_time duckdb_to_time(
duckdb_time_struct time
);

Parameters

• time

The hour, minute, second and microsecond in a duckdb_time_struct.

• returns

The duckdb_time element.


duckdb_from_timestamp Decompose a duckdb_timestamp object into a duckdb_timestamp_struct.

Syntax

duckdb_timestamp_struct duckdb_from_timestamp(
duckdb_timestamp ts
);

Parameters

• ts

The ts object, as obtained from a DUCKDB_TYPE_TIMESTAMP column.

• returns

The duckdb_timestamp_struct with the decomposed elements.

duckdb_to_timestamp Re‑compose a duckdb_timestamp from a duckdb_timestamp_struct.

Syntax

duckdb_timestamp duckdb_to_timestamp(
duckdb_timestamp_struct ts
);

Parameters

• ts

The de‑composed elements in a duckdb_timestamp_struct.

• returns

The duckdb_timestamp element.

duckdb_is_finite_timestamp Test a duckdb_timestamp to see if it is a finite value.

Syntax

bool duckdb_is_finite_timestamp(
duckdb_timestamp ts
);

Parameters

• ts

The timestamp object, as obtained from a DUCKDB_TYPE_TIMESTAMP column.

• returns

True if the timestamp is finite, false if it is ±infinity.

duckdb_hugeint_to_double Converts a duckdb_hugeint object (as obtained from a DUCKDB_TYPE_HUGEINT column) into a
double.


Syntax

double duckdb_hugeint_to_double(
duckdb_hugeint val
);

Parameters

• val

The hugeint value.

• returns

The converted double element.

duckdb_double_to_hugeint Converts a double value to a duckdb_hugeint object.

If the conversion fails because the double value is too big, the result will be 0.

Syntax

duckdb_hugeint duckdb_double_to_hugeint(
double val
);

Parameters

• val

The double value.

• returns

The converted duckdb_hugeint element.

duckdb_double_to_decimal Converts a double value to a duckdb_decimal object.

If the conversion fails because the double value is too big, or the width/scale are invalid, the result will be 0.

Syntax

duckdb_decimal duckdb_double_to_decimal(
double val,
uint8_t width,
uint8_t scale
);

Parameters

• val

The double value.

• width

The width of the resulting decimal.

• scale

The scale of the resulting decimal.

• returns

The converted duckdb_decimal element.


duckdb_decimal_to_double Converts a duckdb_decimal object (as obtained from a DUCKDB_TYPE_DECIMAL column) into a
double.

Syntax
double duckdb_decimal_to_double(
duckdb_decimal val
);

Parameters

• val

The decimal value.

• returns

The converted double element.
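
For example, a minimal sketch of a round trip through the decimal helpers:

// convert a double to DECIMAL(18, 3) and back
duckdb_decimal dec = duckdb_double_to_decimal(123.456, 18, 3);
double roundtrip = duckdb_decimal_to_double(dec);
printf("%f\n", roundtrip); // approximately 123.456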

duckdb_create_logical_type Creates a duckdb_logical_type from a standard primitive type. The resulting type should be
destroyed with duckdb_destroy_logical_type.

This should not be used with DUCKDB_TYPE_DECIMAL.

Syntax
duckdb_logical_type duckdb_create_logical_type(
duckdb_type type
);

Parameters

• type

The primitive type to create.

• returns

The logical type.

duckdb_logical_type_get_alias Returns the alias of a duckdb_logical_type, if one is set, else NULL. The result must be destroyed with duckdb_free.

Syntax
char *duckdb_logical_type_get_alias(
duckdb_logical_type type
);

Parameters

• type

The logical type to return the alias of

• returns

The alias or NULL


duckdb_create_list_type Creates a list type from its child type. The resulting type should be destroyed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_create_list_type(
duckdb_logical_type type
);

Parameters

• type

The child type of list type to create.

• returns

The logical type.
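
For instance, a short sketch creating the logical type INTEGER[] and inspecting its child type (the inspection functions are documented further below):

duckdb_logical_type int_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_logical_type list_type = duckdb_create_list_type(int_type);
duckdb_logical_type child_type = duckdb_list_type_child_type(list_type);
// prints 1, since the child type is INTEGER
printf("%d\n", duckdb_get_type_id(child_type) == DUCKDB_TYPE_INTEGER);
duckdb_destroy_logical_type(&child_type);
duckdb_destroy_logical_type(&list_type);
duckdb_destroy_logical_type(&int_type);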

duckdb_create_array_type Creates an array type from its child type. The resulting type should be destroyed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_create_array_type(
duckdb_logical_type type,
idx_t array_size
);

Parameters

• type

The child type of array type to create.

• array_size

The number of elements in the array.

• returns

The logical type.

duckdb_create_map_type Creates a map type from its key type and value type. The resulting type should be destroyed with
duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_create_map_type(
duckdb_logical_type key_type,
duckdb_logical_type value_type
);


Parameters

• key_type

The key type of the map.

• value_type

The value type of the map.

• returns

The logical type.

duckdb_create_union_type Creates a UNION type from the passed types array. The resulting type should be destroyed with
duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_union_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);

Parameters

• member_types

The array of types that the union should consist of.

• member_names

The array of names that the union should consist of.

• member_count

The number of members that were specified for both arrays.

• returns

The logical type.

duckdb_create_struct_type Creates a STRUCT type from the passed member name and type arrays. The resulting type should
be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_struct_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);

Parameters

• member_types

The array of types that the struct should consist of.

• member_names

The array of names that the struct should consist of.

• member_count

The number of members that were specified for both arrays.

• returns

The logical type.
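
As a brief sketch, the following creates the logical type STRUCT(i INTEGER, v VARCHAR):

duckdb_logical_type member_types[2];
const char *member_names[2] = {"i", "v"};
member_types[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
member_types[1] = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
duckdb_logical_type struct_type = duckdb_create_struct_type(member_types, member_names, 2);

// cleanup
duckdb_destroy_logical_type(&member_types[0]);
duckdb_destroy_logical_type(&member_types[1]);
duckdb_destroy_logical_type(&struct_type);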


duckdb_create_enum_type Creates an ENUM type from the passed member name array. The resulting type should be destroyed
with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_enum_type(
const char **member_names,
idx_t member_count
);

Parameters

• member_names

The array of names that the enum should consist of.

• member_count

The number of elements that were specified in the array.

• returns

The logical type.

duckdb_create_decimal_type Creates a duckdb_logical_type of type decimal with the specified width and scale. The resulting type should be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_decimal_type(
uint8_t width,
uint8_t scale
);

Parameters

• width

The width of the decimal type

• scale

The scale of the decimal type

• returns

The logical type.

duckdb_get_type_id Retrieves the enum type class of a duckdb_logical_type.

Syntax
duckdb_type duckdb_get_type_id(
duckdb_logical_type type
);


Parameters

• type

The logical type object

• returns

The type id

duckdb_decimal_width Retrieves the width of a decimal type.

Syntax

uint8_t duckdb_decimal_width(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The width of the decimal type

duckdb_decimal_scale Retrieves the scale of a decimal type.

Syntax

uint8_t duckdb_decimal_scale(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The scale of the decimal type

duckdb_decimal_internal_type Retrieves the internal storage type of a decimal type.

Syntax

duckdb_type duckdb_decimal_internal_type(
duckdb_logical_type type
);


Parameters

• type

The logical type object

• returns

The internal type of the decimal type

duckdb_enum_internal_type Retrieves the internal storage type of an enum type.

Syntax

duckdb_type duckdb_enum_internal_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The internal type of the enum type

duckdb_enum_dictionary_size Retrieves the dictionary size of the enum type.

Syntax

uint32_t duckdb_enum_dictionary_size(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The dictionary size of the enum type

duckdb_enum_dictionary_value Retrieves the dictionary value at the specified position from the enum.

The result must be freed with duckdb_free.

Syntax

char *duckdb_enum_dictionary_value(
duckdb_logical_type type,
idx_t index
);


Parameters

• type

The logical type object

• index

The index in the dictionary

• returns

The string value of the enum type. Must be freed with duckdb_free.

duckdb_list_type_child_type Retrieves the child type of the given list type.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_list_type_child_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The child type of the list type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_array_type_child_type Retrieves the child type of the given array type.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_array_type_child_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The child type of the array type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_array_type_array_size Retrieves the array size of the given array type.


Syntax

idx_t duckdb_array_type_array_size(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The fixed number of elements the values of this array type can store.

duckdb_map_type_key_type Retrieves the key type of the given map type.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_map_type_key_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The key type of the map type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_map_type_value_type Retrieves the value type of the given map type.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_map_type_value_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The value type of the map type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_struct_type_child_count Returns the number of children of a struct type.


Syntax

idx_t duckdb_struct_type_child_count(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The number of children of a struct type.

duckdb_struct_type_child_name Retrieves the name of the struct child.

The result must be freed with duckdb_free.

Syntax

char *duckdb_struct_type_child_name(
duckdb_logical_type type,
idx_t index
);

Parameters

• type

The logical type object

• index

The child index

• returns

The name of the struct child. Must be freed with duckdb_free.

duckdb_struct_type_child_type Retrieves the child type of the given struct type at the specified index.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_struct_type_child_type(
duckdb_logical_type type,
idx_t index
);


Parameters

• type

The logical type object

• index

The child index

• returns

The child type of the struct type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_union_type_member_count Returns the number of members that the union type has.

Syntax
idx_t duckdb_union_type_member_count(
duckdb_logical_type type
);

Parameters

• type

The logical type (union) object

• returns

The number of members of a union type.

duckdb_union_type_member_name Retrieves the name of the union member.

The result must be freed with duckdb_free.

Syntax
char *duckdb_union_type_member_name(
duckdb_logical_type type,
idx_t index
);

Parameters

• type

The logical type object

• index

The child index

• returns

The name of the union member. Must be freed with duckdb_free.

duckdb_union_type_member_type Retrieves the child type of the given union member at the specified index.

The result must be freed with duckdb_destroy_logical_type.


Syntax
duckdb_logical_type duckdb_union_type_member_type(
duckdb_logical_type type,
idx_t index
);

Parameters

• type

The logical type object

• index

The child index

• returns

The child type of the union member. Must be destroyed with duckdb_destroy_logical_type.

duckdb_destroy_logical_type Destroys the logical type and de‑allocates all memory allocated for that type.

Syntax
void duckdb_destroy_logical_type(
duckdb_logical_type *type
);

Parameters

• type

The logical type to destroy.

Prepared Statements

A prepared statement is a parameterized query. The query is prepared with question marks (?) or dollar symbols ($1) indicating the
parameters of the query. Values can then be bound to these parameters, after which the prepared statement can be executed using those
parameters. A single query can be prepared once and executed many times.

Prepared statements are useful to:

• Easily supply parameters to functions while avoiding string concatenation and SQL injection attacks.
• Speed up queries that will be executed many times with different parameters.

DuckDB supports prepared statements in the C API with the duckdb_prepare method. The duckdb_bind family of functions is used
to supply values for subsequent execution of the prepared statement using duckdb_execute_prepared. After we are done with the
prepared statement it can be cleaned up using the duckdb_destroy_prepare method.

Example

duckdb_prepared_statement stmt;
duckdb_result result;
if (duckdb_prepare(con, "INSERT INTO integers VALUES ($1, $2)", &stmt) == DuckDBError) {
// handle error
}


duckdb_bind_int32(stmt, 1, 42); // the parameter index starts counting at 1!
duckdb_bind_int32(stmt, 2, 43);
// NULL as second parameter means no result set is requested
duckdb_execute_prepared(stmt, NULL);
duckdb_destroy_prepare(&stmt);

// we can also query result sets using prepared statements
if (duckdb_prepare(con, "SELECT * FROM integers WHERE i = ?", &stmt) == DuckDBError) {
// handle error
}
duckdb_bind_int32(stmt, 1, 42);
duckdb_execute_prepared(stmt, &result);

// do something with result

// clean up
duckdb_destroy_result(&result);
duckdb_destroy_prepare(&stmt);

After calling duckdb_prepare, the prepared statement parameters can be inspected using duckdb_nparams and duckdb_param_type. In case the prepare fails, the error can be obtained through duckdb_prepare_error.
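
For example, a small sketch of inspecting a prepared statement (reusing the integers table from the query example above):

duckdb_prepared_statement stmt;
if (duckdb_prepare(con, "SELECT * FROM integers WHERE i = $1", &stmt) == DuckDBError) {
    printf("failed to prepare: %s\n", duckdb_prepare_error(stmt));
} else {
    printf("parameter count: %llu\n", (unsigned long long) duckdb_nparams(stmt));
    printf("parameter 1 type id: %d\n", (int) duckdb_param_type(stmt, 1)); // indexing starts at 1
}
duckdb_destroy_prepare(&stmt);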

It is not required that the duckdb_bind family of functions matches the prepared statement parameter type exactly. The values will be auto‑cast to the required type as needed. For example, calling duckdb_bind_int8 on a parameter of type DUCKDB_TYPE_INTEGER will work as expected.

Note. Warning Do not use prepared statements to insert large amounts of data into DuckDB. Instead it is recommended to use the
Appender.

API Reference

duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement *out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);
const char *duckdb_parameter_name(duckdb_prepared_statement prepared_statement, idx_t index);
duckdb_type duckdb_param_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_state duckdb_clear_bindings(duckdb_prepared_statement prepared_statement);
duckdb_statement_type duckdb_prepared_statement_type(duckdb_prepared_statement statement);

duckdb_prepare Create a prepared statement object from a query.

Note that after calling duckdb_prepare, the prepared statement should always be destroyed using duckdb_destroy_prepare,
even if the prepare fails.

If the prepare fails, duckdb_prepare_error can be called to obtain the reason why the prepare failed.

Syntax

duckdb_state duckdb_prepare(
duckdb_connection connection,
const char *query,
duckdb_prepared_statement *out_prepared_statement
);


Parameters

• connection

The connection object

• query

The SQL query to prepare

• out_prepared_statement

The resulting prepared statement object

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_destroy_prepare Closes the prepared statement and de‑allocates all memory allocated for the statement.

Syntax

void duckdb_destroy_prepare(
duckdb_prepared_statement *prepared_statement
);

Parameters

• prepared_statement

The prepared statement to destroy.

duckdb_prepare_error Returns the error message associated with the given prepared statement. If the prepared statement has no
error message, this returns nullptr instead.

The error message should not be freed. It will be de‑allocated when duckdb_destroy_prepare is called.

Syntax

const char *duckdb_prepare_error(
duckdb_prepared_statement prepared_statement
);

Parameters

• prepared_statement

The prepared statement to obtain the error from.

• returns

The error message, or nullptr if there is none.

duckdb_nparams Returns the number of parameters that can be provided to the given prepared statement.

Returns 0 if the query was not successfully prepared.


Syntax

idx_t duckdb_nparams(
duckdb_prepared_statement prepared_statement
);

Parameters

• prepared_statement

The prepared statement to obtain the number of parameters for.

duckdb_parameter_name Returns the name used to identify the parameter. The returned string should be freed using duckdb_free.

Returns NULL if the index is out of range for the provided prepared statement.

Syntax

const char *duckdb_parameter_name(
duckdb_prepared_statement prepared_statement,
idx_t index
);

Parameters

• prepared_statement

The prepared statement for which to get the parameter name from.

duckdb_param_type Returns the parameter type for the parameter at the given index.

Returns DUCKDB_TYPE_INVALID if the parameter index is out of range or the statement was not successfully prepared.

Syntax

duckdb_type duckdb_param_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);

Parameters

• prepared_statement

The prepared statement.

• param_idx

The parameter index.

• returns

The parameter type

duckdb_clear_bindings Clears the parameters bound to the prepared statement.


Syntax

duckdb_state duckdb_clear_bindings(
duckdb_prepared_statement prepared_statement
);

duckdb_prepared_statement_type Returns the statement type of the statement to be executed

Syntax

duckdb_statement_type duckdb_prepared_statement_type(
duckdb_prepared_statement statement
);

Parameters

• statement

The prepared statement.

• returns

duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID

Appender

Appenders are the most efficient way of loading data into DuckDB from within the C interface, and are recommended for fast data loading.
The appender is much faster than using prepared statements or individual INSERT INTO statements.

Appends are made in row‑wise format. For every column, a duckdb_append_[type] call should be made, after which the row should
be finished by calling duckdb_appender_end_row. After all rows have been appended, duckdb_appender_destroy should be
used to finalize the appender and clean up the resulting memory.

Note that duckdb_appender_destroy should always be called on the resulting appender, even if the function returns DuckDBError.

Example

duckdb_query(con, "CREATE TABLE people (id INTEGER, name VARCHAR)", NULL);

duckdb_appender appender;
if (duckdb_appender_create(con, NULL, "people", &appender) == DuckDBError) {
// handle error
}
// append the first row (1, Mark)
duckdb_append_int32(appender, 1);
duckdb_append_varchar(appender, "Mark");
duckdb_appender_end_row(appender);

// append the second row (2, Hannes)
duckdb_append_int32(appender, 2);
duckdb_append_varchar(appender, "Hannes");
duckdb_appender_end_row(appender);

// finish appending and flush all the rows to the table
duckdb_appender_destroy(&appender);


API Reference

duckdb_state duckdb_appender_create(duckdb_connection connection, const char *schema, const char *table, duckdb_appender *out_appender);
idx_t duckdb_appender_column_count(duckdb_appender appender);
duckdb_logical_type duckdb_appender_column_type(duckdb_appender appender, idx_t col_idx);
const char *duckdb_appender_error(duckdb_appender appender);
duckdb_state duckdb_appender_flush(duckdb_appender appender);
duckdb_state duckdb_appender_close(duckdb_appender appender);
duckdb_state duckdb_appender_destroy(duckdb_appender *appender);
duckdb_state duckdb_appender_begin_row(duckdb_appender appender);
duckdb_state duckdb_appender_end_row(duckdb_appender appender);
duckdb_state duckdb_append_bool(duckdb_appender appender, bool value);
duckdb_state duckdb_append_int8(duckdb_appender appender, int8_t value);
duckdb_state duckdb_append_int16(duckdb_appender appender, int16_t value);
duckdb_state duckdb_append_int32(duckdb_appender appender, int32_t value);
duckdb_state duckdb_append_int64(duckdb_appender appender, int64_t value);
duckdb_state duckdb_append_hugeint(duckdb_appender appender, duckdb_hugeint value);
duckdb_state duckdb_append_uint8(duckdb_appender appender, uint8_t value);
duckdb_state duckdb_append_uint16(duckdb_appender appender, uint16_t value);
duckdb_state duckdb_append_uint32(duckdb_appender appender, uint32_t value);
duckdb_state duckdb_append_uint64(duckdb_appender appender, uint64_t value);
duckdb_state duckdb_append_uhugeint(duckdb_appender appender, duckdb_uhugeint value);
duckdb_state duckdb_append_float(duckdb_appender appender, float value);
duckdb_state duckdb_append_double(duckdb_appender appender, double value);
duckdb_state duckdb_append_date(duckdb_appender appender, duckdb_date value);
duckdb_state duckdb_append_time(duckdb_appender appender, duckdb_time value);
duckdb_state duckdb_append_timestamp(duckdb_appender appender, duckdb_timestamp value);
duckdb_state duckdb_append_interval(duckdb_appender appender, duckdb_interval value);
duckdb_state duckdb_append_varchar(duckdb_appender appender, const char *val);
duckdb_state duckdb_append_varchar_length(duckdb_appender appender, const char *val, idx_t length);
duckdb_state duckdb_append_blob(duckdb_appender appender, const void *data, idx_t length);
duckdb_state duckdb_append_null(duckdb_appender appender);
duckdb_state duckdb_append_data_chunk(duckdb_appender appender, duckdb_data_chunk chunk);

duckdb_appender_create Creates an appender object.

Note that the object must be destroyed with duckdb_appender_destroy.

Syntax
duckdb_state duckdb_appender_create(
duckdb_connection connection,
const char *schema,
const char *table,
duckdb_appender *out_appender
);

Parameters

• connection

The connection context to create the appender in.

• schema

The schema of the table to append to, or nullptr for the default schema.

• table


The table name to append to.

• out_appender

The resulting appender object.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_appender_column_count Returns the number of columns in the table that belongs to the appender.

• appender The appender to get the column count from.

Syntax

idx_t duckdb_appender_column_count(
duckdb_appender appender
);

Parameters

• returns

The number of columns in the table.

duckdb_appender_column_type Returns the type of the column at the specified index.

Note: The resulting type should be destroyed with duckdb_destroy_logical_type.

• appender The appender to get the column type from.


• col_idx The index of the column to get the type of.

Syntax

duckdb_logical_type duckdb_appender_column_type(
duckdb_appender appender,
idx_t col_idx
);

Parameters

• returns

The duckdb_logical_type of the column.
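
For example, a short sketch of inspecting and releasing the column type, assuming an appender created as in the example above:

duckdb_logical_type col_type = duckdb_appender_column_type(appender, 0);
duckdb_type type_id = duckdb_get_type_id(col_type);
// ... use type_id ...
duckdb_destroy_logical_type(&col_type); // the returned type must always be destroyed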

duckdb_appender_error Returns the error message associated with the given appender. If the appender has no error message, this
returns nullptr instead.

The error message should not be freed. It will be de‑allocated when duckdb_appender_destroy is called.

Syntax

const char *duckdb_appender_error(
duckdb_appender appender
);


Parameters

• appender

The appender to get the error from.

• returns

The error message, or nullptr if there is none.

duckdb_appender_flush Flush the appender to the table, forcing the cache of the appender to be cleared and the data to be appended to the base table.

This should generally not be used unless you know what you are doing. Instead, call duckdb_appender_destroy when you are done
with the appender.

Syntax

duckdb_state duckdb_appender_flush(
duckdb_appender appender
);

Parameters

• appender

The appender to flush.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_appender_close Close the appender, flushing all intermediate state in the appender to the table and closing it for further
appends.

This is generally not necessary. Call duckdb_appender_destroy instead.

Syntax

duckdb_state duckdb_appender_close(
duckdb_appender appender
);

Parameters

• appender

The appender to flush and close.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_appender_destroy Close the appender and destroy it. This flushes all intermediate state in the appender to the table and de‑allocates all memory associated with the appender.


Syntax

duckdb_state duckdb_appender_destroy(
duckdb_appender *appender
);

Parameters

• appender

The appender to flush, close and destroy.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_appender_begin_row A nop function, provided for backwards compatibility reasons. Does nothing. Only duckdb_appender_end_row is required.

Syntax

duckdb_state duckdb_appender_begin_row(
duckdb_appender appender
);

duckdb_appender_end_row Finish the current row of appends. After end_row is called, the next row can be appended.

Syntax

duckdb_state duckdb_appender_end_row(
duckdb_appender appender
);

Parameters

• appender

The appender.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_append_bool Append a bool value to the appender.

Syntax

duckdb_state duckdb_append_bool(
duckdb_appender appender,
bool value
);

duckdb_append_int8 Append an int8_t value to the appender.


Syntax
duckdb_state duckdb_append_int8(
duckdb_appender appender,
int8_t value
);

duckdb_append_int16 Append an int16_t value to the appender.

Syntax
duckdb_state duckdb_append_int16(
duckdb_appender appender,
int16_t value
);

duckdb_append_int32 Append an int32_t value to the appender.

Syntax
duckdb_state duckdb_append_int32(
duckdb_appender appender,
int32_t value
);

duckdb_append_int64 Append an int64_t value to the appender.

Syntax
duckdb_state duckdb_append_int64(
duckdb_appender appender,
int64_t value
);

duckdb_append_hugeint Append a duckdb_hugeint value to the appender.

Syntax
duckdb_state duckdb_append_hugeint(
duckdb_appender appender,
duckdb_hugeint value
);

duckdb_append_uint8 Append a uint8_t value to the appender.

Syntax
duckdb_state duckdb_append_uint8(
duckdb_appender appender,
uint8_t value
);

duckdb_append_uint16 Append a uint16_t value to the appender.


Syntax
duckdb_state duckdb_append_uint16(
duckdb_appender appender,
uint16_t value
);

duckdb_append_uint32 Append a uint32_t value to the appender.

Syntax
duckdb_state duckdb_append_uint32(
duckdb_appender appender,
uint32_t value
);

duckdb_append_uint64 Append a uint64_t value to the appender.

Syntax
duckdb_state duckdb_append_uint64(
duckdb_appender appender,
uint64_t value
);

duckdb_append_uhugeint Append a duckdb_uhugeint value to the appender.

Syntax
duckdb_state duckdb_append_uhugeint(
duckdb_appender appender,
duckdb_uhugeint value
);

duckdb_append_float Append a float value to the appender.

Syntax
duckdb_state duckdb_append_float(
duckdb_appender appender,
float value
);

duckdb_append_double Append a double value to the appender.

Syntax
duckdb_state duckdb_append_double(
duckdb_appender appender,
double value
);

duckdb_append_date Append a duckdb_date value to the appender.


Syntax

duckdb_state duckdb_append_date(
duckdb_appender appender,
duckdb_date value
);

duckdb_append_time Append a duckdb_time value to the appender.

Syntax

duckdb_state duckdb_append_time(
duckdb_appender appender,
duckdb_time value
);

duckdb_append_timestamp Append a duckdb_timestamp value to the appender.

Syntax

duckdb_state duckdb_append_timestamp(
duckdb_appender appender,
duckdb_timestamp value
);

duckdb_append_interval Append a duckdb_interval value to the appender.

Syntax

duckdb_state duckdb_append_interval(
duckdb_appender appender,
duckdb_interval value
);

duckdb_append_varchar Append a varchar value to the appender.

Syntax

duckdb_state duckdb_append_varchar(
duckdb_appender appender,
const char *val
);

duckdb_append_varchar_length Append a varchar value to the appender.

Syntax

duckdb_state duckdb_append_varchar_length(
duckdb_appender appender,
const char *val,
idx_t length
);


duckdb_append_blob Append a blob value to the appender.

Syntax
duckdb_state duckdb_append_blob(
duckdb_appender appender,
const void *data,
idx_t length
);

duckdb_append_null Append a NULL value to the appender (of any type).

Syntax
duckdb_state duckdb_append_null(
duckdb_appender appender
);

duckdb_append_data_chunk Appends a pre‑filled data chunk to the specified appender.

The types of the data chunk must exactly match the types of the table; no casting is performed. If the types do not match or the appender is in an invalid state, DuckDBError is returned. If the append is successful, DuckDBSuccess is returned.

Syntax
duckdb_state duckdb_append_data_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk
);

Parameters

• appender

The appender to append to.

• chunk

The data chunk to append.

• returns

The return state.
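
For example, a minimal sketch of appending one row through a data chunk, assuming the people (id INTEGER, name VARCHAR) table and an open appender as in the example above; the row values are illustrative:

duckdb_logical_type types[2];
types[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
types[1] = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 2);

// fill a single row
int32_t *id_data = (int32_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(chunk, 0));
id_data[0] = 3;
duckdb_vector_assign_string_element(duckdb_data_chunk_get_vector(chunk, 1), 0, "Wilbur");
duckdb_data_chunk_set_size(chunk, 1);

if (duckdb_append_data_chunk(appender, chunk) == DuckDBError) {
// handle error, e.g., inspect duckdb_appender_error(appender)
}

duckdb_destroy_data_chunk(&chunk);
duckdb_destroy_logical_type(&types[0]);
duckdb_destroy_logical_type(&types[1]);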

Table Functions

The table function API can be used to define a table function that can then be called from within DuckDB in the FROM clause of a query.
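
A minimal sketch of the full workflow is shown below, assuming a connection con and a result variable as in the earlier examples; the names my_numbers_bind, my_numbers_init, my_numbers_function, and my_numbers are purely illustrative:

// bind: declare a single BIGINT output column named "i"
static void my_numbers_bind(duckdb_bind_info info) {
    duckdb_logical_type type = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_bind_add_result_column(info, "i", type);
    duckdb_destroy_logical_type(&type);
}

// init: allocate a cursor that tracks how many rows have been produced
static void my_numbers_init(duckdb_init_info info) {
    idx_t *cursor = (idx_t *) duckdb_malloc(sizeof(idx_t));
    *cursor = 0;
    duckdb_init_set_init_data(info, cursor, duckdb_free);
}

// main function: emit the numbers 0..4, at most one vector per call
static void my_numbers_function(duckdb_function_info info, duckdb_data_chunk output) {
    idx_t *cursor = (idx_t *) duckdb_function_get_init_data(info);
    int64_t *data = (int64_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(output, 0));
    idx_t count = 0;
    while (*cursor < 5 && count < duckdb_vector_size()) {
        data[count] = (int64_t) *cursor;
        (*cursor)++;
        count++;
    }
    // setting the size to 0 signals that the function is done producing rows
    duckdb_data_chunk_set_size(output, count);
}

// registration, e.g., inside main() after duckdb_connect
duckdb_table_function tf = duckdb_create_table_function();
duckdb_table_function_set_name(tf, "my_numbers");
duckdb_table_function_set_bind(tf, my_numbers_bind);
duckdb_table_function_set_init(tf, my_numbers_init);
duckdb_table_function_set_function(tf, my_numbers_function);
if (duckdb_register_table_function(con, tf) == DuckDBError) {
// handle error
}
duckdb_destroy_table_function(&tf);

// the function can now be used from SQL
duckdb_query(con, "SELECT * FROM my_numbers()", &result);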

API Reference

duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type
type);


void duckdb_table_function_add_named_parameter(duckdb_table_function table_function, const char *name, duckdb_logical_type type);
void duckdb_table_function_set_extra_info(duckdb_table_function table_function, void *extra_info,
duckdb_delete_callback_t destroy);
void duckdb_table_function_set_bind(duckdb_table_function table_function, duckdb_table_function_bind_t
bind);
void duckdb_table_function_set_init(duckdb_table_function table_function, duckdb_table_function_init_t
init);
void duckdb_table_function_set_local_init(duckdb_table_function table_function, duckdb_table_function_
init_t init);
void duckdb_table_function_set_function(duckdb_table_function table_function, duckdb_table_function_t
function);
void duckdb_table_function_supports_projection_pushdown(duckdb_table_function table_function, bool
pushdown);
duckdb_state duckdb_register_table_function(duckdb_connection con, duckdb_table_function function);

Table Function Bind

void *duckdb_bind_get_extra_info(duckdb_bind_info info);


void duckdb_bind_add_result_column(duckdb_bind_info info, const char *name, duckdb_logical_type type);
idx_t duckdb_bind_get_parameter_count(duckdb_bind_info info);
duckdb_value duckdb_bind_get_parameter(duckdb_bind_info info, idx_t index);
duckdb_value duckdb_bind_get_named_parameter(duckdb_bind_info info, const char *name);
void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t
destroy);
void duckdb_bind_set_cardinality(duckdb_bind_info info, idx_t cardinality, bool is_exact);
void duckdb_bind_set_error(duckdb_bind_info info, const char *error);

Table Function Init

void *duckdb_init_get_extra_info(duckdb_init_info info);


void *duckdb_init_get_bind_data(duckdb_init_info info);
void duckdb_init_set_init_data(duckdb_init_info info, void *init_data, duckdb_delete_callback_t
destroy);
idx_t duckdb_init_get_column_count(duckdb_init_info info);
idx_t duckdb_init_get_column_index(duckdb_init_info info, idx_t column_index);
void duckdb_init_set_max_threads(duckdb_init_info info, idx_t max_threads);
void duckdb_init_set_error(duckdb_init_info info, const char *error);

Table Function

void *duckdb_function_get_extra_info(duckdb_function_info info);


void *duckdb_function_get_bind_data(duckdb_function_info info);
void *duckdb_function_get_init_data(duckdb_function_info info);
void *duckdb_function_get_local_init_data(duckdb_function_info info);
void duckdb_function_set_error(duckdb_function_info info, const char *error);

duckdb_create_table_function Creates a new empty table function.

The return value should be destroyed with duckdb_destroy_table_function.

Syntax

duckdb_table_function duckdb_create_table_function(

);


Parameters

• returns

The table function object.

duckdb_destroy_table_function Destroys the given table function object.

Syntax
void duckdb_destroy_table_function(
duckdb_table_function *table_function
);

Parameters

• table_function

The table function to destroy

duckdb_table_function_set_name Sets the name of the given table function.

Syntax
void duckdb_table_function_set_name(
duckdb_table_function table_function,
const char *name
);

Parameters

• table_function

The table function

• name

The name of the table function

duckdb_table_function_add_parameter Adds a parameter to the table function.

Syntax
void duckdb_table_function_add_parameter(
duckdb_table_function table_function,
duckdb_logical_type type
);

Parameters

• table_function

The table function

• type

The type of the parameter to add.


duckdb_table_function_add_named_parameter Adds a named parameter to the table function.

Syntax

void duckdb_table_function_add_named_parameter(
duckdb_table_function table_function,
const char *name,
duckdb_logical_type type
);

Parameters

• table_function

The table function

• name

The name of the parameter

• type

The type of the parameter to add.

duckdb_table_function_set_extra_info Assigns extra information to the table function that can be fetched during binding,
etc.

Syntax

void duckdb_table_function_set_extra_info(
duckdb_table_function table_function,
void *extra_info,
duckdb_delete_callback_t destroy
);

Parameters

• table_function

The table function

• extra_info

The extra information

• destroy

The callback that will be called to destroy the bind data (if any)

duckdb_table_function_set_bind Sets the bind function of the table function.

Syntax

void duckdb_table_function_set_bind(
duckdb_table_function table_function,
duckdb_table_function_bind_t bind
);


Parameters

• table_function

The table function

• bind

The bind function

duckdb_table_function_set_init Sets the init function of the table function.

Syntax

void duckdb_table_function_set_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);

Parameters

• table_function

The table function

• init

The init function

duckdb_table_function_set_local_init Sets the thread‑local init function of the table function.

Syntax

void duckdb_table_function_set_local_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);

Parameters

• table_function

The table function

• init

The init function

duckdb_table_function_set_function Sets the main function of the table function.

Syntax

void duckdb_table_function_set_function(
duckdb_table_function table_function,
duckdb_table_function_t function
);


Parameters

• table_function

The table function

• function

The function

duckdb_table_function_supports_projection_pushdown Sets whether or not the given table function supports projection pushdown.

If this is set to true, the system will provide a list of all required columns in the init stage through the duckdb_init_get_column_count and duckdb_init_get_column_index functions. If this is set to false (the default), the system will expect all columns to be projected.
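
A brief sketch of what an init callback might do when pushdown is enabled; how the projected columns are stored in the init data is left illustrative:

// inside an init callback, after duckdb_table_function_supports_projection_pushdown(tf, true)
static void my_init(duckdb_init_info info) {
    idx_t projected = duckdb_init_get_column_count(info);
    for (idx_t i = 0; i < projected; i++) {
        // maps output position i back to the column index declared during bind
        idx_t source_column = duckdb_init_get_column_index(info, i);
        // remember source_column in the init data so the main function only fills these columns
        (void) source_column;
    }
}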

Syntax
void duckdb_table_function_supports_projection_pushdown(
duckdb_table_function table_function,
bool pushdown
);

Parameters

• table_function

The table function

• pushdown

True if the table function supports projection pushdown, false otherwise.

duckdb_register_table_function Register the table function object within the given connection.

The function requires at least a name, a bind function, an init function and a main function.

If the function is incomplete or a function with this name already exists DuckDBError is returned.

Syntax
duckdb_state duckdb_register_table_function(
duckdb_connection con,
duckdb_table_function function
);

Parameters

• con

The connection to register it in.

• function

The function pointer

• returns

Whether or not the registration was successful.


duckdb_bind_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.

Syntax
void *duckdb_bind_get_extra_info(
duckdb_bind_info info
);

Parameters

• info

The info object

• returns

The extra info

duckdb_bind_add_result_column Adds a result column to the output of the table function.

Syntax
void duckdb_bind_add_result_column(
duckdb_bind_info info,
const char *name,
duckdb_logical_type type
);

Parameters

• info

The info object

• name

The name of the column

• type

The logical type of the column

duckdb_bind_get_parameter_count Retrieves the number of regular (non‑named) parameters to the function.

Syntax
idx_t duckdb_bind_get_parameter_count(
duckdb_bind_info info
);

Parameters

• info

The info object

• returns

The number of parameters


duckdb_bind_get_parameter Retrieves the parameter at the given index.

The result must be destroyed with duckdb_destroy_value.

Syntax
duckdb_value duckdb_bind_get_parameter(
duckdb_bind_info info,
idx_t index
);

Parameters

• info

The info object

• index

The index of the parameter to get

• returns

The value of the parameter. Must be destroyed with duckdb_destroy_value.

duckdb_bind_get_named_parameter Retrieves a named parameter with the given name.

The result must be destroyed with duckdb_destroy_value.

Syntax
duckdb_value duckdb_bind_get_named_parameter(
duckdb_bind_info info,
const char *name
);

Parameters

• info

The info object

• name

The name of the parameter

• returns

The value of the parameter. Must be destroyed with duckdb_destroy_value.

duckdb_bind_set_bind_data Sets the user‑provided bind data in the bind object. This object can be retrieved again during execution.

Syntax
void duckdb_bind_set_bind_data(
duckdb_bind_info info,
void *bind_data,
duckdb_delete_callback_t destroy
);


Parameters

• info

The info object

• bind_data

The bind data object.

• destroy

The callback that will be called to destroy the bind data (if any)

duckdb_bind_set_cardinality Sets the cardinality estimate for the table function, used for optimization.

Syntax

void duckdb_bind_set_cardinality(
duckdb_bind_info info,
idx_t cardinality,
bool is_exact
);

Parameters

• info

The bind data object.

• is_exact

Whether or not the cardinality estimate is exact, or an approximation

duckdb_bind_set_error Report that an error has occurred while calling bind.

Syntax

void duckdb_bind_set_error(
duckdb_bind_info info,
const char *error
);

Parameters

• info

The info object

• error

The error message

duckdb_init_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.


Syntax
void *duckdb_init_get_extra_info(
duckdb_init_info info
);

Parameters

• info

The info object

• returns

The extra info

duckdb_init_get_bind_data Gets the bind data set by duckdb_bind_set_bind_data during the bind.

Note that the bind data should be considered as read‑only. For tracking state, use the init data instead.

Syntax
void *duckdb_init_get_bind_data(
duckdb_init_info info
);

Parameters

• info

The info object

• returns

The bind data object

duckdb_init_set_init_data Sets the user‑provided init data in the init object. This object can be retrieved again during execution.

Syntax
void duckdb_init_set_init_data(
duckdb_init_info info,
void *init_data,
duckdb_delete_callback_t destroy
);

Parameters

• info

The info object

• init_data

The init data object.

• destroy

The callback that will be called to destroy the init data (if any)


duckdb_init_get_column_count Returns the number of projected columns.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

Syntax

idx_t duckdb_init_get_column_count(
duckdb_init_info info
);

Parameters

• info

The info object

• returns

The number of projected columns.

duckdb_init_get_column_index Returns the column index of the projected column at the specified position.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

Syntax

idx_t duckdb_init_get_column_index(
duckdb_init_info info,
idx_t column_index
);

Parameters

• info

The info object

• column_index

The index at which to get the projected column index, from 0..duckdb_init_get_column_count(info)

• returns

The column index of the projected column.

duckdb_init_set_max_threads Sets how many threads can process this table function in parallel (default: 1)

Syntax

void duckdb_init_set_max_threads(
duckdb_init_info info,
idx_t max_threads
);


Parameters

• info

The info object

• max_threads

The maximum amount of threads that can process this table function

duckdb_init_set_error Report that an error has occurred while calling init.

Syntax

void duckdb_init_set_error(
duckdb_init_info info,
const char *error
);

Parameters

• info

The info object

• error

The error message

duckdb_function_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.

Syntax

void *duckdb_function_get_extra_info(
duckdb_function_info info
);

Parameters

• info

The info object

• returns

The extra info

duckdb_function_get_bind_data Gets the bind data set by duckdb_bind_set_bind_data during the bind.

Note that the bind data should be considered as read‑only. For tracking state, use the init data instead.

Syntax

void *duckdb_function_get_bind_data(
duckdb_function_info info
);


Parameters

• info

The info object

• returns

The bind data object

duckdb_function_get_init_data Gets the init data set by duckdb_init_set_init_data during the init.

Syntax

void *duckdb_function_get_init_data(
duckdb_function_info info
);

Parameters

• info

The info object

• returns

The init data object

duckdb_function_get_local_init_data Gets the thread‑local init data set by duckdb_init_set_init_data during the
local_init.

Syntax

void *duckdb_function_get_local_init_data(
duckdb_function_info info
);

Parameters

• info

The info object

• returns

The init data object

duckdb_function_set_error Report that an error has occurred while executing the function.

Syntax

void duckdb_function_set_error(
duckdb_function_info info,
const char *error
);


Parameters

• info

The info object

• error

The error message

Replacement Scans

The replacement scan API can be used to register a callback that is called when a table is read that does not exist in the catalog. For example,
when a query such as SELECT * FROM my_table is executed and my_table does not exist, the replacement scan callback will be
called with my_table as parameter. The replacement scan can then insert a table function with a specific parameter to replace the read
of the table.
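
A minimal sketch of such a callback; the table name my_numbers, the use of the built-in range table function, and the assumption that the parameter value is copied when added are illustrative:

#include <string.h>

// callback invoked for tables that are not found in the catalog
static void my_replacement(duckdb_replacement_scan_info info, const char *table_name, void *extra_data) {
    if (strcmp(table_name, "my_numbers") != 0) {
        // not handled: do not set a function name, so the usual lookup error is reported
        return;
    }
    // replace SELECT * FROM my_numbers with SELECT * FROM range(10)
    duckdb_replacement_scan_set_function_name(info, "range");
    duckdb_value count = duckdb_create_int64(10);
    duckdb_replacement_scan_add_parameter(info, count);
    duckdb_destroy_value(&count); // assumes the parameter was copied when added
}

// register the callback on the database object
duckdb_add_replacement_scan(db, my_replacement, NULL, NULL);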

API Reference

void duckdb_add_replacement_scan(duckdb_database db, duckdb_replacement_callback_t replacement, void *extra_data, duckdb_delete_callback_t delete_callback);
void duckdb_replacement_scan_set_function_name(duckdb_replacement_scan_info info, const char *function_
name);
void duckdb_replacement_scan_add_parameter(duckdb_replacement_scan_info info, duckdb_value parameter);
void duckdb_replacement_scan_set_error(duckdb_replacement_scan_info info, const char *error);

duckdb_add_replacement_scan Add a replacement scan definition to the specified database.

Syntax
void duckdb_add_replacement_scan(
duckdb_database db,
duckdb_replacement_callback_t replacement,
void *extra_data,
duckdb_delete_callback_t delete_callback
);

Parameters

• db

The database object to add the replacement scan to

• replacement

The replacement scan callback

• extra_data

Extra data that is passed back into the specified callback

• delete_callback

The delete callback to call on the extra data, if any

duckdb_replacement_scan_set_function_name Sets the replacement function name. If this function is called in the replacement callback, the replacement scan is performed. If it is not called, the replacement callback is not performed.


Syntax

void duckdb_replacement_scan_set_function_name(
duckdb_replacement_scan_info info,
const char *function_name
);

Parameters

• info

The info object

• function_name

The function name to substitute.

duckdb_replacement_scan_add_parameter Adds a parameter to the replacement scan function.

Syntax

void duckdb_replacement_scan_add_parameter(
duckdb_replacement_scan_info info,
duckdb_value parameter
);

Parameters

• info

The info object

• parameter

The parameter to add.

duckdb_replacement_scan_set_error Report that an error has occurred while executing the replacement scan.

Syntax

void duckdb_replacement_scan_set_error(
duckdb_replacement_scan_info info,
const char *error
);

Parameters

• info

The info object

• error

The error message


Complete API

API Reference

Open/Connect
duckdb_state duckdb_open(const char *path, duckdb_database *out_database);
duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char
**out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
const char *duckdb_library_version();

Configuration
duckdb_state duckdb_create_config(duckdb_config *out_config);
size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);

Query Execution
duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);
void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);

Result Functions
duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);
bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);

Safe fetch functions


bool duckdb_value_boolean(duckdb_result *result, idx_t col, idx_t row);
int8_t duckdb_value_int8(duckdb_result *result, idx_t col, idx_t row);
int16_t duckdb_value_int16(duckdb_result *result, idx_t col, idx_t row);
int32_t duckdb_value_int32(duckdb_result *result, idx_t col, idx_t row);
int64_t duckdb_value_int64(duckdb_result *result, idx_t col, idx_t row);
duckdb_hugeint duckdb_value_hugeint(duckdb_result *result, idx_t col, idx_t row);
duckdb_uhugeint duckdb_value_uhugeint(duckdb_result *result, idx_t col, idx_t row);
duckdb_decimal duckdb_value_decimal(duckdb_result *result, idx_t col, idx_t row);
uint8_t duckdb_value_uint8(duckdb_result *result, idx_t col, idx_t row);
uint16_t duckdb_value_uint16(duckdb_result *result, idx_t col, idx_t row);


uint32_t duckdb_value_uint32(duckdb_result *result, idx_t col, idx_t row);


uint64_t duckdb_value_uint64(duckdb_result *result, idx_t col, idx_t row);
float duckdb_value_float(duckdb_result *result, idx_t col, idx_t row);
double duckdb_value_double(duckdb_result *result, idx_t col, idx_t row);
duckdb_date duckdb_value_date(duckdb_result *result, idx_t col, idx_t row);
duckdb_time duckdb_value_time(duckdb_result *result, idx_t col, idx_t row);
duckdb_timestamp duckdb_value_timestamp(duckdb_result *result, idx_t col, idx_t row);
duckdb_interval duckdb_value_interval(duckdb_result *result, idx_t col, idx_t row);
char *duckdb_value_varchar(duckdb_result *result, idx_t col, idx_t row);
duckdb_string duckdb_value_string(duckdb_result *result, idx_t col, idx_t row);
char *duckdb_value_varchar_internal(duckdb_result *result, idx_t col, idx_t row);
duckdb_string duckdb_value_string_internal(duckdb_result *result, idx_t col, idx_t row);
duckdb_blob duckdb_value_blob(duckdb_result *result, idx_t col, idx_t row);
bool duckdb_value_is_null(duckdb_result *result, idx_t col, idx_t row);

Helpers
void *duckdb_malloc(size_t size);
void duckdb_free(void *ptr);
idx_t duckdb_vector_size();
bool duckdb_string_is_inlined(duckdb_string_t string);

Date/Time/Timestamp Helpers
duckdb_date_struct duckdb_from_date(duckdb_date date);
duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);

Hugeint Helpers
double duckdb_hugeint_to_double(duckdb_hugeint val);
duckdb_hugeint duckdb_double_to_hugeint(double val);

Unsigned Hugeint Helpers


double duckdb_uhugeint_to_double(duckdb_uhugeint val);
duckdb_uhugeint duckdb_double_to_uhugeint(double val);

Decimal Helpers
duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);
double duckdb_decimal_to_double(duckdb_decimal val);

Prepared Statements
duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement
*out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);


const char *duckdb_parameter_name(duckdb_prepared_statement prepared_statement, idx_t index);


duckdb_type duckdb_param_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_state duckdb_clear_bindings(duckdb_prepared_statement prepared_statement);
duckdb_statement_type duckdb_prepared_statement_type(duckdb_prepared_statement statement);

Bind Values to Prepared Statements

duckdb_state duckdb_bind_value(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_value val);
duckdb_state duckdb_bind_parameter_index(duckdb_prepared_statement prepared_statement, idx_t *param_idx_
out, const char *name);
duckdb_state duckdb_bind_boolean(duckdb_prepared_statement prepared_statement, idx_t param_idx, bool
val);
duckdb_state duckdb_bind_int8(duckdb_prepared_statement prepared_statement, idx_t param_idx, int8_t
val);
duckdb_state duckdb_bind_int16(duckdb_prepared_statement prepared_statement, idx_t param_idx, int16_t
val);
duckdb_state duckdb_bind_int32(duckdb_prepared_statement prepared_statement, idx_t param_idx, int32_t
val);
duckdb_state duckdb_bind_int64(duckdb_prepared_statement prepared_statement, idx_t param_idx, int64_t
val);
duckdb_state duckdb_bind_hugeint(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_
hugeint val);
duckdb_state duckdb_bind_uhugeint(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_
uhugeint val);
duckdb_state duckdb_bind_decimal(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_
decimal val);
duckdb_state duckdb_bind_uint8(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint8_t
val);
duckdb_state duckdb_bind_uint16(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint16_t
val);
duckdb_state duckdb_bind_uint32(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint32_t
val);
duckdb_state duckdb_bind_uint64(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint64_t
val);
duckdb_state duckdb_bind_float(duckdb_prepared_statement prepared_statement, idx_t param_idx, float
val);
duckdb_state duckdb_bind_double(duckdb_prepared_statement prepared_statement, idx_t param_idx, double
val);
duckdb_state duckdb_bind_date(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_date
val);
duckdb_state duckdb_bind_time(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_time
val);
duckdb_state duckdb_bind_timestamp(duckdb_prepared_statement prepared_statement, idx_t param_idx,
duckdb_timestamp val);
duckdb_state duckdb_bind_interval(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_
interval val);
duckdb_state duckdb_bind_varchar(duckdb_prepared_statement prepared_statement, idx_t param_idx, const
char *val);
duckdb_state duckdb_bind_varchar_length(duckdb_prepared_statement prepared_statement, idx_t param_idx,
const char *val, idx_t length);
duckdb_state duckdb_bind_blob(duckdb_prepared_statement prepared_statement, idx_t param_idx, const void
*data, idx_t length);
duckdb_state duckdb_bind_null(duckdb_prepared_statement prepared_statement, idx_t param_idx);

Execute Prepared Statements


duckdb_state duckdb_execute_prepared(duckdb_prepared_statement prepared_statement, duckdb_result *out_result);
duckdb_state duckdb_execute_prepared_streaming(duckdb_prepared_statement prepared_statement, duckdb_
result *out_result);

Extract Statements
idx_t duckdb_extract_statements(duckdb_connection connection, const char *query, duckdb_extracted_
statements *out_extracted_statements);
duckdb_state duckdb_prepare_extracted_statement(duckdb_connection connection, duckdb_extracted_
statements extracted_statements, idx_t index, duckdb_prepared_statement *out_prepared_statement);
const char *duckdb_extract_statements_error(duckdb_extracted_statements extracted_statements);
void duckdb_destroy_extracted(duckdb_extracted_statements *extracted_statements);

Pending Result Interface


duckdb_state duckdb_pending_prepared(duckdb_prepared_statement prepared_statement, duckdb_pending_result
*out_result);
duckdb_state duckdb_pending_prepared_streaming(duckdb_prepared_statement prepared_statement, duckdb_
pending_result *out_result);
void duckdb_destroy_pending(duckdb_pending_result *pending_result);
const char *duckdb_pending_error(duckdb_pending_result pending_result);
duckdb_pending_state duckdb_pending_execute_task(duckdb_pending_result pending_result);
duckdb_pending_state duckdb_pending_execute_check_state(duckdb_pending_result pending_result);
duckdb_state duckdb_execute_pending(duckdb_pending_result pending_result, duckdb_result *out_result);
bool duckdb_pending_execution_is_finished(duckdb_pending_state pending_state);

Value Interface
void duckdb_destroy_value(duckdb_value *value);
duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_
count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_
count);
char *duckdb_get_varchar(duckdb_value value);
int64_t duckdb_get_int64(duckdb_value value);

Logical Type Interface


duckdb_logical_type duckdb_create_logical_type(duckdb_type type);
char *duckdb_logical_type_get_alias(duckdb_logical_type type);
duckdb_logical_type duckdb_create_list_type(duckdb_logical_type type);
duckdb_logical_type duckdb_create_array_type(duckdb_logical_type type, idx_t array_size);
duckdb_logical_type duckdb_create_map_type(duckdb_logical_type key_type, duckdb_logical_type value_
type);
duckdb_logical_type duckdb_create_union_type(duckdb_logical_type *member_types, const char **member_
names, idx_t member_count);
duckdb_logical_type duckdb_create_struct_type(duckdb_logical_type *member_types, const char **member_
names, idx_t member_count);
duckdb_logical_type duckdb_create_enum_type(const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_decimal_type(uint8_t width, uint8_t scale);
duckdb_type duckdb_get_type_id(duckdb_logical_type type);
uint8_t duckdb_decimal_width(duckdb_logical_type type);
uint8_t duckdb_decimal_scale(duckdb_logical_type type);


duckdb_type duckdb_decimal_internal_type(duckdb_logical_type type);


duckdb_type duckdb_enum_internal_type(duckdb_logical_type type);
uint32_t duckdb_enum_dictionary_size(duckdb_logical_type type);
char *duckdb_enum_dictionary_value(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_list_type_child_type(duckdb_logical_type type);
duckdb_logical_type duckdb_array_type_child_type(duckdb_logical_type type);
idx_t duckdb_array_type_array_size(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_key_type(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_value_type(duckdb_logical_type type);
idx_t duckdb_struct_type_child_count(duckdb_logical_type type);
char *duckdb_struct_type_child_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_struct_type_child_type(duckdb_logical_type type, idx_t index);
idx_t duckdb_union_type_member_count(duckdb_logical_type type);
char *duckdb_union_type_member_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_union_type_member_type(duckdb_logical_type type, idx_t index);
void duckdb_destroy_logical_type(duckdb_logical_type *type);

Data Chunk Interface

duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);


void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);

Vector Interface

duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);


void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t
str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
duckdb_vector duckdb_array_vector_get_child(duckdb_vector vector);

Validity Mask Functions

bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);


void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);

Table Functions

duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type
type);


void duckdb_table_function_add_named_parameter(duckdb_table_function table_function, const char *name, duckdb_logical_type type);
void duckdb_table_function_set_extra_info(duckdb_table_function table_function, void *extra_info,
duckdb_delete_callback_t destroy);
void duckdb_table_function_set_bind(duckdb_table_function table_function, duckdb_table_function_bind_t
bind);
void duckdb_table_function_set_init(duckdb_table_function table_function, duckdb_table_function_init_t
init);
void duckdb_table_function_set_local_init(duckdb_table_function table_function, duckdb_table_function_
init_t init);
void duckdb_table_function_set_function(duckdb_table_function table_function, duckdb_table_function_t
function);
void duckdb_table_function_supports_projection_pushdown(duckdb_table_function table_function, bool
pushdown);
duckdb_state duckdb_register_table_function(duckdb_connection con, duckdb_table_function function);

Table Function Bind

void *duckdb_bind_get_extra_info(duckdb_bind_info info);


void duckdb_bind_add_result_column(duckdb_bind_info info, const char *name, duckdb_logical_type type);
idx_t duckdb_bind_get_parameter_count(duckdb_bind_info info);
duckdb_value duckdb_bind_get_parameter(duckdb_bind_info info, idx_t index);
duckdb_value duckdb_bind_get_named_parameter(duckdb_bind_info info, const char *name);
void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t
destroy);
void duckdb_bind_set_cardinality(duckdb_bind_info info, idx_t cardinality, bool is_exact);
void duckdb_bind_set_error(duckdb_bind_info info, const char *error);

Table Function Init

void *duckdb_init_get_extra_info(duckdb_init_info info);


void *duckdb_init_get_bind_data(duckdb_init_info info);
void duckdb_init_set_init_data(duckdb_init_info info, void *init_data, duckdb_delete_callback_t
destroy);
idx_t duckdb_init_get_column_count(duckdb_init_info info);
idx_t duckdb_init_get_column_index(duckdb_init_info info, idx_t column_index);
void duckdb_init_set_max_threads(duckdb_init_info info, idx_t max_threads);
void duckdb_init_set_error(duckdb_init_info info, const char *error);

Table Function

void *duckdb_function_get_extra_info(duckdb_function_info info);


void *duckdb_function_get_bind_data(duckdb_function_info info);
void *duckdb_function_get_init_data(duckdb_function_info info);
void *duckdb_function_get_local_init_data(duckdb_function_info info);
void duckdb_function_set_error(duckdb_function_info info, const char *error);

Replacement Scans

void duckdb_add_replacement_scan(duckdb_database db, duckdb_replacement_callback_t replacement, void *extra_data, duckdb_delete_callback_t delete_callback);
void duckdb_replacement_scan_set_function_name(duckdb_replacement_scan_info info, const char *function_
name);
void duckdb_replacement_scan_add_parameter(duckdb_replacement_scan_info info, duckdb_value parameter);
void duckdb_replacement_scan_set_error(duckdb_replacement_scan_info info, const char *error);


Appender

duckdb_state duckdb_appender_create(duckdb_connection connection, const char *schema, const char *table, duckdb_appender *out_appender);
idx_t duckdb_appender_column_count(duckdb_appender appender);
duckdb_logical_type duckdb_appender_column_type(duckdb_appender appender, idx_t col_idx);
const char *duckdb_appender_error(duckdb_appender appender);
duckdb_state duckdb_appender_flush(duckdb_appender appender);
duckdb_state duckdb_appender_close(duckdb_appender appender);
duckdb_state duckdb_appender_destroy(duckdb_appender *appender);
duckdb_state duckdb_appender_begin_row(duckdb_appender appender);
duckdb_state duckdb_appender_end_row(duckdb_appender appender);
duckdb_state duckdb_append_bool(duckdb_appender appender, bool value);
duckdb_state duckdb_append_int8(duckdb_appender appender, int8_t value);
duckdb_state duckdb_append_int16(duckdb_appender appender, int16_t value);
duckdb_state duckdb_append_int32(duckdb_appender appender, int32_t value);
duckdb_state duckdb_append_int64(duckdb_appender appender, int64_t value);
duckdb_state duckdb_append_hugeint(duckdb_appender appender, duckdb_hugeint value);
duckdb_state duckdb_append_uint8(duckdb_appender appender, uint8_t value);
duckdb_state duckdb_append_uint16(duckdb_appender appender, uint16_t value);
duckdb_state duckdb_append_uint32(duckdb_appender appender, uint32_t value);
duckdb_state duckdb_append_uint64(duckdb_appender appender, uint64_t value);
duckdb_state duckdb_append_uhugeint(duckdb_appender appender, duckdb_uhugeint value);
duckdb_state duckdb_append_float(duckdb_appender appender, float value);
duckdb_state duckdb_append_double(duckdb_appender appender, double value);
duckdb_state duckdb_append_date(duckdb_appender appender, duckdb_date value);
duckdb_state duckdb_append_time(duckdb_appender appender, duckdb_time value);
duckdb_state duckdb_append_timestamp(duckdb_appender appender, duckdb_timestamp value);
duckdb_state duckdb_append_interval(duckdb_appender appender, duckdb_interval value);
duckdb_state duckdb_append_varchar(duckdb_appender appender, const char *val);
duckdb_state duckdb_append_varchar_length(duckdb_appender appender, const char *val, idx_t length);
duckdb_state duckdb_append_blob(duckdb_appender appender, const void *data, idx_t length);
duckdb_state duckdb_append_null(duckdb_appender appender);
duckdb_state duckdb_append_data_chunk(duckdb_appender appender, duckdb_data_chunk chunk);

Arrow Interface

duckdb_state duckdb_query_arrow(duckdb_connection connection, const char *query, duckdb_arrow *out_result);
duckdb_state duckdb_query_arrow_schema(duckdb_arrow result, duckdb_arrow_schema *out_schema);
duckdb_state duckdb_prepared_arrow_schema(duckdb_prepared_statement prepared, duckdb_arrow_schema *out_
schema);
void duckdb_result_arrow_array(duckdb_result result, duckdb_data_chunk chunk, duckdb_arrow_array *out_
array);
duckdb_state duckdb_query_arrow_array(duckdb_arrow result, duckdb_arrow_array *out_array);
idx_t duckdb_arrow_column_count(duckdb_arrow result);
idx_t duckdb_arrow_row_count(duckdb_arrow result);
idx_t duckdb_arrow_rows_changed(duckdb_arrow result);
const char *duckdb_query_arrow_error(duckdb_arrow result);
void duckdb_destroy_arrow(duckdb_arrow *result);
void duckdb_destroy_arrow_stream(duckdb_arrow_stream *stream_p);
duckdb_state duckdb_execute_prepared_arrow(duckdb_prepared_statement prepared_statement, duckdb_arrow
*out_result);
duckdb_state duckdb_arrow_scan(duckdb_connection connection, const char *table_name, duckdb_arrow_stream
arrow);
duckdb_state duckdb_arrow_array_scan(duckdb_connection connection, const char *table_name, duckdb_arrow_
schema arrow_schema, duckdb_arrow_array arrow_array, duckdb_arrow_stream *out_stream);


Threading Information
void duckdb_execute_tasks(duckdb_database database, idx_t max_tasks);
duckdb_task_state duckdb_create_task_state(duckdb_database database);
void duckdb_execute_tasks_state(duckdb_task_state state);
idx_t duckdb_execute_n_tasks_state(duckdb_task_state state, idx_t max_tasks);
void duckdb_finish_execution(duckdb_task_state state);
bool duckdb_task_state_is_finished(duckdb_task_state state);
void duckdb_destroy_task_state(duckdb_task_state state);
bool duckdb_execution_is_finished(duckdb_connection con);

Streaming Result Interface


duckdb_data_chunk duckdb_stream_fetch_chunk(duckdb_result result);

duckdb_open Creates a new database or opens an existing database file stored at the given path. If no path is given a new in‑memory
database is created instead. The instantiated database should be closed with 'duckdb_close'.

Syntax
duckdb_state duckdb_open(
const char *path,
duckdb_database *out_database
);

Parameters

• path

Path to the database file on disk, or nullptr or :memory: to open an in‑memory database.

• out_database

The result database object.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_open_ext Extended version of duckdb_open. Creates a new database or opens an existing database file stored at the given
path. The instantiated database should be closed with 'duckdb_close'.

Syntax
duckdb_state duckdb_open_ext(
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);

Parameters

• path

Path to the database file on disk, or nullptr or :memory: to open an in‑memory database.

• out_database


The result database object.

• config

(Optional) configuration used to start up the database system.

• out_error

If set and the function returns DuckDBError, this will contain the reason why the start‑up failed. Note that the error must be freed using
duckdb_free.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_close Closes the specified database and de‑allocates all memory allocated for that database. This should be called after you
are done with any database allocated through duckdb_open or duckdb_open_ext. Note that failing to call duckdb_close (in case
of e.g., a program crash) will not cause data corruption. Still, it is recommended to always correctly close a database object after you are
done with it.

Syntax

void duckdb_close(
duckdb_database *database
);

Parameters

• database

The database object to shut down.

duckdb_connect Opens a connection to a database. Connections are required to query the database, and store transactional state
associated with the connection. The instantiated connection should be closed using 'duckdb_disconnect'.
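
Together with duckdb_open, duckdb_disconnect, and duckdb_close, a typical lifecycle sketch looks as follows:

duckdb_database db;
duckdb_connection con;

if (duckdb_open(NULL, &db) == DuckDBError) {
// handle error: could not create the in-memory database
}
if (duckdb_connect(db, &con) == DuckDBError) {
// handle error: could not create a connection
}

// ... issue queries on con ...

duckdb_disconnect(&con);
duckdb_close(&db);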

Syntax

duckdb_state duckdb_connect(
duckdb_database database,
duckdb_connection *out_connection
);

Parameters

• database

The database file to connect to.

• out_connection

The result connection object.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_interrupt Interrupt running query


Syntax

void duckdb_interrupt(
duckdb_connection connection
);

Parameters

• connection

The connection to interrupt

duckdb_query_progress Get progress of the running query

Syntax

duckdb_query_progress_type duckdb_query_progress(
duckdb_connection connection
);

Parameters

• connection

The working connection

• returns

‑1 if no progress or a percentage of the progress

duckdb_disconnect Closes the specified connection and de‑allocates all memory allocated for that connection.

Syntax

void duckdb_disconnect(
duckdb_connection *connection
);

Parameters

• connection

The connection to close.

duckdb_library_version Returns the version of the linked DuckDB, with a version postfix for dev versions

Usually used for developing C extensions that must return this for a compatibility check.

Syntax

const char *duckdb_library_version(

);


duckdb_create_config Initializes an empty configuration object that can be used to provide start‑up options for the DuckDB instance through duckdb_open_ext. The duckdb_config must be destroyed using 'duckdb_destroy_config'.

This will always succeed unless there is a malloc failure.
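
For example, a minimal sketch; the option names and values shown ("access_mode", "threads") and the database path are illustrative:

duckdb_config config;
if (duckdb_create_config(&config) == DuckDBError) {
// handle error
}
duckdb_set_config(config, "access_mode", "READ_ONLY"); // assumed option name and value
duckdb_set_config(config, "threads", "4");             // assumed option name and value

duckdb_database db;
char *error = NULL;
if (duckdb_open_ext("my_database.db", &db, config, &error) == DuckDBError) {
// inspect the start-up error, then free it with duckdb_free(error)
}
duckdb_destroy_config(&config);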

Syntax

duckdb_state duckdb_create_config(
duckdb_config *out_config
);

Parameters

• out_config

The result configuration object.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_config_count This returns the total amount of configuration options available for usage with duckdb_get_config_flag.

This should not be called in a loop as it internally loops over all the options.

Syntax

size_t duckdb_config_count(

);

Parameters

• returns

The amount of config options available.

duckdb_get_config_flag Obtains a human‑readable name and description of a specific configuration option. This can be used to
e.g. display configuration options. This will succeed unless index is out of range (i.e., >= duckdb_config_count).

The result name or description MUST NOT be freed.
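
For example, a short sketch that lists all available options (assuming stdio.h for printf):

#include <stdio.h>

size_t option_count = duckdb_config_count();
for (size_t i = 0; i < option_count; i++) {
    const char *name = NULL;
    const char *description = NULL;
    if (duckdb_get_config_flag(i, &name, &description) == DuckDBSuccess) {
        printf("%s: %s\n", name, description); // do not free name or description
    }
}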

Syntax

duckdb_state duckdb_get_config_flag(
size_t index,
const char **out_name,
const char **out_description
);


Parameters

• index

The index of the configuration option (between 0 and duckdb_config_count)

• out_name

A name of the configuration flag.

• out_description

A description of the configuration flag.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_set_config Sets the specified option for the specified configuration. The configuration option is indicated by name. To
obtain a list of config options, see duckdb_get_config_flag.

In the source code, configuration options are defined in config.cpp.

This can fail if either the name is invalid, or if the value provided for the option is invalid.

Syntax

duckdb_state duckdb_set_config(
duckdb_config config,
const char *name,
const char *option
);

Parameters

• duckdb_config

The configuration object to set the option on.

• name

The name of the configuration flag to set.

• option

The value to set the configuration flag to.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_destroy_config Destroys the specified configuration object and de‑allocates all memory allocated for the object.

Syntax

void duckdb_destroy_config(
duckdb_config *config
);


Parameters

• config

The configuration object to destroy.
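
As a hedged sketch of the configuration workflow described above (illustrative only): the option names "access_mode" and "threads" are examples of options that duckdb_get_config_flag can list, and "example.db" is a placeholder path.

duckdb_config config;
if (duckdb_create_config(&config) == DuckDBError) {
    // handle allocation failure
}
duckdb_set_config(config, "access_mode", "READ_WRITE"); // example option
duckdb_set_config(config, "threads", "4");              // example option

duckdb_database db;
char *open_error = NULL;
if (duckdb_open_ext("example.db", &db, config, &open_error) == DuckDBError) {
    // on failure the reason is reported through open_error; free it with duckdb_free
    duckdb_free(open_error);
}
duckdb_destroy_config(&config); // the config can be destroyed once the database is opened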

duckdb_query Executes a SQL query within a connection and stores the full (materialized) result in the out_result pointer. If the query
fails to execute, DuckDBError is returned and the error message can be retrieved by calling duckdb_result_error.

Note that after running duckdb_query, duckdb_destroy_result must be called on the result object even if the query fails, other‑
wise the error stored within the result will not be freed correctly.

Syntax

duckdb_state duckdb_query(
duckdb_connection connection,
const char *query,
duckdb_result *out_result
);

Parameters

• connection

The connection to perform the query in.

• query

The SQL query to run.

• out_result

The query result.

• returns

DuckDBSuccess on success or DuckDBError on failure.
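
A minimal usage sketch (illustrative only): "con" is assumed to be an open connection obtained via duckdb_connect.

duckdb_result result;
if (duckdb_query(con, "SELECT 42 AS answer", &result) == DuckDBError) {
    // the error message lives inside the result object until it is destroyed
    fprintf(stderr, "query failed: %s\n", duckdb_result_error(&result));
}
// always destroy the result, even if the query failed
duckdb_destroy_result(&result);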

duckdb_destroy_result Closes the result and de‑allocates all memory allocated for that result.

Syntax

void duckdb_destroy_result(
duckdb_result *result
);

Parameters

• result

The result to destroy.

duckdb_column_name Returns the column name of the specified column. The result should not need to be freed; the column names
will automatically be destroyed when the result is destroyed.

Returns NULL if the column is out of range.


Syntax
const char *duckdb_column_name(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the column name from.

• col

The column index.

• returns

The column name of the specified column.

duckdb_column_type Returns the column type of the specified column.

Returns DUCKDB_TYPE_INVALID if the column is out of range.

Syntax
duckdb_type duckdb_column_type(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the column type from.

• col

The column index.

• returns

The column type of the specified column.

duckdb_result_statement_type Returns the statement type of the statement that was executed

Syntax
duckdb_statement_type duckdb_result_statement_type(
duckdb_result result
);

Parameters

• result

The result object to fetch the statement type from.

• returns

duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID


duckdb_column_logical_type Returns the logical column type of the specified column.

The return type of this call should be destroyed with duckdb_destroy_logical_type.

Returns NULL if the column is out of range.

Syntax
duckdb_logical_type duckdb_column_logical_type(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the column type from.

• col

The column index.

• returns

The logical column type of the specified column.

duckdb_column_count Returns the number of columns present in the result object.

Syntax
idx_t duckdb_column_count(
duckdb_result *result
);

Parameters

• result

The result object.

• returns

The number of columns present in the result object.

duckdb_row_count Returns the number of rows present in the result object.

Syntax
idx_t duckdb_row_count(
duckdb_result *result
);

Parameters

• result

The result object.

• returns

The number of rows present in the result object.
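
Putting the metadata functions above together, a sketch of inspecting a materialized result (illustrative only; "result" is assumed to come from a successful duckdb_query call):

idx_t column_count = duckdb_column_count(&result);
idx_t row_count = duckdb_row_count(&result);
printf("%llu columns, %llu rows\n",
       (unsigned long long) column_count, (unsigned long long) row_count);
for (idx_t col = 0; col < column_count; col++) {
    printf("column %llu: %s (type id %d)\n",
           (unsigned long long) col,
           duckdb_column_name(&result, col),
           (int) duckdb_column_type(&result, col));
}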


duckdb_rows_changed Returns the number of rows changed by the query stored in the result. This is relevant only for INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.

Syntax

idx_t duckdb_rows_changed(
duckdb_result *result
);

Parameters

• result

The result object.

• returns

The number of rows changed.

duckdb_column_data DEPRECATED: Prefer using duckdb_result_get_chunk instead.

Returns the data of a specific column of a result in columnar format.

The function returns a dense array which contains the result data. The exact type stored in the array depends on the corresponding duckdb_type (as provided by duckdb_column_type). For the exact type by which the data should be accessed, see the comments in the types section or the DUCKDB_TYPE enum.

For example, for a column of type DUCKDB_TYPE_INTEGER, rows can be accessed in the following manner:

int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
printf("Data for row %d: %d\n", row, data[row]);

Syntax

void *duckdb_column_data(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the column data from.

• col

The column index.

• returns

The column data of the specified column.


duckdb_nullmask_data DEPRECATED: Prefer using duckdb_result_get_chunk instead.

Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row whether or not the corre‑
sponding row is NULL. If a row is NULL, the values present in the array provided by duckdb_column_data are undefined.

int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
bool *nullmask = duckdb_nullmask_data(&result, 0);
if (nullmask[row]) {
    printf("Data for row %d: NULL\n", row);
} else {
    printf("Data for row %d: %d\n", row, data[row]);
}

Syntax

bool *duckdb_nullmask_data(
duckdb_result *result,
idx_t col
);

Parameters

• result

The result object to fetch the nullmask from.

• col

The column index.

• returns

The nullmask of the specified column.

duckdb_result_error Returns the error message contained within the result. The error is only set if duckdb_query returns
DuckDBError.

The result of this function must not be freed. It will be cleaned up when duckdb_destroy_result is called.

Syntax

const char *duckdb_result_error(
duckdb_result *result
);

Parameters

• result

The result object to fetch the error from.

• returns

The error of the result.


duckdb_result_get_chunk Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result
is exhausted.

The result must be destroyed with duckdb_destroy_data_chunk.

This function supersedes all duckdb_value functions, as well as the duckdb_column_data and duckdb_nullmask_data func‑
tions. It results in significantly better performance, and should be preferred in newer code‑bases.

If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be mixed with the legacy
result functions).

Use duckdb_result_chunk_count to figure out how many chunks there are in the result.

Syntax
duckdb_data_chunk duckdb_result_get_chunk(
duckdb_result result,
idx_t chunk_index
);

Parameters

• result

The result object to fetch the data chunk from.

• chunk_index

The chunk index to fetch from.

• returns

The resulting data chunk. Returns NULL if the chunk index is out of bounds.

duckdb_result_is_streaming Checks if the type of the internal result is StreamQueryResult.

Syntax
bool duckdb_result_is_streaming(
duckdb_result result
);

Parameters

• result

The result object to check.

• returns

Whether or not the result object is of the type StreamQueryResult

duckdb_result_chunk_count Returns the number of data chunks present in the result.

Syntax
idx_t duckdb_result_chunk_count(
duckdb_result result
);


Parameters

• result

The result object

• returns

Number of data chunks present in the result.
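
A hedged sketch of the recommended chunk‑based result iteration, combining the functions above with the data chunk and vector functions documented further below (illustrative only; it assumes "result" holds a materialized query result whose first column is of type DUCKDB_TYPE_INTEGER):

idx_t chunk_count = duckdb_result_chunk_count(result);
for (idx_t chunk_idx = 0; chunk_idx < chunk_count; chunk_idx++) {
    duckdb_data_chunk chunk = duckdb_result_get_chunk(result, chunk_idx);
    idx_t tuple_count = duckdb_data_chunk_get_size(chunk);
    duckdb_vector col0 = duckdb_data_chunk_get_vector(chunk, 0);
    int32_t *data = (int32_t *) duckdb_vector_get_data(col0); // assumes an INTEGER column
    uint64_t *validity = duckdb_vector_get_validity(col0);
    for (idx_t row = 0; row < tuple_count; row++) {
        if (!validity || duckdb_validity_row_is_valid(validity, row)) {
            printf("%d\n", data[row]);
        }
    }
    duckdb_destroy_data_chunk(&chunk); // each fetched chunk must be destroyed
}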

duckdb_result_return_type Returns the return_type of the given result, or DUCKDB_RETURN_TYPE_INVALID on error

Syntax

duckdb_result_type duckdb_result_return_type(
duckdb_result result
);

Parameters

• result

The result object

• returns

The return_type

duckdb_value_boolean

Syntax

bool duckdb_value_boolean(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The boolean value at the specified location, or false if the value cannot be converted.

duckdb_value_int8

Syntax

int8_t duckdb_value_int8(
duckdb_result *result,
idx_t col,
idx_t row
);


Parameters

• returns

The int8_t value at the specified location, or 0 if the value cannot be converted.

duckdb_value_int16

Syntax
int16_t duckdb_value_int16(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The int16_t value at the specified location, or 0 if the value cannot be converted.

duckdb_value_int32

Syntax
int32_t duckdb_value_int32(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The int32_t value at the specified location, or 0 if the value cannot be converted.

duckdb_value_int64

Syntax
int64_t duckdb_value_int64(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The int64_t value at the specified location, or 0 if the value cannot be converted.

duckdb_value_hugeint


Syntax

duckdb_hugeint duckdb_value_hugeint(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The duckdb_hugeint value at the specified location, or 0 if the value cannot be converted.

duckdb_value_uhugeint

Syntax

duckdb_uhugeint duckdb_value_uhugeint(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The duckdb_uhugeint value at the specified location, or 0 if the value cannot be converted.

duckdb_value_decimal

Syntax

duckdb_decimal duckdb_value_decimal(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The duckdb_decimal value at the specified location, or 0 if the value cannot be converted.

duckdb_value_uint8

Syntax

uint8_t duckdb_value_uint8(
duckdb_result *result,
idx_t col,
idx_t row
);


Parameters

• returns

The uint8_t value at the specified location, or 0 if the value cannot be converted.

duckdb_value_uint16

Syntax
uint16_t duckdb_value_uint16(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The uint16_t value at the specified location, or 0 if the value cannot be converted.

duckdb_value_uint32

Syntax
uint32_t duckdb_value_uint32(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The uint32_t value at the specified location, or 0 if the value cannot be converted.

duckdb_value_uint64

Syntax
uint64_t duckdb_value_uint64(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The uint64_t value at the specified location, or 0 if the value cannot be converted.

duckdb_value_float


Syntax

float duckdb_value_float(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The float value at the specified location, or 0 if the value cannot be converted.

duckdb_value_double

Syntax

double duckdb_value_double(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The double value at the specified location, or 0 if the value cannot be converted.

duckdb_value_date

Syntax

duckdb_date duckdb_value_date(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The duckdb_date value at the specified location, or 0 if the value cannot be converted.

duckdb_value_time

Syntax

duckdb_time duckdb_value_time(
duckdb_result *result,
idx_t col,
idx_t row
);


Parameters

• returns

The duckdb_time value at the specified location, or 0 if the value cannot be converted.

duckdb_value_timestamp

Syntax
duckdb_timestamp duckdb_value_timestamp(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The duckdb_timestamp value at the specified location, or 0 if the value cannot be converted.

duckdb_value_interval

Syntax
duckdb_interval duckdb_value_interval(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The duckdb_interval value at the specified location, or 0 if the value cannot be converted.

duckdb_value_varchar

Syntax
char *duckdb_value_varchar(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• DEPRECATED

use duckdb_value_string instead. This function does not work correctly if the string contains null bytes.

• returns

The text value at the specified location as a null‑terminated string, or nullptr if the value cannot be converted. The result must be freed
with duckdb_free.


duckdb_value_string

Syntax

duckdb_string duckdb_value_string(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The string value at the specified location.

• The resulting field "string.data" must be freed with duckdb_free.

duckdb_value_varchar_internal

Syntax

char *duckdb_value_varchar_internal(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• DEPRECATED

use duckdb_value_string_internal instead. This function does not work correctly if the string contains null bytes.

• returns

The char* value at the specified location. ONLY works on VARCHAR columns and does not auto‑cast. If the column is NOT a VARCHAR
column this function will return NULL.

The result must NOT be freed.

duckdb_value_string_internal

Syntax

duckdb_string duckdb_value_string_internal(
duckdb_result *result,
idx_t col,
idx_t row
);


Parameters

• DEPRECATED

use duckdb_value_string_internal instead. This function does not work correctly if the string contains null bytes.

• returns

The char* value at the specified location. ONLY works on VARCHAR columns and does not auto‑cast. If the column is NOT a VARCHAR
column this function will return NULL.

The result must NOT be freed.

duckdb_value_blob

Syntax

duckdb_blob duckdb_value_blob(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

The duckdb_blob value at the specified location. Returns a blob with blob.data set to nullptr if the value cannot be converted. The resulting
field "blob.data" must be freed with duckdb_free.

duckdb_value_is_null

Syntax

bool duckdb_value_is_null(
duckdb_result *result,
idx_t col,
idx_t row
);

Parameters

• returns

Returns true if the value at the specified index is NULL, and false otherwise.
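
The duckdb_value_* functions above allow simple row/column access on a materialized result. A hedged sketch (illustrative only; it assumes column 0 is an INTEGER column and column 1 is a VARCHAR column):

for (idx_t row = 0; row < duckdb_row_count(&result); row++) {
    if (duckdb_value_is_null(&result, 0, row)) {
        printf("NULL\n");
        continue;
    }
    int32_t id = duckdb_value_int32(&result, 0, row);   // assumes column 0 is INTEGER
    char *name = duckdb_value_varchar(&result, 1, row); // assumes column 1 is VARCHAR
    printf("%d: %s\n", id, name);
    duckdb_free(name); // duckdb_value_varchar results must be freed
}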

duckdb_malloc Allocate size bytes of memory using the duckdb internal malloc function. Any memory allocated in this manner
should be freed using duckdb_free.

Syntax

void *duckdb_malloc(
size_t size
);


Parameters

• size

The number of bytes to allocate.

• returns

A pointer to the allocated memory region.

duckdb_free Free a value returned from duckdb_malloc, duckdb_value_varchar, duckdb_value_blob, or duckdb_value_string.

Syntax
void duckdb_free(
void *ptr
);

Parameters

• ptr

The memory region to de‑allocate.

duckdb_vector_size The internal vector size used by DuckDB. This is the number of tuples that will fit into a data chunk created by
duckdb_create_data_chunk.

Syntax
idx_t duckdb_vector_size(

);

Parameters

• returns

The vector size.

duckdb_string_is_inlined Whether or not the duckdb_string_t value is inlined. This means that the data of the string does not
have a separate allocation.

Syntax
bool duckdb_string_is_inlined(
duckdb_string_t string
);

duckdb_from_date Decompose a duckdb_date object into year, month and date (stored as duckdb_date_struct).

Syntax
duckdb_date_struct duckdb_from_date(
duckdb_date date
);


Parameters

• date

The date object, as obtained from a DUCKDB_TYPE_DATE column.

• returns

The duckdb_date_struct with the decomposed elements.

duckdb_to_date Re‑compose a duckdb_date from year, month and date (duckdb_date_struct).

Syntax

duckdb_date duckdb_to_date(
duckdb_date_struct date
);

Parameters

• date

The year, month and date stored in a duckdb_date_struct.

• returns

The duckdb_date element.
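
A small illustrative round trip through the date helpers (not part of the generated reference; it assumes column 0 of "result" is a DATE column):

duckdb_date d = duckdb_value_date(&result, 0, 0);
duckdb_date_struct parts = duckdb_from_date(d);
printf("%d-%d-%d\n", parts.year, parts.month, parts.day);
parts.day = 1; // e.g., snap the value to the first of the month
duckdb_date first_of_month = duckdb_to_date(parts);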

duckdb_is_finite_date Test a duckdb_date to see if it is a finite value.

Syntax

bool duckdb_is_finite_date(
duckdb_date date
);

Parameters

• date

The date object, as obtained from a DUCKDB_TYPE_DATE column.

• returns

True if the date is finite, false if it is ±infinity.

duckdb_from_time Decompose a duckdb_time object into hour, minute, second and microsecond (stored as duckdb_time_
struct).

Syntax

duckdb_time_struct duckdb_from_time(
duckdb_time time
);


Parameters

• time

The time object, as obtained from a DUCKDB_TYPE_TIME column.

• returns

The duckdb_time_struct with the decomposed elements.

duckdb_create_time_tz Create a duckdb_time_tz object from micros and a timezone offset.

Syntax

duckdb_time_tz duckdb_create_time_tz(
int64_t micros,
int32_t offset
);

Parameters

• micros

The microsecond component of the time.

• offset

The timezone offset component of the time.

• returns

The duckdb_time_tz element.

duckdb_from_time_tz Decompose a TIME_TZ object into micros and a timezone offset.

Use duckdb_from_time to further decompose the micros into hour, minute, second and microsecond.

Syntax

duckdb_time_tz_struct duckdb_from_time_tz(
duckdb_time_tz micros
);

Parameters

• micros

The time object, as obtained from a DUCKDB_TYPE_TIME_TZ column.

• out_micros

The microsecond component of the time.

• out_offset

The timezone offset component of the time.

duckdb_to_time Re‑compose a duckdb_time from hour, minute, second and microsecond (duckdb_time_struct).


Syntax
duckdb_time duckdb_to_time(
duckdb_time_struct time
);

Parameters

• time

The hour, minute, second and microsecond in a duckdb_time_struct.

• returns

The duckdb_time element.

duckdb_from_timestamp Decompose a duckdb_timestamp object into a duckdb_timestamp_struct.

Syntax
duckdb_timestamp_struct duckdb_from_timestamp(
duckdb_timestamp ts
);

Parameters

• ts

The ts object, as obtained from a DUCKDB_TYPE_TIMESTAMP column.

• returns

The duckdb_timestamp_struct with the decomposed elements.

duckdb_to_timestamp Re‑compose a duckdb_timestamp from a duckdb_timestamp_struct.

Syntax
duckdb_timestamp duckdb_to_timestamp(
duckdb_timestamp_struct ts
);

Parameters

• ts

The de‑composed elements in a duckdb_timestamp_struct.

• returns

The duckdb_timestamp element.

duckdb_is_finite_timestamp Test a duckdb_timestamp to see if it is a finite value.

Syntax
bool duckdb_is_finite_timestamp(
duckdb_timestamp ts
);


Parameters

• ts

The timestamp object, as obtained from a DUCKDB_TYPE_TIMESTAMP column.

• returns

True if the timestamp is finite, false if it is ±infinity.

duckdb_hugeint_to_double Converts a duckdb_hugeint object (as obtained from a DUCKDB_TYPE_HUGEINT column) into a
double.

Syntax
double duckdb_hugeint_to_double(
duckdb_hugeint val
);

Parameters

• val

The hugeint value.

• returns

The converted double element.

duckdb_double_to_hugeint Converts a double value to a duckdb_hugeint object.

If the conversion fails because the double value is too big the result will be 0.

Syntax
duckdb_hugeint duckdb_double_to_hugeint(
double val
);

Parameters

• val

The double value.

• returns

The converted duckdb_hugeint element.

duckdb_uhugeint_to_double Converts a duckdb_uhugeint object (as obtained from a DUCKDB_TYPE_UHUGEINT column) into
a double.

Syntax
double duckdb_uhugeint_to_double(
duckdb_uhugeint val
);


Parameters

• val

The uhugeint value.

• returns

The converted double element.

duckdb_double_to_uhugeint Converts a double value to a duckdb_uhugeint object.

If the conversion fails because the double value is too big the result will be 0.

Syntax
duckdb_uhugeint duckdb_double_to_uhugeint(
double val
);

Parameters

• val

The double value.

• returns

The converted duckdb_uhugeint element.

duckdb_double_to_decimal Converts a double value to a duckdb_decimal object.

If the conversion fails because the double value is too big, or the width/scale are invalid the result will be 0.

Syntax
duckdb_decimal duckdb_double_to_decimal(
double val,
uint8_t width,
uint8_t scale
);

Parameters

• val

The double value.

• returns

The converted duckdb_decimal element.

duckdb_decimal_to_double Converts a duckdb_decimal object (as obtained from a DUCKDB_TYPE_DECIMAL column) into a
double.

Syntax
double duckdb_decimal_to_double(
duckdb_decimal val
);


Parameters

• val

The decimal value.

• returns

The converted double element.

duckdb_prepare Create a prepared statement object from a query.

Note that after calling duckdb_prepare, the prepared statement should always be destroyed using duckdb_destroy_prepare,
even if the prepare fails.

If the prepare fails, duckdb_prepare_error can be called to obtain the reason why the prepare failed.

Syntax
duckdb_state duckdb_prepare(
duckdb_connection connection,
const char *query,
duckdb_prepared_statement *out_prepared_statement
);

Parameters

• connection

The connection object

• query

The SQL query to prepare

• out_prepared_statement

The resulting prepared statement object

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_destroy_prepare Closes the prepared statement and de‑allocates all memory allocated for the statement.

Syntax
void duckdb_destroy_prepare(
duckdb_prepared_statement *prepared_statement
);

Parameters

• prepared_statement

The prepared statement to destroy.

duckdb_prepare_error Returns the error message associated with the given prepared statement. If the prepared statement has no
error message, this returns nullptr instead.

The error message should not be freed. It will be de‑allocated when duckdb_destroy_prepare is called.


Syntax

const char *duckdb_prepare_error(
duckdb_prepared_statement prepared_statement
);

Parameters

• prepared_statement

The prepared statement to obtain the error from.

• returns

The error message, or nullptr if there is none.

duckdb_nparams Returns the number of parameters that can be provided to the given prepared statement.

Returns 0 if the query was not successfully prepared.

Syntax

idx_t duckdb_nparams(
duckdb_prepared_statement prepared_statement
);

Parameters

• prepared_statement

The prepared statement to obtain the number of parameters for.

duckdb_parameter_name Returns the name used to identify the parameter. The returned string should be freed using duckdb_free.

Returns NULL if the index is out of range for the provided prepared statement.

Syntax

const char *duckdb_parameter_name(
duckdb_prepared_statement prepared_statement,
idx_t index
);

Parameters

• prepared_statement

The prepared statement to get the parameter name from.

duckdb_param_type Returns the parameter type for the parameter at the given index.

Returns DUCKDB_TYPE_INVALID if the parameter index is out of range or the statement was not successfully prepared.


Syntax

duckdb_type duckdb_param_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);

Parameters

• prepared_statement

The prepared statement.

• param_idx

The parameter index.

• returns

The parameter type

duckdb_clear_bindings Clears the parameters bound to the prepared statement.

Syntax

duckdb_state duckdb_clear_bindings(
duckdb_prepared_statement prepared_statement
);

duckdb_prepared_statement_type Returns the statement type of the statement to be executed

Syntax

duckdb_statement_type duckdb_prepared_statement_type(
duckdb_prepared_statement statement
);

Parameters

• statement

The prepared statement.

• returns

duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID

duckdb_bind_value Binds a value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_value(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_value val
);


duckdb_bind_parameter_index Retrieve the index of the parameter for the prepared statement, identified by name

Syntax

duckdb_state duckdb_bind_parameter_index(
duckdb_prepared_statement prepared_statement,
idx_t *param_idx_out,
const char *name
);

duckdb_bind_boolean Binds a bool value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_boolean(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
bool val
);

duckdb_bind_int8 Binds an int8_t value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_int8(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int8_t val
);

duckdb_bind_int16 Binds an int16_t value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_int16(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int16_t val
);

duckdb_bind_int32 Binds an int32_t value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_int32(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int32_t val
);

duckdb_bind_int64 Binds an int64_t value to the prepared statement at the specified index.


Syntax

duckdb_state duckdb_bind_int64(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int64_t val
);

duckdb_bind_hugeint Binds a duckdb_hugeint value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_hugeint(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_hugeint val
);

duckdb_bind_uhugeint Binds a duckdb_uhugeint value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_uhugeint(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_uhugeint val
);

duckdb_bind_decimal Binds a duckdb_decimal value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_decimal(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_decimal val
);

duckdb_bind_uint8 Binds a uint8_t value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_uint8(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint8_t val
);

duckdb_bind_uint16 Binds a uint16_t value to the prepared statement at the specified index.


Syntax

duckdb_state duckdb_bind_uint16(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint16_t val
);

duckdb_bind_uint32 Binds a uint32_t value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_uint32(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint32_t val
);

duckdb_bind_uint64 Binds a uint64_t value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_uint64(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint64_t val
);

duckdb_bind_float Binds a float value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_float(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
float val
);

duckdb_bind_double Binds a double value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_double(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
double val
);

duckdb_bind_date Binds a duckdb_date value to the prepared statement at the specified index.


Syntax

duckdb_state duckdb_bind_date(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_date val
);

duckdb_bind_time Binds a duckdb_time value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_time(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_time val
);

duckdb_bind_timestamp Binds a duckdb_timestamp value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_timestamp(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_timestamp val
);

duckdb_bind_interval Binds a duckdb_interval value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_interval(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_interval val
);

duckdb_bind_varchar Binds a null‑terminated varchar value to the prepared statement at the specified index.

Syntax

duckdb_state duckdb_bind_varchar(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const char *val
);

duckdb_bind_varchar_length Binds a varchar value to the prepared statement at the specified index.


Syntax
duckdb_state duckdb_bind_varchar_length(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const char *val,
idx_t length
);

duckdb_bind_blob Binds a blob value to the prepared statement at the specified index.

Syntax
duckdb_state duckdb_bind_blob(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const void *data,
idx_t length
);

duckdb_bind_null Binds a NULL value to the prepared statement at the specified index.

Syntax
duckdb_state duckdb_bind_null(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);

duckdb_execute_prepared Executes the prepared statement with the given bound parameters, and returns a materialized query
result.

This method can be called multiple times for each prepared statement, and the parameters can be modified between calls to this func‑
tion.

Note that the result must be freed with duckdb_destroy_result.

Syntax
duckdb_state duckdb_execute_prepared(
duckdb_prepared_statement prepared_statement,
duckdb_result *out_result
);

Parameters

• prepared_statement

The prepared statement to execute.

• out_result

The query result.

• returns

DuckDBSuccess on success or DuckDBError on failure.
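
A hedged end-to-end sketch of the prepared statement workflow (illustrative only; "con" is an open connection, and the "people" table, its columns, and the bound values are hypothetical):

duckdb_prepared_statement stmt;
if (duckdb_prepare(con, "SELECT name FROM people WHERE age > ? AND name = ?", &stmt) == DuckDBError) {
    fprintf(stderr, "prepare failed: %s\n", duckdb_prepare_error(stmt));
}
duckdb_bind_int32(stmt, 1, 42);        // parameter indices are 1-based
duckdb_bind_varchar(stmt, 2, "Anna");

duckdb_result result;
duckdb_execute_prepared(stmt, &result);
// ... read the result ...
duckdb_destroy_result(&result);
duckdb_destroy_prepare(&stmt);         // always destroy, even if the prepare failed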


duckdb_execute_prepared_streaming Executes the prepared statement with the given bound parameters, and returns an
optionally‑streaming query result. To determine if the resulting query was in fact streamed, use duckdb_result_is_streaming

This method can be called multiple times for each prepared statement, and the parameters can be modified between calls to this func‑
tion.

Note that the result must be freed with duckdb_destroy_result.

Syntax

duckdb_state duckdb_execute_prepared_streaming(
duckdb_prepared_statement prepared_statement,
duckdb_result *out_result
);

Parameters

• prepared_statement

The prepared statement to execute.

• out_result

The query result.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_extract_statements Extract all statements from a query. Note that after calling duckdb_extract_statements, the
extracted statements should always be destroyed using duckdb_destroy_extracted, even if no statements were extracted.

If the extract fails, duckdb_extract_statements_error can be called to obtain the reason why the extract failed.

Syntax

idx_t duckdb_extract_statements(
duckdb_connection connection,
const char *query,
duckdb_extracted_statements *out_extracted_statements
);

Parameters

• connection

The connection object

• query

The SQL query to extract

• out_extracted_statements

The resulting extracted statements object

• returns

The number of extracted statements or 0 on failure.


duckdb_prepare_extracted_statement Prepare an extracted statement. Note that after calling duckdb_prepare_extracted_statement, the prepared statement should always be destroyed using duckdb_destroy_prepare, even if the prepare fails.

If the prepare fails, duckdb_prepare_error can be called to obtain the reason why the prepare failed.

Syntax

duckdb_state duckdb_prepare_extracted_statement(
duckdb_connection connection,
duckdb_extracted_statements extracted_statements,
idx_t index,
duckdb_prepared_statement *out_prepared_statement
);

Parameters

• connection

The connection object

• extracted_statements

The extracted statements object

• index

The index of the extracted statement to prepare

• out_prepared_statement

The resulting prepared statement object

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_extract_statements_error Returns the error message contained within the extracted statements. The result of this
function must not be freed. It will be cleaned up when duckdb_destroy_extracted is called.

Syntax

const char *duckdb_extract_statements_error(
duckdb_extracted_statements extracted_statements
);

Parameters

• result

The extracted statements to fetch the error from.

• returns

The error of the extracted statements.

duckdb_destroy_extracted De‑allocates all memory allocated for the extracted statements.


Syntax

void duckdb_destroy_extracted(
duckdb_extracted_statements *extracted_statements
);

Parameters

• extracted_statements

The extracted statements to destroy.
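
A sketch of running a multi-statement string with the extraction API (illustrative only; the SQL text is just an example and "con" is an open connection):

duckdb_extracted_statements stmts;
idx_t n = duckdb_extract_statements(con,
    "CREATE TABLE t (i INTEGER); INSERT INTO t VALUES (1); SELECT * FROM t;", &stmts);
if (n == 0) {
    fprintf(stderr, "extract failed: %s\n", duckdb_extract_statements_error(stmts));
}
for (idx_t i = 0; i < n; i++) {
    duckdb_prepared_statement stmt;
    if (duckdb_prepare_extracted_statement(con, stmts, i, &stmt) == DuckDBSuccess) {
        duckdb_result result;
        duckdb_execute_prepared(stmt, &result);
        duckdb_destroy_result(&result);
    }
    duckdb_destroy_prepare(&stmt); // destroy even if the prepare failed
}
duckdb_destroy_extracted(&stmts);  // destroy even if no statements were extracted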

duckdb_pending_prepared Executes the prepared statement with the given bound parameters, and returns a pending result. The
pending result represents an intermediate structure for a query that is not yet fully executed. The pending result can be used to incremen‑
tally execute a query, returning control to the client between tasks.

Note that after calling duckdb_pending_prepared, the pending result should always be destroyed using duckdb_destroy_pending, even if this function returns DuckDBError.

Syntax

duckdb_state duckdb_pending_prepared(
duckdb_prepared_statement prepared_statement,
duckdb_pending_result *out_result
);

Parameters

• prepared_statement

The prepared statement to execute.

• out_result

The pending query result.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_pending_prepared_streaming Executes the prepared statement with the given bound parameters, and returns a pend‑
ing result. This pending result will create a streaming duckdb_result when executed. The pending result represents an intermediate struc‑
ture for a query that is not yet fully executed.

Note that after calling duckdb_pending_prepared_streaming, the pending result should always be destroyed using duckdb_destroy_pending, even if this function returns DuckDBError.

Syntax

duckdb_state duckdb_pending_prepared_streaming(
duckdb_prepared_statement prepared_statement,
duckdb_pending_result *out_result
);


Parameters

• prepared_statement

The prepared statement to execute.

• out_result

The pending query result.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_destroy_pending Closes the pending result and de‑allocates all memory allocated for the result.

Syntax

void duckdb_destroy_pending(
duckdb_pending_result *pending_result
);

Parameters

• pending_result

The pending result to destroy.

duckdb_pending_error Returns the error message contained within the pending result.

The result of this function must not be freed. It will be cleaned up when duckdb_destroy_pending is called.

Syntax

const char *duckdb_pending_error(
duckdb_pending_result pending_result
);

Parameters

• result

The pending result to fetch the error from.

• returns

The error of the pending result.

duckdb_pending_execute_task Executes a single task within the query, returning whether or not the query is ready.

If this returns DUCKDB_PENDING_RESULT_READY, the duckdb_execute_pending function can be called to obtain the result. If this returns DUCKDB_PENDING_RESULT_NOT_READY, the duckdb_pending_execute_task function should be called again. If this returns DUCKDB_PENDING_ERROR, an error occurred during execution.

The error message can be obtained by calling duckdb_pending_error on the pending_result.


Syntax

duckdb_pending_state duckdb_pending_execute_task(
duckdb_pending_result pending_result
);

Parameters

• pending_result

The pending result to execute a task within.

• returns

The state of the pending result after the execution.

duckdb_pending_execute_check_state If this returns DUCKDB_PENDING_RESULT_READY, the duckdb_execute_pending function can be called to obtain the result. If this returns DUCKDB_PENDING_RESULT_NOT_READY, the duckdb_pending_execute_check_state function should be called again. If this returns DUCKDB_PENDING_ERROR, an error occurred during execution.

The error message can be obtained by calling duckdb_pending_error on the pending_result.

Syntax

duckdb_pending_state duckdb_pending_execute_check_state(
duckdb_pending_result pending_result
);

Parameters

• pending_result

The pending result.

• returns

The state of the pending result.

duckdb_execute_pending Fully execute a pending query result, returning the final query result.

If duckdb_pending_execute_task has been called until DUCKDB_PENDING_RESULT_READY was returned, this will return fast. Otherwise,
all remaining tasks must be executed first.

Note that the result must be freed with duckdb_destroy_result.

Syntax

duckdb_state duckdb_execute_pending(
duckdb_pending_result pending_result,
duckdb_result *out_result
);


Parameters

• pending_result

The pending result to execute.

• out_result

The result object.

• returns

DuckDBSuccess on success or DuckDBError on failure.
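
A hedged sketch of incremental execution that follows the three pending states described above (illustrative only; "stmt" is assumed to be a previously prepared statement):

duckdb_pending_result pending;
if (duckdb_pending_prepared(stmt, &pending) == DuckDBError) {
    fprintf(stderr, "%s\n", duckdb_pending_error(pending));
}
duckdb_pending_state state = DUCKDB_PENDING_RESULT_NOT_READY;
while (state == DUCKDB_PENDING_RESULT_NOT_READY) {
    state = duckdb_pending_execute_task(pending); // run one task, then hand control back
    // the application is free to do other work between tasks
}
if (state == DUCKDB_PENDING_RESULT_READY) {
    duckdb_result result;
    duckdb_execute_pending(pending, &result);
    // ... read the result ...
    duckdb_destroy_result(&result);
} else {
    fprintf(stderr, "%s\n", duckdb_pending_error(pending));
}
duckdb_destroy_pending(&pending); // always destroy, even on error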

duckdb_pending_execution_is_finished Returns whether a duckdb_pending_state is finished executing. For example if pending_state is DUCKDB_PENDING_RESULT_READY, this function will return true.

Syntax

bool duckdb_pending_execution_is_finished(
duckdb_pending_state pending_state
);

Parameters

• pending_state

The pending state on which to decide whether to finish execution.

• returns

Boolean indicating pending execution should be considered finished.

duckdb_destroy_value Destroys the value and de‑allocates all memory allocated for that type.

Syntax

void duckdb_destroy_value(
duckdb_value *value
);

Parameters

• value

The value to destroy.

duckdb_create_varchar Creates a value from a null‑terminated string

Syntax

duckdb_value duckdb_create_varchar(
const char *text
);


Parameters

• value

The null‑terminated string

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_varchar_length Creates a value from a string

Syntax
duckdb_value duckdb_create_varchar_length(
const char *text,
idx_t length
);

Parameters

• value

The text

• length

The length of the text

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_int64 Creates a value from an int64

Syntax
duckdb_value duckdb_create_int64(
int64_t val
);

Parameters

• value

The bigint value

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_struct_value Creates a struct value from a type and an array of values

Syntax
duckdb_value duckdb_create_struct_value(
duckdb_logical_type type,
duckdb_value *values
);


Parameters

• type

The type of the struct

• values

The values for the struct fields

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_list_value Creates a list value from a type and an array of values of length value_count

Syntax

duckdb_value duckdb_create_list_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);

Parameters

• type

The type of the list

• values

The values for the list

• value_count

The number of values in the list

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_create_array_value Creates an array value from a type and an array of values of length value_count

Syntax

duckdb_value duckdb_create_array_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);

Parameters

• type

The type of the array

• values


The values for the array

• value_count

The number of values in the array

• returns

The value. This must be destroyed with duckdb_destroy_value.

duckdb_get_varchar Obtains a string representation of the given value. The result must be destroyed with duckdb_free.

Syntax

char *duckdb_get_varchar(
duckdb_value value
);

Parameters

• value

The value

• returns

The string value. This must be destroyed with duckdb_free.

duckdb_get_int64 Obtains an int64 of the given value.

Syntax

int64_t duckdb_get_int64(
duckdb_value value
);

Parameters

• value

The value

• returns

The int64 value, or 0 if no conversion is possible

duckdb_create_logical_type Creates a duckdb_logical_type from a standard primitive type. The resulting type should be
destroyed with duckdb_destroy_logical_type.

This should not be used with DUCKDB_TYPE_DECIMAL.

Syntax

duckdb_logical_type duckdb_create_logical_type(
duckdb_type type
);


Parameters

• type

The primitive type to create.

• returns

The logical type.

duckdb_logical_type_get_alias Returns the alias of a duckdb_logical_type, if one is set, else NULL. The result must be de‑
stroyed with duckdb_free.

Syntax
char *duckdb_logical_type_get_alias(
duckdb_logical_type type
);

Parameters

• type

The logical type to return the alias of

• returns

The alias or NULL

duckdb_create_list_type Creates a list type from its child type. The resulting type should be destroyed with duckdb_destroy_
logical_type.

Syntax
duckdb_logical_type duckdb_create_list_type(
duckdb_logical_type type
);

Parameters

• type

The child type of list type to create.

• returns

The logical type.
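
A small illustrative sketch of building and releasing nested logical types (not part of the generated reference):

duckdb_logical_type int_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_logical_type list_type = duckdb_create_list_type(int_type); // LIST(INTEGER)
// ... use list_type, e.g., as a member of a struct type or a data chunk column ...
duckdb_destroy_logical_type(&list_type);
duckdb_destroy_logical_type(&int_type);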

duckdb_create_array_type Creates an array type from its child type. The resulting type should be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_array_type(
duckdb_logical_type type,
idx_t array_size
);


Parameters

• type

The child type of array type to create.

• array_size

The number of elements in the array.

• returns

The logical type.

duckdb_create_map_type Creates a map type from its key type and value type. The resulting type should be destroyed with
duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_create_map_type(
duckdb_logical_type key_type,
duckdb_logical_type value_type
);

Parameters

• type

The key type and value type of map type to create.

• returns

The logical type.

duckdb_create_union_type Creates a UNION type from the passed types array. The resulting type should be destroyed with
duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_create_union_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);

Parameters

• member_types

The array of member types that the union should consist of.

• member_names

The array of names for the union members.

• member_count

The number of union members that were specified for both arrays.

• returns

The logical type.


duckdb_create_struct_type Creates a STRUCT type from the passed member name and type arrays. The resulting type should
be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_struct_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);

Parameters

• member_types

The array of types that the struct should consist of.

• member_names

The array of names that the struct should consist of.

• member_count

The number of members that were specified for both arrays.

• returns

The logical type.

duckdb_create_enum_type Creates an ENUM type from the passed member name array. The resulting type should be destroyed
with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_enum_type(
const char **member_names,
idx_t member_count
);

Parameters

• member_names

The array of names that the enum should consist of.

• member_count

The number of elements that were specified in the array.

• returns

The logical type.

duckdb_create_decimal_type Creates a duckdb_logical_type of type decimal with the specified width and scale. The re‑
sulting type should be destroyed with duckdb_destroy_logical_type.


Syntax
duckdb_logical_type duckdb_create_decimal_type(
uint8_t width,
uint8_t scale
);

Parameters

• width

The width of the decimal type

• scale

The scale of the decimal type

• returns

The logical type.

duckdb_get_type_id Retrieves the enum type class of a duckdb_logical_type.

Syntax
duckdb_type duckdb_get_type_id(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The type id

duckdb_decimal_width Retrieves the width of a decimal type.

Syntax
uint8_t duckdb_decimal_width(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The width of the decimal type

duckdb_decimal_scale Retrieves the scale of a decimal type.


Syntax
uint8_t duckdb_decimal_scale(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The scale of the decimal type

duckdb_decimal_internal_type Retrieves the internal storage type of a decimal type.

Syntax
duckdb_type duckdb_decimal_internal_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The internal type of the decimal type

duckdb_enum_internal_type Retrieves the internal storage type of an enum type.

Syntax
duckdb_type duckdb_enum_internal_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The internal type of the enum type

duckdb_enum_dictionary_size Retrieves the dictionary size of the enum type.

Syntax
uint32_t duckdb_enum_dictionary_size(
duckdb_logical_type type
);


Parameters

• type

The logical type object

• returns

The dictionary size of the enum type

duckdb_enum_dictionary_value Retrieves the dictionary value at the specified position from the enum.

The result must be freed with duckdb_free.

Syntax

char *duckdb_enum_dictionary_value(
duckdb_logical_type type,
idx_t index
);

Parameters

• type

The logical type object

• index

The index in the dictionary

• returns

The string value of the enum type. Must be freed with duckdb_free.

duckdb_list_type_child_type Retrieves the child type of the given list type.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_list_type_child_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The child type of the list type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_array_type_child_type Retrieves the child type of the given array type.

The result must be freed with duckdb_destroy_logical_type.


Syntax

duckdb_logical_type duckdb_array_type_child_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The child type of the array type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_array_type_array_size Retrieves the array size of the given array type.

Syntax

idx_t duckdb_array_type_array_size(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The fixed number of elements the values of this array type can store.

duckdb_map_type_key_type Retrieves the key type of the given map type.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_map_type_key_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The key type of the map type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_map_type_value_type Retrieves the value type of the given map type.

The result must be freed with duckdb_destroy_logical_type.


Syntax

duckdb_logical_type duckdb_map_type_value_type(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The value type of the map type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_struct_type_child_count Returns the number of children of a struct type.

Syntax

idx_t duckdb_struct_type_child_count(
duckdb_logical_type type
);

Parameters

• type

The logical type object

• returns

The number of children of a struct type.

duckdb_struct_type_child_name Retrieves the name of the struct child.

The result must be freed with duckdb_free.

Syntax

char *duckdb_struct_type_child_name(
duckdb_logical_type type,
idx_t index
);

Parameters

• type

The logical type object

• index

The child index

• returns

The name of the struct type. Must be freed with duckdb_free.


duckdb_struct_type_child_type Retrieves the child type of the given struct type at the specified index.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_struct_type_child_type(
duckdb_logical_type type,
idx_t index
);

Parameters

• type

The logical type object

• index

The child index

• returns

The child type of the struct type. Must be destroyed with duckdb_destroy_logical_type.

duckdb_union_type_member_count Returns the number of members that the union type has.

Syntax

idx_t duckdb_union_type_member_count(
duckdb_logical_type type
);

Parameters

• type

The logical type (union) object

• returns

The number of members of a union type.

duckdb_union_type_member_name Retrieves the name of the union member.

The result must be freed with duckdb_free.

Syntax

char *duckdb_union_type_member_name(
duckdb_logical_type type,
idx_t index
);


Parameters

• type

The logical type object

• index

The child index

• returns

The name of the union member. Must be freed with duckdb_free.

duckdb_union_type_member_type Retrieves the child type of the given union member at the specified index.

The result must be freed with duckdb_destroy_logical_type.

Syntax

duckdb_logical_type duckdb_union_type_member_type(
duckdb_logical_type type,
idx_t index
);

Parameters

• type

The logical type object

• index

The child index

• returns

The child type of the union member. Must be destroyed with duckdb_destroy_logical_type.

duckdb_destroy_logical_type Destroys the logical type and de‑allocates all memory allocated for that type.

Syntax

void duckdb_destroy_logical_type(
duckdb_logical_type *type
);

Parameters

• type

The logical type to destroy.

duckdb_create_data_chunk Creates an empty DataChunk with the specified set of types.

Note that the result must be destroyed with duckdb_destroy_data_chunk.


Syntax
duckdb_data_chunk duckdb_create_data_chunk(
duckdb_logical_type *types,
idx_t column_count
);

Parameters

• types

An array of types of the data chunk.

• column_count

The number of columns.

• returns

The data chunk.

duckdb_destroy_data_chunk Destroys the data chunk and de‑allocates all memory allocated for that chunk.

Syntax
void duckdb_destroy_data_chunk(
duckdb_data_chunk *chunk
);

Parameters

• chunk

The data chunk to destroy.

duckdb_data_chunk_reset Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.

Syntax
void duckdb_data_chunk_reset(
duckdb_data_chunk chunk
);

Parameters

• chunk

The data chunk to reset.

duckdb_data_chunk_get_column_count Retrieves the number of columns in a data chunk.

Syntax
idx_t duckdb_data_chunk_get_column_count(
duckdb_data_chunk chunk
);


Parameters

• chunk

The data chunk to get the data from

• returns

The number of columns in the data chunk

duckdb_data_chunk_get_vector Retrieves the vector at the specified column index in the data chunk.

The pointer to the vector is valid for as long as the chunk is alive. It does NOT need to be destroyed.

Syntax

duckdb_vector duckdb_data_chunk_get_vector(
duckdb_data_chunk chunk,
idx_t col_idx
);

Parameters

• chunk

The data chunk to get the data from

• returns

The vector

duckdb_data_chunk_get_size Retrieves the current number of tuples in a data chunk.

Syntax

idx_t duckdb_data_chunk_get_size(
duckdb_data_chunk chunk
);

Parameters

• chunk

The data chunk to get the data from

• returns

The number of tuples in the data chunk

duckdb_data_chunk_set_size Sets the current number of tuples in a data chunk.

Syntax

void duckdb_data_chunk_set_size(
duckdb_data_chunk chunk,
idx_t size
);


Parameters

• chunk

The data chunk to set the size in

• size

The number of tuples in the data chunk
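
A hedged sketch that combines the data chunk and vector functions described in this section: create a single-column BIGINT chunk, write a few values into it, and set its cardinality (illustrative only; the values and the count are arbitrary):

duckdb_logical_type types[1];
types[0] = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 1);
duckdb_destroy_logical_type(&types[0]);

duckdb_vector vec = duckdb_data_chunk_get_vector(chunk, 0);
int64_t *data = (int64_t *) duckdb_vector_get_data(vec);
idx_t count = 3; // must not exceed duckdb_vector_size()
for (idx_t row = 0; row < count; row++) {
    data[row] = (int64_t) row * 10;
}
duckdb_data_chunk_set_size(chunk, count);
// ... hand the chunk to a consumer, e.g., a table function output ...
duckdb_destroy_data_chunk(&chunk);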

duckdb_vector_get_column_type Retrieves the column type of the specified vector.

The result must be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_vector_get_column_type(
duckdb_vector vector
);

Parameters

• vector

The vector to get the data from

• returns

The type of the vector

duckdb_vector_get_data Retrieves the data pointer of the vector.

The data pointer can be used to read or write values from the vector. How to read or write values depends on the type of the vector.

Syntax
void *duckdb_vector_get_data(
duckdb_vector vector
);

Parameters

• vector

The vector to get the data from

• returns

The data pointer

duckdb_vector_get_validity Retrieves the validity mask pointer of the specified vector.

If all values are valid, this function MIGHT return NULL!

The validity mask is a bitset that signifies null‑ness within the data chunk. It is a series of uint64_t values, where each uint64_t value contains
validity for 64 tuples. The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).

Validity of a specific value can be obtained like this:

idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & (1ULL << idx_in_entry);

Alternatively, the (slower) duckdb_validity_row_is_valid function can be used.


Syntax
uint64_t *duckdb_vector_get_validity(
duckdb_vector vector
);

Parameters

• vector

The vector to get the data from

• returns

The pointer to the validity mask, or NULL if no validity mask is present

duckdb_vector_ensure_validity_writable Ensures the validity mask is writable by allocating it.

After this function is called, duckdb_vector_get_validity will ALWAYS return non‑NULL. This allows null values to be written to the
vector, regardless of whether a validity mask was present before.

Syntax
void duckdb_vector_ensure_validity_writable(
duckdb_vector vector
);

Parameters

• vector

The vector to alter

duckdb_vector_assign_string_element Assigns a string element in the vector at the specified location.

Syntax
void duckdb_vector_assign_string_element(
duckdb_vector vector,
idx_t index,
const char *str
);

Parameters

• vector

The vector to alter

• index

The row position in the vector to assign the string to

• str

The null‑terminated string

duckdb_vector_assign_string_element_len Assigns a string element in the vector at the specified location. You may also
use this function to assign BLOBs.


Syntax

void duckdb_vector_assign_string_element_len(
duckdb_vector vector,
idx_t index,
const char *str,
idx_t str_len
);

Parameters

• vector

The vector to alter

• index

The row position in the vector to assign the string to

• str

The string

• str_len

The length of the string (in bytes)

duckdb_list_vector_get_child Retrieves the child vector of a list vector.

The resulting vector is valid as long as the parent vector is valid.

Syntax

duckdb_vector duckdb_list_vector_get_child(
duckdb_vector vector
);

Parameters

• vector

The vector

• returns

The child vector

duckdb_list_vector_get_size Returns the size of the child vector of the list.

Syntax

idx_t duckdb_list_vector_get_size(
duckdb_vector vector
);


Parameters

• vector

The vector

• returns

The size of the child list

duckdb_list_vector_set_size Sets the total size of the underlying child‑vector of a list vector.

Syntax
duckdb_state duckdb_list_vector_set_size(
duckdb_vector vector,
idx_t size
);

Parameters

• vector

The list vector.

• size

The size of the child list.

• returns

The duckdb state. Returns DuckDBError if the vector is nullptr.

duckdb_list_vector_reserve Sets the total capacity of the underlying child‑vector of a list.

Syntax
duckdb_state duckdb_list_vector_reserve(
duckdb_vector vector,
idx_t required_capacity
);

Parameters

• vector

The list vector.

• required_capacity

The total capacity to reserve.

• returns

The duckdb state. Returns DuckDBError if the vector is nullptr.

duckdb_struct_vector_get_child Retrieves the child vector of a struct vector.

The resulting vector is valid as long as the parent vector is valid.


Syntax

duckdb_vector duckdb_struct_vector_get_child(
duckdb_vector vector,
idx_t index
);

Parameters

• vector

The vector

• index

The child index

• returns

The child vector

duckdb_array_vector_get_child Retrieves the child vector of an array vector.

The resulting vector is valid as long as the parent vector is valid. The resulting vector has the size of the parent vector multiplied by the
array size.

Syntax

duckdb_vector duckdb_array_vector_get_child(
duckdb_vector vector
);

Parameters

• vector

The vector

• returns

The child vector

duckdb_validity_row_is_valid Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.

Syntax

bool duckdb_validity_row_is_valid(
uint64_t *validity,
idx_t row
);

Parameters

• validity

The validity mask, as obtained through duckdb_vector_get_validity

• row


The row index

• returns

true if the row is valid, false otherwise

duckdb_validity_set_row_validity In a validity mask, sets a specific row to either valid or invalid.

Note that duckdb_vector_ensure_validity_writable should be called before calling duckdb_vector_get_validity, to ensure that there is a validity mask to write to.

Syntax

void duckdb_validity_set_row_validity(
uint64_t *validity,
idx_t row,
bool valid
);

Parameters

• validity

The validity mask, as obtained through duckdb_vector_get_validity.

• row

The row index

• valid

Whether or not to set the row to valid, or invalid

duckdb_validity_set_row_invalid In a validity mask, sets a specific row to invalid.

Equivalent to duckdb_validity_set_row_validity with valid set to false.

Syntax

void duckdb_validity_set_row_invalid(
uint64_t *validity,
idx_t row
);

Parameters

• validity

The validity mask

• row

The row index

duckdb_validity_set_row_valid In a validity mask, sets a specific row to valid.

Equivalent to duckdb_validity_set_row_validity with valid set to true.


Syntax

void duckdb_validity_set_row_valid(
uint64_t *validity,
idx_t row
);

Parameters

• validity

The validity mask

• row

The row index
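
The validity functions above are typically combined as in the following minimal, hedged sketch. It assumes `vec` is a writable BIGINT vector, for example a column of a table function's output chunk.

#include "duckdb.h"

/* Illustrative: write 42 to row 0 and NULL to row 1 of a BIGINT vector. */
void write_value_and_null(duckdb_vector vec) {
    int64_t *data = (int64_t *) duckdb_vector_get_data(vec);
    data[0] = 42;

    /* Ensure a validity mask exists before fetching it. */
    duckdb_vector_ensure_validity_writable(vec);
    uint64_t *validity = duckdb_vector_get_validity(vec);

    /* Mark row 1 as NULL; row 0 remains valid. */
    duckdb_validity_set_row_invalid(validity, 1);
}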

duckdb_create_table_function Creates a new empty table function.

The return value should be destroyed with duckdb_destroy_table_function.

Syntax

duckdb_table_function duckdb_create_table_function(

);

Parameters

• returns

The table function object.

duckdb_destroy_table_function Destroys the given table function object.

Syntax

void duckdb_destroy_table_function(
duckdb_table_function *table_function
);

Parameters

• table_function

The table function to destroy

duckdb_table_function_set_name Sets the name of the given table function.

Syntax

void duckdb_table_function_set_name(
duckdb_table_function table_function,
const char *name
);


Parameters

• table_function

The table function

• name

The name of the table function

duckdb_table_function_add_parameter Adds a parameter to the table function.

Syntax

void duckdb_table_function_add_parameter(
duckdb_table_function table_function,
duckdb_logical_type type
);

Parameters

• table_function

The table function

• type

The type of the parameter to add.

duckdb_table_function_add_named_parameter Adds a named parameter to the table function.

Syntax

void duckdb_table_function_add_named_parameter(
duckdb_table_function table_function,
const char *name,
duckdb_logical_type type
);

Parameters

• table_function

The table function

• name

The name of the parameter

• type

The type of the parameter to add.

duckdb_table_function_set_extra_info Assigns extra information to the table function that can be fetched during binding,
etc.


Syntax
void duckdb_table_function_set_extra_info(
duckdb_table_function table_function,
void *extra_info,
duckdb_delete_callback_t destroy
);

Parameters

• table_function

The table function

• extra_info

The extra information

• destroy

The callback that will be called to destroy the bind data (if any)

duckdb_table_function_set_bind Sets the bind function of the table function.

Syntax
void duckdb_table_function_set_bind(
duckdb_table_function table_function,
duckdb_table_function_bind_t bind
);

Parameters

• table_function

The table function

• bind

The bind function

duckdb_table_function_set_init Sets the init function of the table function.

Syntax
void duckdb_table_function_set_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);

Parameters

• table_function

The table function

• init

The init function


duckdb_table_function_set_local_init Sets the thread‑local init function of the table function.

Syntax

void duckdb_table_function_set_local_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);

Parameters

• table_function

The table function

• init

The init function

duckdb_table_function_set_function Sets the main function of the table function.

Syntax

void duckdb_table_function_set_function(
duckdb_table_function table_function,
duckdb_table_function_t function
);

Parameters

• table_function

The table function

• function

The function

duckdb_table_function_supports_projection_pushdown Sets whether or not the given table function supports projec‑
tion pushdown.

If this is set to true, the system will provide a list of all required columns in the init stage through the duckdb_init_get_column_
count and duckdb_init_get_column_index functions. If this is set to false (the default), the system will expect all columns to be
projected.

Syntax

void duckdb_table_function_supports_projection_pushdown(
duckdb_table_function table_function,
bool pushdown
);


Parameters

• table_function

The table function

• pushdown

True if the table function supports projection pushdown, false otherwise.

duckdb_register_table_function Register the table function object within the given connection.

The function requires at least a name, a bind function, an init function and a main function.

If the function is incomplete or a function with this name already exists DuckDBError is returned.

Syntax

duckdb_state duckdb_register_table_function(
duckdb_connection con,
duckdb_table_function function
);

Parameters

• con

The connection to register it in.

• function

The function pointer

• returns

Whether or not the registration was successful.
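
The functions above are usually combined as follows. This is a hedged sketch of registering a minimal table function that returns the numbers 1 through 5 in a single BIGINT column; the callback names, the state struct, and the assumed chunk capacity of 2048 rows are illustrative choices, not requirements of the API.

#include <stdlib.h>
#include "duckdb.h"

typedef struct { int64_t pos; } my_numbers_state;

static void my_numbers_bind(duckdb_bind_info info) {
    duckdb_logical_type type = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_bind_add_result_column(info, "i", type);
    duckdb_destroy_logical_type(&type);
}

static void my_numbers_init(duckdb_init_info info) {
    my_numbers_state *state = (my_numbers_state *) malloc(sizeof(my_numbers_state));
    state->pos = 0;
    duckdb_init_set_init_data(info, state, free);
}

static void my_numbers_main(duckdb_function_info info, duckdb_data_chunk output) {
    my_numbers_state *state = (my_numbers_state *) duckdb_function_get_init_data(info);
    duckdb_vector col = duckdb_data_chunk_get_vector(output, 0);
    int64_t *data = (int64_t *) duckdb_vector_get_data(col);

    idx_t count = 0;
    while (state->pos < 5 && count < 2048) {
        data[count++] = ++state->pos;
    }
    /* A chunk size of 0 signals that the function is exhausted. */
    duckdb_data_chunk_set_size(output, count);
}

static duckdb_state register_my_numbers(duckdb_connection con) {
    duckdb_table_function f = duckdb_create_table_function();
    duckdb_table_function_set_name(f, "my_numbers");
    duckdb_table_function_set_bind(f, my_numbers_bind);
    duckdb_table_function_set_init(f, my_numbers_init);
    duckdb_table_function_set_function(f, my_numbers_main);

    duckdb_state rc = duckdb_register_table_function(con, f);
    duckdb_destroy_table_function(&f);
    return rc;
}

Once registered, SELECT * FROM my_numbers(); on the same connection should return the five rows.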

duckdb_bind_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.

Syntax

void *duckdb_bind_get_extra_info(
duckdb_bind_info info
);

Parameters

• info

The info object

• returns

The extra info

duckdb_bind_add_result_column Adds a result column to the output of the table function.


Syntax

void duckdb_bind_add_result_column(
duckdb_bind_info info,
const char *name,
duckdb_logical_type type
);

Parameters

• info

The info object

• name

The name of the column

• type

The logical type of the column

duckdb_bind_get_parameter_count Retrieves the number of regular (non‑named) parameters to the function.

Syntax

idx_t duckdb_bind_get_parameter_count(
duckdb_bind_info info
);

Parameters

• info

The info object

• returns

The number of parameters

duckdb_bind_get_parameter Retrieves the parameter at the given index.

The result must be destroyed with duckdb_destroy_value.

Syntax

duckdb_value duckdb_bind_get_parameter(
duckdb_bind_info info,
idx_t index
);

Parameters

• info

The info object

• index


The index of the parameter to get

• returns

The value of the parameter. Must be destroyed with duckdb_destroy_value.

duckdb_bind_get_named_parameter Retrieves a named parameter with the given name.

The result must be destroyed with duckdb_destroy_value.

Syntax

duckdb_value duckdb_bind_get_named_parameter(
duckdb_bind_info info,
const char *name
);

Parameters

• info

The info object

• name

The name of the parameter

• returns

The value of the parameter. Must be destroyed with duckdb_destroy_value.

duckdb_bind_set_bind_data Sets the user‑provided bind data in the bind object. This object can be retrieved again during exe‑
cution.

Syntax

void duckdb_bind_set_bind_data(
duckdb_bind_info info,
void *bind_data,
duckdb_delete_callback_t destroy
);

Parameters

• info

The info object

• bind_data

The bind data object.

• destroy

The callback that will be called to destroy the bind data (if any)

duckdb_bind_set_cardinality Sets the cardinality estimate for the table function, used for optimization.


Syntax
void duckdb_bind_set_cardinality(
duckdb_bind_info info,
idx_t cardinality,
bool is_exact
);

Parameters

• info

The bind data object.

• cardinality

The cardinality estimate

• is_exact

Whether or not the cardinality estimate is exact, or an approximation

duckdb_bind_set_error Report that an error has occurred while calling bind.

Syntax
void duckdb_bind_set_error(
duckdb_bind_info info,
const char *error
);

Parameters

• info

The info object

• error

The error message

duckdb_init_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.

Syntax
void *duckdb_init_get_extra_info(
duckdb_init_info info
);

Parameters

• info

The info object

• returns

The extra info

duckdb_init_get_bind_data Gets the bind data set by duckdb_bind_set_bind_data during the bind.

Note that the bind data should be considered as read‑only. For tracking state, use the init data instead.


Syntax
void *duckdb_init_get_bind_data(
duckdb_init_info info
);

Parameters

• info

The info object

• returns

The bind data object

duckdb_init_set_init_data Sets the user‑provided init data in the init object. This object can be retrieved again during execu‑
tion.

Syntax
void duckdb_init_set_init_data(
duckdb_init_info info,
void *init_data,
duckdb_delete_callback_t destroy
);

Parameters

• info

The info object

• init_data

The init data object.

• destroy

The callback that will be called to destroy the init data (if any)

duckdb_init_get_column_count Returns the number of projected columns.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

Syntax
idx_t duckdb_init_get_column_count(
duckdb_init_info info
);

Parameters

• info

The info object

• returns

The number of projected columns.


duckdb_init_get_column_index Returns the column index of the projected column at the specified position.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

Syntax

idx_t duckdb_init_get_column_index(
duckdb_init_info info,
idx_t column_index
);

Parameters

• info

The info object

• column_index

The index at which to get the projected column index, from 0..duckdb_init_get_column_count(info)

• returns

The column index of the projected column.
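
As a small hedged sketch of how these two functions are used together once projection pushdown has been enabled on the table function:

/* Illustrative init callback for a table function on which
 * duckdb_table_function_supports_projection_pushdown(f, true) was set. */
static void my_init_with_pushdown(duckdb_init_info info) {
    idx_t projected = duckdb_init_get_column_count(info);
    for (idx_t i = 0; i < projected; i++) {
        /* Maps output column i back to the column index declared during bind. */
        idx_t source_column = duckdb_init_get_column_index(info, i);
        /* ... remember source_column in the init data (omitted here) ... */
        (void) source_column;
    }
}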

duckdb_init_set_max_threads Sets how many threads can process this table function in parallel (default: 1)

Syntax

void duckdb_init_set_max_threads(
duckdb_init_info info,
idx_t max_threads
);

Parameters

• info

The info object

• max_threads

The maximum amount of threads that can process this table function

duckdb_init_set_error Report that an error has occurred while calling init.

Syntax

void duckdb_init_set_error(
duckdb_init_info info,
const char *error
);


Parameters

• info

The info object

• error

The error message

duckdb_function_get_extra_info Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.

Syntax

void *duckdb_function_get_extra_info(
duckdb_function_info info
);

Parameters

• info

The info object

• returns

The extra info

duckdb_function_get_bind_data Gets the bind data set by duckdb_bind_set_bind_data during the bind.

Note that the bind data should be considered as read‑only. For tracking state, use the init data instead.

Syntax

void *duckdb_function_get_bind_data(
duckdb_function_info info
);

Parameters

• info

The info object

• returns

The bind data object

duckdb_function_get_init_data Gets the init data set by duckdb_init_set_init_data during the init.

Syntax

void *duckdb_function_get_init_data(
duckdb_function_info info
);


Parameters

• info

The info object

• returns

The init data object

duckdb_function_get_local_init_data Gets the thread‑local init data set by duckdb_init_set_init_data during the
local_init.

Syntax
void *duckdb_function_get_local_init_data(
duckdb_function_info info
);

Parameters

• info

The info object

• returns

The init data object

duckdb_function_set_error Report that an error has occurred while executing the function.

Syntax
void duckdb_function_set_error(
duckdb_function_info info,
const char *error
);

Parameters

• info

The info object

• error

The error message

duckdb_add_replacement_scan Add a replacement scan definition to the specified database.

Syntax
void duckdb_add_replacement_scan(
duckdb_database db,
duckdb_replacement_callback_t replacement,
void *extra_data,
duckdb_delete_callback_t delete_callback
);


Parameters

• db

The database object to add the replacement scan to

• replacement

The replacement scan callback

• extra_data

Extra data that is passed back into the specified callback

• delete_callback

The delete callback to call on the extra data, if any

duckdb_replacement_scan_set_function_name Sets the replacement function name. If this function is called in the replacement callback, the replacement scan is performed. If it is not called, no replacement is made.

Syntax

void duckdb_replacement_scan_set_function_name(
duckdb_replacement_scan_info info,
const char *function_name
);

Parameters

• info

The info object

• function_name

The function name to substitute.

duckdb_replacement_scan_add_parameter Adds a parameter to the replacement scan function.

Syntax

void duckdb_replacement_scan_add_parameter(
duckdb_replacement_scan_info info,
duckdb_value parameter
);

Parameters

• info

The info object

• parameter

The parameter to add.

duckdb_replacement_scan_set_error Report that an error has occurred while executing the replacement scan.


Syntax
void duckdb_replacement_scan_set_error(
duckdb_replacement_scan_info info,
const char *error
);

Parameters

• info

The info object

• error

The error message
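
Putting the replacement scan functions together, the hedged sketch below rewrites any unknown table name into a call to read_csv on a file named after the table. The "<name>.csv" convention and the callback name are assumptions for illustration only.

#include <stdio.h>
#include "duckdb.h"

static void csv_replacement(duckdb_replacement_scan_info info,
                            const char *table_name, void *extra_data) {
    (void) extra_data;
    char path[1024];
    snprintf(path, sizeof(path), "%s.csv", table_name);

    /* Calling set_function_name is what triggers the replacement. */
    duckdb_replacement_scan_set_function_name(info, "read_csv");
    duckdb_value parameter = duckdb_create_varchar(path);
    duckdb_replacement_scan_add_parameter(info, parameter);
    /* Assumes the parameter value is copied and can be destroyed here. */
    duckdb_destroy_value(&parameter);
}

/* Registration, typically right after opening the database:
 *   duckdb_add_replacement_scan(db, csv_replacement, NULL, NULL);
 */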

duckdb_appender_create Creates an appender object.

Note that the object must be destroyed with duckdb_appender_destroy.

Syntax
duckdb_state duckdb_appender_create(
duckdb_connection connection,
const char *schema,
const char *table,
duckdb_appender *out_appender
);

Parameters

• connection

The connection context to create the appender in.

• schema

The schema of the table to append to, or nullptr for the default schema.

• table

The table name to append to.

• out_appender

The resulting appender object.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_appender_column_count Returns the number of columns in the table that belongs to the appender.

• appender The appender to get the column count from.

Syntax
idx_t duckdb_appender_column_count(
duckdb_appender appender
);


Parameters

• returns

The number of columns in the table.

duckdb_appender_column_type Returns the type of the column at the specified index.

Note: The resulting type should be destroyed with duckdb_destroy_logical_type.

• appender The appender to get the column type from.


• col_idx The index of the column to get the type of.

Syntax
duckdb_logical_type duckdb_appender_column_type(
duckdb_appender appender,
idx_t col_idx
);

Parameters

• returns

The duckdb_logical_type of the column.

duckdb_appender_error Returns the error message associated with the given appender. If the appender has no error message, this
returns nullptr instead.

The error message should not be freed. It will be de‑allocated when duckdb_appender_destroy is called.

Syntax
const char *duckdb_appender_error(
duckdb_appender appender
);

Parameters

• appender

The appender to get the error from.

• returns

The error message, or nullptr if there is none.

duckdb_appender_flush Flush the appender to the table, forcing the cache of the appender to be cleared and the data to be ap‑
pended to the base table.

This should generally not be used unless you know what you are doing. Instead, call duckdb_appender_destroy when you are done
with the appender.

Syntax
duckdb_state duckdb_appender_flush(
duckdb_appender appender
);


Parameters

• appender

The appender to flush.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_appender_close Close the appender, flushing all intermediate state in the appender to the table and closing it for further
appends.

This is generally not necessary. Call duckdb_appender_destroy instead.

Syntax
duckdb_state duckdb_appender_close(
duckdb_appender appender
);

Parameters

• appender

The appender to flush and close.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_appender_destroy Close the appender and destroy it. Flushing all intermediate state in the appender to the table, and
de‑allocating all memory associated with the appender.

Syntax
duckdb_state duckdb_appender_destroy(
duckdb_appender *appender
);

Parameters

• appender

The appender to flush, close and destroy.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_appender_begin_row A nop function, provided for backwards compatibility reasons. Does nothing. Only duckdb_
appender_end_row is required.

Syntax
duckdb_state duckdb_appender_begin_row(
duckdb_appender appender
);


duckdb_appender_end_row Finish the current row of appends. After end_row is called, the next row can be appended.

Syntax

duckdb_state duckdb_appender_end_row(
duckdb_appender appender
);

Parameters

• appender

The appender.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_append_bool Append a bool value to the appender.

Syntax

duckdb_state duckdb_append_bool(
duckdb_appender appender,
bool value
);

duckdb_append_int8 Append an int8_t value to the appender.

Syntax

duckdb_state duckdb_append_int8(
duckdb_appender appender,
int8_t value
);

duckdb_append_int16 Append an int16_t value to the appender.

Syntax

duckdb_state duckdb_append_int16(
duckdb_appender appender,
int16_t value
);

duckdb_append_int32 Append an int32_t value to the appender.

Syntax

duckdb_state duckdb_append_int32(
duckdb_appender appender,
int32_t value
);


duckdb_append_int64 Append an int64_t value to the appender.

Syntax
duckdb_state duckdb_append_int64(
duckdb_appender appender,
int64_t value
);

duckdb_append_hugeint Append a duckdb_hugeint value to the appender.

Syntax
duckdb_state duckdb_append_hugeint(
duckdb_appender appender,
duckdb_hugeint value
);

duckdb_append_uint8 Append a uint8_t value to the appender.

Syntax
duckdb_state duckdb_append_uint8(
duckdb_appender appender,
uint8_t value
);

duckdb_append_uint16 Append a uint16_t value to the appender.

Syntax
duckdb_state duckdb_append_uint16(
duckdb_appender appender,
uint16_t value
);

duckdb_append_uint32 Append a uint32_t value to the appender.

Syntax
duckdb_state duckdb_append_uint32(
duckdb_appender appender,
uint32_t value
);

duckdb_append_uint64 Append a uint64_t value to the appender.

Syntax
duckdb_state duckdb_append_uint64(
duckdb_appender appender,
uint64_t value
);


duckdb_append_uhugeint Append a duckdb_uhugeint value to the appender.

Syntax
duckdb_state duckdb_append_uhugeint(
duckdb_appender appender,
duckdb_uhugeint value
);

duckdb_append_float Append a float value to the appender.

Syntax
duckdb_state duckdb_append_float(
duckdb_appender appender,
float value
);

duckdb_append_double Append a double value to the appender.

Syntax
duckdb_state duckdb_append_double(
duckdb_appender appender,
double value
);

duckdb_append_date Append a duckdb_date value to the appender.

Syntax
duckdb_state duckdb_append_date(
duckdb_appender appender,
duckdb_date value
);

duckdb_append_time Append a duckdb_time value to the appender.

Syntax
duckdb_state duckdb_append_time(
duckdb_appender appender,
duckdb_time value
);

duckdb_append_timestamp Append a duckdb_timestamp value to the appender.

Syntax
duckdb_state duckdb_append_timestamp(
duckdb_appender appender,
duckdb_timestamp value
);


duckdb_append_interval Append a duckdb_interval value to the appender.

Syntax

duckdb_state duckdb_append_interval(
duckdb_appender appender,
duckdb_interval value
);

duckdb_append_varchar Append a varchar value to the appender.

Syntax

duckdb_state duckdb_append_varchar(
duckdb_appender appender,
const char *val
);

duckdb_append_varchar_length Append a varchar value to the appender.

Syntax

duckdb_state duckdb_append_varchar_length(
duckdb_appender appender,
const char *val,
idx_t length
);

duckdb_append_blob Append a blob value to the appender.

Syntax

duckdb_state duckdb_append_blob(
duckdb_appender appender,
const void *data,
idx_t length
);

duckdb_append_null Append a NULL value to the appender (of any type).

Syntax

duckdb_state duckdb_append_null(
duckdb_appender appender
);

duckdb_append_data_chunk Appends a pre‑filled data chunk to the specified appender.

The types of the data chunk must exactly match the types of the table, no casting is performed. If the types do not match or the appender
is in an invalid state, DuckDBError is returned. If the append is successful, DuckDBSuccess is returned.


Syntax

duckdb_state duckdb_append_data_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk
);

Parameters

• appender

The appender to append to.

• chunk

The data chunk to append.

• returns

The return state.
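
Taken together, a typical row-based appender session looks like the hedged sketch below, which assumes an open connection `con` and a table created as CREATE TABLE people (id INTEGER, name VARCHAR).

#include <stdio.h>
#include "duckdb.h"

void append_people(duckdb_connection con) {
    duckdb_appender appender;
    if (duckdb_appender_create(con, NULL, "people", &appender) == DuckDBError) {
        fprintf(stderr, "could not create appender\n");
        return;
    }

    duckdb_append_int32(appender, 1);
    duckdb_append_varchar(appender, "Mark");
    duckdb_appender_end_row(appender);

    duckdb_append_int32(appender, 2);
    duckdb_append_null(appender);          /* name is NULL for this row */
    duckdb_appender_end_row(appender);

    /* Flushes any remaining rows and frees the appender. */
    duckdb_appender_destroy(&appender);
}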

duckdb_query_arrow Executes a SQL query within a connection and stores the full (materialized) result in an arrow structure. If the
query fails to execute, DuckDBError is returned and the error message can be retrieved by calling duckdb_query_arrow_error.

Note that after running duckdb_query_arrow, duckdb_destroy_arrow must be called on the result object even if the query fails,
otherwise the error stored within the result will not be freed correctly.

Syntax

duckdb_state duckdb_query_arrow(
duckdb_connection connection,
const char *query,
duckdb_arrow *out_result
);

Parameters

• connection

The connection to perform the query in.

• query

The SQL query to run.

• out_result

The query result.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_query_arrow_schema Fetch the internal arrow schema from the arrow result. Remember to call release on the respective
ArrowSchema object.


Syntax

duckdb_state duckdb_query_arrow_schema(
duckdb_arrow result,
duckdb_arrow_schema *out_schema
);

Parameters

• result

The result to fetch the schema from.

• out_schema

The output schema.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_prepared_arrow_schema Fetch the internal arrow schema from the prepared statement. Remember to call release on the
respective ArrowSchema object.

Syntax

duckdb_state duckdb_prepared_arrow_schema(
duckdb_prepared_statement prepared,
duckdb_arrow_schema *out_schema
);

Parameters

• prepared

The prepared statement to fetch the schema from.

• out_schema

The output schema.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_result_arrow_array Convert a data chunk into an arrow struct array. Remember to call release on the respective ArrowAr‑
ray object.

Syntax

void duckdb_result_arrow_array(
duckdb_result result,
duckdb_data_chunk chunk,
duckdb_arrow_array *out_array
);


Parameters

• result

The result object the data chunk have been fetched from.

• chunk

The data chunk to convert.

• out_array

The output array.

duckdb_query_arrow_array Fetch an internal arrow struct array from the arrow result. Remember to call release on the respective
ArrowArray object.

This function can be called multiple times to fetch subsequent chunks; each call frees the previous out_array, so consume the out_array before calling this function again.

Syntax
duckdb_state duckdb_query_arrow_array(
duckdb_arrow result,
duckdb_arrow_array *out_array
);

Parameters

• result

The result to fetch the array from.

• out_array

The output array.

• returns

DuckDBSuccess on success or DuckDBError on failure.
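
The arrow-result functions above are typically chained as in the following hedged sketch, which assumes an open connection `con` and omits the actual consumption of the ArrowSchema/ArrowArray structures.

#include <stdio.h>
#include "duckdb.h"

void arrow_example(duckdb_connection con) {
    duckdb_arrow result;
    if (duckdb_query_arrow(con, "SELECT 42 AS answer", &result) == DuckDBError) {
        fprintf(stderr, "%s\n", duckdb_query_arrow_error(result));
    } else {
        duckdb_arrow_schema schema = NULL;
        duckdb_query_arrow_schema(result, &schema);   /* release the ArrowSchema when done */

        duckdb_arrow_array chunk = NULL;
        if (duckdb_query_arrow_array(result, &chunk) == DuckDBSuccess) {
            /* consume the ArrowArray here before fetching the next chunk */
        }
    }
    /* Must be called even if the query failed. */
    duckdb_destroy_arrow(&result);
}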

duckdb_arrow_column_count Returns the number of columns present in the arrow result object.

Syntax
idx_t duckdb_arrow_column_count(
duckdb_arrow result
);

Parameters

• result

The result object.

• returns

The number of columns present in the result object.

duckdb_arrow_row_count Returns the number of rows present in the arrow result object.


Syntax

idx_t duckdb_arrow_row_count(
duckdb_arrow result
);

Parameters

• result

The result object.

• returns

The number of rows present in the result object.

duckdb_arrow_rows_changed Returns the number of rows changed by the query stored in the arrow result. This is relevant only
for INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.

Syntax

idx_t duckdb_arrow_rows_changed(
duckdb_arrow result
);

Parameters

• result

The result object.

• returns

The number of rows changed.

duckdb_query_arrow_error Returns the error message contained within the result. The error is only set if duckdb_query_
arrow returns DuckDBError.

The error message should not be freed. It will be de‑allocated when duckdb_destroy_arrow is called.

Syntax

const char *duckdb_query_arrow_error(
duckdb_arrow result
);

Parameters

• result

The result object to fetch the error from.

• returns

The error of the result.

duckdb_destroy_arrow Closes the result and de‑allocates all memory allocated for the arrow result.


Syntax

void duckdb_destroy_arrow(
duckdb_arrow *result
);

Parameters

• result

The result to destroy.

duckdb_destroy_arrow_stream Releases the arrow array stream and de‑allocates its memory.

Syntax

void duckdb_destroy_arrow_stream(
duckdb_arrow_stream *stream_p
);

Parameters

• stream_p

The arrow array stream to destroy.

duckdb_execute_prepared_arrow Executes the prepared statement with the given bound parameters, and returns an arrow
query result. Note that after running duckdb_execute_prepared_arrow, duckdb_destroy_arrow must be called on the result
object.

Syntax

duckdb_state duckdb_execute_prepared_arrow(
duckdb_prepared_statement prepared_statement,
duckdb_arrow *out_result
);

Parameters

• prepared_statement

The prepared statement to execute.

• out_result

The query result.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_arrow_scan Scans the Arrow stream and creates a view with the given name.


Syntax
duckdb_state duckdb_arrow_scan(
duckdb_connection connection,
const char *table_name,
duckdb_arrow_stream arrow
);

Parameters

• connection

The connection on which to execute the scan.

• table_name

Name of the temporary view to create.

• arrow

Arrow stream wrapper.

• returns

DuckDBSuccess on success or DuckDBError on failure.

duckdb_arrow_array_scan Scans the Arrow array and creates a view with the given name. Note that after running duckdb_
arrow_array_scan, duckdb_destroy_arrow_stream must be called on the out stream.

Syntax
duckdb_state duckdb_arrow_array_scan(
duckdb_connection connection,
const char *table_name,
duckdb_arrow_schema arrow_schema,
duckdb_arrow_array arrow_array,
duckdb_arrow_stream *out_stream
);

Parameters

• connection

The connection on which to execute the scan.

• table_name

Name of the temporary view to create.

• arrow_schema

Arrow schema wrapper.

• arrow_array

Arrow array wrapper.

• out_stream

Output array stream that wraps around the passed schema, for releasing/deleting once done.

• returns

DuckDBSuccess on success or DuckDBError on failure.


duckdb_execute_tasks Execute DuckDB tasks on this thread.

Will return after max_tasks have been executed, or if there are no more tasks present.

Syntax
void duckdb_execute_tasks(
duckdb_database database,
idx_t max_tasks
);

Parameters

• database

The database object to execute tasks for

• max_tasks

The maximum amount of tasks to execute

duckdb_create_task_state Creates a task state that can be used with duckdb_execute_tasks_state to execute tasks until
duckdb_finish_execution is called on the state.

duckdb_destroy_task_state must be called on the result.

Syntax
duckdb_task_state duckdb_create_task_state(
duckdb_database database
);

Parameters

• database

The database object to create the task state for

• returns

The task state that can be used with duckdb_execute_tasks_state.

duckdb_execute_tasks_state Execute DuckDB tasks on this thread.

The thread will keep on executing tasks forever, until duckdb_finish_execution is called on the state. Multiple threads can share the same
duckdb_task_state.

Syntax
void duckdb_execute_tasks_state(
duckdb_task_state state
);

Parameters

• state

The task state of the executor


duckdb_execute_n_tasks_state Execute DuckDB tasks on this thread.

The thread will keep on executing tasks until either duckdb_finish_execution is called on the state, max_tasks tasks have been executed or
there are no more tasks to be executed.

Multiple threads can share the same duckdb_task_state.

Syntax
idx_t duckdb_execute_n_tasks_state(
duckdb_task_state state,
idx_t max_tasks
);

Parameters

• state

The task state of the executor

• max_tasks

The maximum amount of tasks to execute

• returns

The amount of tasks that have actually been executed

duckdb_finish_execution Finish execution on a specific task.

Syntax
void duckdb_finish_execution(
duckdb_task_state state
);

Parameters

• state

The task state to finish execution

duckdb_task_state_is_finished Check if the provided duckdb_task_state has finished execution

Syntax
bool duckdb_task_state_is_finished(
duckdb_task_state state
);

Parameters

• state

The task state to inspect

• returns

Whether or not duckdb_finish_execution has been called on the task state


duckdb_destroy_task_state Destroys the task state returned from duckdb_create_task_state.

Note that this should not be called while there is an active duckdb_execute_tasks_state running on the task state.

Syntax

void duckdb_destroy_task_state(
duckdb_task_state state
);

Parameters

• state

The task state to clean up
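
The task-state functions are usually combined with explicitly managed worker threads, as in this hedged sketch. POSIX threads are an arbitrary choice here, and error handling is omitted.

#include <pthread.h>
#include "duckdb.h"

static void *worker(void *arg) {
    /* Blocks and executes tasks until duckdb_finish_execution is called. */
    duckdb_execute_tasks_state((duckdb_task_state) arg);
    return NULL;
}

void run_with_worker_threads(duckdb_database db) {
    duckdb_task_state state = duckdb_create_task_state(db);

    pthread_t threads[2];
    for (int i = 0; i < 2; i++) {
        pthread_create(&threads[i], NULL, worker, state);
    }

    /* ... issue queries on one or more connections here ... */

    duckdb_finish_execution(state);            /* unblocks the workers */
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
    }
    duckdb_destroy_task_state(state);          /* only after the workers have stopped */
}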

duckdb_execution_is_finished Returns true if the execution of the current query is finished.

Syntax

bool duckdb_execution_is_finished(
duckdb_connection con
);

Parameters

• con

The connection on which to check

duckdb_stream_fetch_chunk Fetches a data chunk from the (streaming) duckdb_result. This function should be called repeatedly
until the result is exhausted.

The result must be destroyed with duckdb_destroy_data_chunk.

This function can only be used on duckdb_results created with 'duckdb_pending_prepared_streaming'

If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be mixed with the legacy
result functions or the materialized result functions).

It is not known beforehand how many chunks will be returned by this result.

Syntax

duckdb_data_chunk duckdb_stream_fetch_chunk(
duckdb_result result
);

Parameters

• result

The result object to fetch the data chunk from.

• returns

The resulting data chunk. Returns NULL if the result has an error.
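
A typical consumption loop for a streaming result looks like the hedged sketch below. It assumes `result` was produced via duckdb_pending_prepared_streaming and that a NULL chunk marks the end of the stream.

duckdb_data_chunk chunk;
while ((chunk = duckdb_stream_fetch_chunk(result)) != NULL) {
    idx_t rows = duckdb_data_chunk_get_size(chunk);
    /* ... process the chunk's vectors here ... */
    (void) rows;
    duckdb_destroy_data_chunk(&chunk);
}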


C++ API

Installation

The DuckDB C++ API can be installed as part of the libduckdb packages. Please see the installation page for details.

Basic API Usage

DuckDB implements a custom C++ API. This is built around the abstractions of a database instance (DuckDB class), multiple Connections
to the database instance and QueryResult instances as the result of queries. The header file for the C++ API is duckdb.hpp.

Note. The standard source distribution of libduckdb contains an ”amalgamation” of the DuckDB sources, which combine all
sources into two files duckdb.hpp and duckdb.cpp. The duckdb.hpp header is much larger in this case. Regardless of whether
you are using the amalgamation or not, just include duckdb.hpp.

Startup & Shutdown To use DuckDB, you must first initialize a DuckDB instance using its constructor. DuckDB() takes as parameter
the database file to read and write from. The special value nullptr can be used to create an in‑memory database. Note that for an
in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the process). The second parameter to the DuckDB
constructor is an optional DBConfig object. In DBConfig, you can set various database parameters, for example the read/write mode
or memory limits. The DuckDB constructor may throw exceptions, for example if the database file is not usable.

With the DuckDB instance, you can create one or many Connection instances using the Connection() constructor. While connections
should be thread‑safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection if you
are in a multithreaded environment.

DuckDB db(nullptr);
Connection con(db);

Querying Connections expose the Query() method to send a SQL query string to DuckDB from C++. Query() fully materializes the
query result as a MaterializedQueryResult in memory before returning at which point the query result can be consumed. There is
also a streaming API for queries, see further below.

// create a table
con.Query("CREATE TABLE integers (i INTEGER, j INTEGER)");

// insert three rows into the table
con.Query("INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL)");

auto result = con.Query("SELECT * FROM integers");
if (!result->success) {
    cerr << result->error;
}

The MaterializedQueryResult instance contains two fields that indicate whether the query was successful. Query will not
throw exceptions under normal circumstances. Instead, invalid queries or other issues will lead to the success boolean field in the query
result instance to be set to false. In this case an error message may be available in error as a string. If successful, other fields are set:
the type of statement that was just executed (e.g., StatementType::INSERT_STATEMENT) is contained in statement_type. The
high‑level (”Logical type”/”SQL type”) types of the result set columns are in types. The names of the result columns are in the names
string vector. In case multiple result sets are returned, for example because the result set contained multiple statements, the result set can
be chained using the next field.

DuckDB also supports prepared statements in the C++ API with the Prepare() method. This returns an instance of PreparedState-
ment. This instance can be used to execute the prepared statement with parameters. Below is an example:

std::unique_ptr<PreparedStatement> prepare = con.Prepare("SELECT count(*) FROM a WHERE i = $1");
std::unique_ptr<QueryResult> result = prepare->Execute(12);


Note. Warning Do not use prepared statements to insert large amounts of data into DuckDB. See the data import documentation
for better options.

UDF API The UDF API allows the definition of user‑defined functions. It is exposed in duckdb::Connection through the methods: CreateScalarFunction(), CreateVectorizedFunction(), and variants. These methods create UDFs in the temporary schema (TEMP_SCHEMA) of the owning connection, which is the only connection allowed to use and change them.

CreateScalarFunction The user can code an ordinary scalar function and invoke the CreateScalarFunction() to register and af‑
terward use the UDF in a SELECT statement, for instance:

bool bigger_than_four(int value) {
    return value > 4;
}

connection.CreateScalarFunction<bool, int>("bigger_than_four", &bigger_than_four);

connection.Query("SELECT bigger_than_four(i) FROM (VALUES(3), (5)) tbl(i)")->Print();

The CreateScalarFunction() methods automatically create vectorized scalar UDFs, so they are as efficient as built‑in functions. There are two variants of this method interface, as follows:

1.

template<typename TR, typename... Args>
void CreateScalarFunction(string name, TR (*udf_func)(Args...))

• template parameters:

– TR is the return type of the UDF function;

– Args are the arguments of the UDF function, up to 3 (this method only supports up to ternary functions);

• name: is the name to register the UDF function;

• udf_func: is a pointer to the UDF function.

This method automatically discovers from the template typenames the corresponding LogicalTypes:

• bool → LogicalType::BOOLEAN
• int8_t → LogicalType::TINYINT
• int16_t → LogicalType::SMALLINT
• int32_t → LogicalType::INTEGER
• int64_t → LogicalType::BIGINT
• float → LogicalType::FLOAT
• double → LogicalType::DOUBLE
• string_t → LogicalType::VARCHAR

In DuckDB, some primitive types, e.g., int32_t, are mapped to the same LogicalType (INTEGER, TIME, and DATE), so for disambiguation users can use the following overloaded method.

2.

template<typename TR, typename... Args>
void CreateScalarFunction(string name, vector<LogicalType> args, LogicalType ret_type, TR (*udf_func)(Args...))

An example of use would be:

int32_t udf_date(int32_t a) {
    return a;
}

con.Query("CREATE TABLE dates (d DATE)");
con.Query("INSERT INTO dates VALUES ('1992-01-01')");

con.CreateScalarFunction<int32_t, int32_t>("udf_date", {LogicalType::DATE}, LogicalType::DATE, &udf_date);

con.Query("SELECT udf_date(d) FROM dates")->Print();

• template parameters:

– TR is the return type of the UDF function;

– Args are the arguments of the UDF function, up to 3 (this method only supports up to ternary functions);

• name: is the name to register the UDF function;

• args: are the LogicalType arguments that the function uses, which should match with the template Args types;
• ret_type: is the LogicalType of return of the function, which should match with the template TR type;
• udf_func: is a pointer to the UDF function.

This function checks the template types against the LogicalTypes passed as arguments, and they must match as follows:

• LogicalTypeId::BOOLEAN → bool
• LogicalTypeId::TINYINT → int8_t
• LogicalTypeId::SMALLINT → int16_t
• LogicalTypeId::DATE, LogicalTypeId::TIME, LogicalTypeId::INTEGER → int32_t
• LogicalTypeId::BIGINT, LogicalTypeId::TIMESTAMP → int64_t
• LogicalTypeId::FLOAT, LogicalTypeId::DOUBLE, LogicalTypeId::DECIMAL → double
• LogicalTypeId::VARCHAR, LogicalTypeId::CHAR, LogicalTypeId::BLOB → string_t
• LogicalTypeId::VARBINARY → blob_t

CreateVectorizedFunction The CreateVectorizedFunction() methods register a vectorized UDF such as:

/*
 * This vectorized function copies the input values to the result vector.
 */
template<typename TYPE>
static void udf_vectorized(DataChunk &args, ExpressionState &state, Vector &result) {
    // set the result vector type
    result.vector_type = VectorType::FLAT_VECTOR;
    // get a raw array from the result
    auto result_data = FlatVector::GetData<TYPE>(result);

    // get the single input vector
    auto &input = args.data[0];
    // now get an orrified vector
    VectorData vdata;
    input.Orrify(args.size(), vdata);

    // get a raw array from the orrified input
    auto input_data = (TYPE *)vdata.data;

    // handling the data
    for (idx_t i = 0; i < args.size(); i++) {
        auto idx = vdata.sel->get_index(i);
        if ((*vdata.nullmask)[idx]) {
            continue;
        }
        result_data[i] = input_data[idx];
    }
}


con.Query("CREATE TABLE integers (i INTEGER)");


con.Query("INSERT INTO integers VALUES (1), (2), (3), (999)");

con.CreateVectorizedFunction<int, int>("udf_vectorized_int", &&udf_vectorized<int>);

con.Query("SELECT udf_vectorized_int(i) FROM integers")->Print();

The Vectorized UDF is a pointer of the type scalar_function_t:

typedef std::function<void(DataChunk &args, ExpressionState &expr, Vector &result)> scalar_function_t;

• args is a DataChunk that holds a set of input vectors for the UDF that all have the same length;
• expr is an ExpressionState that provides information to the query's expression state;
• result: is a Vector to store the result values.

There are different vector types to handle in a Vectorized UDF:

• ConstantVector;
• DictionaryVector;
• FlatVector;
• ListVector;
• StringVector;
• StructVector;
• SequenceVector.

The general API of the CreateVectorizedFunction() method is as follows:

1.

template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)

• template parameters:

– TR is the return type of the UDF function;

– Args are the arguments of the UDF function, up to 3.

• name is the name to register the UDF function;

• udf_func is a vectorized UDF function;
• varargs The type of varargs to support, or LogicalTypeId::INVALID (default value) if the function does not accept variable length
arguments.

This method automatically discovers from the template typenames the corresponding LogicalTypes:

• bool → LogicalType::BOOLEAN;
• int8_t → LogicalType::TINYINT;
• int16_t → LogicalType::SMALLINT
• int32_t → LogicalType::INTEGER
• int64_t → LogicalType::BIGINT
• float → LogicalType::FLOAT
• double → LogicalType::DOUBLE
• string_t → LogicalType::VARCHAR

2.

template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, vector<LogicalType> args, LogicalType ret_type, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)


CLI

CLI API

Installation

The DuckDB CLI (Command Line Interface) is a single, dependency‑free executable. It is precompiled for Windows, Mac, and Linux for both
the stable version and for nightly builds produced by GitHub Actions. Please see the installation page under the CLI tab for download
links.

The DuckDB CLI is based on the SQLite command line shell, so CLI‑client‑specific functionality is similar to what is described in the SQLite
documentation (although DuckDB's SQL syntax follows PostgreSQL conventions).

Note. DuckDB has a tldr page that summarizes the most common uses of the CLI client. If you have tldr installed, you can display
it by running tldr duckdb.

Getting Started

Once the CLI executable has been downloaded, unzip it and save it to any directory. Navigate to that directory in a terminal and enter the
command duckdb to run the executable. If in a PowerShell or POSIX shell environment, use the command ./duckdb instead.

Usage

The typical usage of the duckdb command is the following:

$ duckdb [OPTIONS] [FILENAME]

Options The [OPTIONS] part encodes arguments for the CLI client. Common options include:

• -csv: sets the output mode to CSV


• -json: sets the output mode to JSON
• -readonly: open the database in read‑only mode (see concurrency in DuckDB)

For a full list of options, see the command line arguments page.

In‑Memory vs. Persistent Database When no [FILENAME] argument is provided, the DuckDB CLI will open a temporary in‑memory
database. You will see DuckDB's version number, the information on the connection and a prompt starting with a D.

$ duckdb

v0.10.0 20b1486d11
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D

To open or create a persistent database, simply include a path as a command line argument like duckdb path/to/my_
database.duckdb or duckdb my_database.db.

Running SQL Statements in the CLI Once the CLI has been opened, enter a SQL statement followed by a semicolon, then hit enter and
it will be executed. Results will be displayed in a table in the terminal. If a semicolon is omitted, hitting enter will allow for multi‑line SQL
statements to be entered.

SELECT 'quack' AS my_column;


┌───────────┐
│ my_column │
│ varchar │
├───────────┤
│ quack │
└───────────┘

The CLI supports all of DuckDB's rich SQL syntax including SELECT, CREATE, and ALTER statements.

Editor Features The CLI supports autocompletion, and has sophisticated editor features and syntax highlighting on certain platforms.

Exiting the CLI To exit the CLI, press Ctrl‑D if your platform supports it. Otherwise press Ctrl‑C or use the .exit command. If a persistent database was used, DuckDB will automatically checkpoint (save the latest edits to disk) and close. This will remove the .wal file (the
Write‑Ahead‑Log) and consolidate all of your data into the single‑file database.

Dot Commands In addition to SQL syntax, special dot commands may be entered into the CLI client. To use one of these commands,
begin the line with a period (.) immediately followed by the name of the command you wish to execute. Additional arguments to the
command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may
be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No
semicolon is required at the end of the line.

Frequently‑used configurations can be stored in the file ~/.duckdbrc, which will be loaded when starting the CLI client. See the Config‑
uring the CLI section below for further information on these options.

Below, we summarize a few important dot commands. To see all available commands, see the dot commands page or use the .help
command.

Opening Database Files In addition to connecting to a database when opening the CLI, a new database connection can be made by using
the .open command. If no additional parameters are supplied, a new in‑memory database connection is created. This database will not
be persisted when the CLI connection is closed.

.open

The .open command optionally accepts several options, but the final parameter can be used to indicate a path to a persistent database
(or where one should be created). The special string :memory: can also be used to open a temporary in‑memory database.

.open persistent.duckdb

One important option accepted by .open is the --readonly flag. This disallows any editing of the database. To open in read only mode,
the database must already exist. This also means that a new in‑memory database can't be opened in read only mode since in‑memory
databases are created upon connection.

.open --readonly preexisting.duckdb

Output Formats The .mode dot command may be used to change the appearance of the tables returned in the terminal output. These
include the default duckbox mode, csv and json mode for ingestion by other tools, markdown and latex for documents, and insert
mode for generating SQL statements.

Writing Results to a File By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using
either the .output or .once commands. For details, see the documentation for the output dot command.


Reading SQL from a File The DuckDB CLI can read both SQL commands and dot commands from an external file instead of the terminal
using the .read command. This allows for a number of commands to be run in sequence and allows command sequences to be saved
and reused.

The .read command requires only one argument: the path to the file containing the SQL and/or commands to execute. After running the
commands in the file, control will revert back to the terminal. Output from the execution of that file is governed by the same .output and
.once commands that have been discussed previously. This allows the output to be displayed back to the terminal, as in the first example
below, or out to another file, as in the second example.

In this example, the file select_example.sql is located in the same directory as duckdb.exe and contains the following SQL state‑
ment:

SELECT *
FROM generate_series(5);

To execute it from the CLI, the .read command is used.

.read select_example.sql

The output below is returned to the terminal by default. The formatting of the table can be adjusted using the .output or .once com‑
mands.

| generate_series |
|-----------------|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |

Multiple commands, including both SQL and dot commands, can also be run in a single .read command. In this example, the file write_
markdown_to_file.sql is located in the same directory as duckdb.exe and contains the following commands:

.mode markdown
.output series.md
SELECT *
FROM generate_series(5);

To execute it from the CLI, the .read command is used as before.

.read write_markdown_to_file.sql

In this case, no output is returned to the terminal. Instead, the file series.md is created (or replaced if it already existed) with the
markdown‑formatted results shown here:

| generate_series |
|-----------------|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |

Configuring the CLI

Several dot commands can be used to configure the CLI. On startup, the CLI reads and executes all commands in the file ~/.duckdbrc,
including dot commands and SQL statements. This allows you to store the configuration state of the CLI. You may also point to a different
initialization file using the -init flag.


Setting a Custom Prompt As an example, a file in the same directory as the DuckDB CLI named prompt.sql will change the DuckDB
prompt to be a duck head and run a SQL statement. Note that the duck head is built with Unicode characters and does not work in all
terminal environments (e.g., in Windows, unless running with WSL and using the Windows Terminal).

.prompt ' '

To invoke that file on initialization, use this command:

$ duckdb -init prompt.sql

This outputs:

-- Loading resources from prompt.sql


v version git hash
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

Non‑Interactive Usage

To read/process a file and exit immediately, pipe the file contents in to duckdb:

$ duckdb < select_example.sql

To execute a command with SQL text passed in directly from the command line, call duckdb with two arguments: the database location
(or :memory:), and a string with the SQL statement to execute.

$ duckdb :memory: "SELECT 42 AS the_answer"

Loading Extensions

To load extensions, use DuckDB's SQL INSTALL and LOAD commands as you would other SQL statements.

INSTALL fts;
LOAD fts;

For details, see the Extension docs.

Reading from stdin and Writing to stdout

When in a Unix environment, it can be useful to pipe data between multiple commands. DuckDB is able to read data from stdin as well
as write to stdout using the file location of stdin (/dev/stdin) and stdout (/dev/stdout) within SQL commands, as pipes act very
similarly to file handles.

This command will create an example CSV:

COPY (SELECT 42 AS woot UNION ALL SELECT 43 AS woot) TO 'test.csv' (HEADER);

First, read a file and pipe it to the duckdb CLI executable. As arguments to the DuckDB CLI, pass in the location of the database to open,
in this case, an in‑memory database, and a SQL command that utilizes /dev/stdin as a file location.

$ cat test.csv | duckdb :memory: "SELECT * FROM read_csv('/dev/stdin')"

┌───────┐
│ woot │
│ int32 │
├───────┤
│ 42 │
│ 43 │
└───────┘


To write back to stdout, the copy command can be used with the /dev/stdout file location.

$ cat test.csv | duckdb :memory: "COPY (SELECT * FROM read_csv('/dev/stdin')) TO '/dev/stdout' WITH
(FORMAT 'csv', HEADER)"

woot
42
43

Reading Environment Variables

The getenv function can read environment variables.

Examples To retrieve the home directory's path from the HOME environment variable, use:

SELECT getenv('HOME') AS home;

┌──────────────────┐
│ home │
│ varchar │
├──────────────────┤
│ /Users/user_name │
└──────────────────┘

The output of the getenv function can be used to set configuration options. For example, to set the NULL order based on the environment
variable DEFAULT_NULL_ORDER, use:

SET default_null_order = getenv('DEFAULT_NULL_ORDER');

Restrictions for Reading Environment Variables The getenv function can only be run when the enable_external_access is set
to true (the default setting). It is only available in the CLI client and is not supported in other DuckDB clients.

Prepared Statements

The DuckDB CLI supports executing prepared statements in addition to regular SELECT statements. To create and execute a prepared
statement in the CLI client, use the PREPARE clause and the EXECUTE statement.

Command Line Arguments

The table below summarizes DuckDB's command line options. To list all command line options, use the command duckdb -help. For
a list of dot commands available in the CLI shell, see the Dot Commands page.

Argument Description

-append Append the database to the end of the file


-ascii Set output mode to ascii
-bail Stop after hitting an error
-batch Force batch I/O
-box Set output mode to box
-column Set output mode to column
-cmd COMMAND Run COMMAND before reading stdin
-c COMMAND Run COMMAND and exit


Argument Description

-csv Set output mode to csv


-echo Print commands before execution
-init FILENAME Run the script in FILENAME upon startup (instead of ~/.duckdbrc)
-header Turn headers on
-help Show this message
-html Set output mode to HTML
-interactive Force interactive I/O
-json Set output mode to json
-line Set output mode to line
-list Set output mode to list
-markdown Set output mode to markdown
-newline SEP Set output row separator. Default: \n
-nofollow Refuse to open symbolic links to database files
-noheader Turn headers off
-no-stdin Exit after processing options instead of reading stdin
-nullvalue TEXT Set text string for NULL values. Default: empty string
-quote Set output mode to quote
-readonly Open the database read‑only
-s COMMAND Run COMMAND and exit
-separator SEP Set output column separator to SEP. Default: |
-stats Print memory stats before each finalize
-table Set output mode to table
-unsigned Allow loading of unsigned extensions
-version Show DuckDB version

Dot Commands

Dot commands are available in the DuckDB CLI client. To use one of these commands, begin the line with a period (.) immediately followed
by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command.
If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered
on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line. To see available commands,
use the .help command.

Dot Commands

Command Description

.bail on|off Stop after hitting an error. Default: off


.binary on|off Turn binary output on or off. Default: off
.cd DIRECTORY Change the working directory to DIRECTORY
.changes on|off Show number of rows changed by SQL


Command Description

.check GLOB Fail if output since .testcase does not match


.columns Column‑wise rendering of query results
.constant ?COLOR? Sets the syntax highlighting color used for constant values
.constantcode ?CODE? Sets the syntax highlighting terminal code used for constant values
.databases List names and files of attached databases
.echo on|off Turn command echo on or off
.excel Display the output of next command in spreadsheet
.exit ?CODE? Exit this program with return‑code CODE
.explain ?on|off|auto? Change the EXPLAIN formatting mode. Default: auto
.fullschema ?--indent? Show schema and the content of sqlite_stat tables
.headers on|off Turn display of headers on or off
.help ?-all? ?PATTERN? Show help text for PATTERN
.highlight [on|off] Toggle syntax highlighting in the shell on/off
.import FILE TABLE Import data from FILE into TABLE
.indexes ?TABLE? Show names of indexes
.keyword ?COLOR? Sets the syntax highlighting color used for keywords
.keywordcode ?CODE? Sets the syntax highlighting terminal code used for keywords
.lint OPTIONS Report potential schema issues.
.log FILE|off Turn logging on or off. FILE can be stderr/stdout
.maxrows COUNT Sets the maximum number of rows for display. Only for duckbox mode
.maxwidth COUNT Sets the maximum width in characters. 0 defaults to terminal width. Only for duckbox
mode
.mode MODE ?TABLE? Set output mode
.nullvalue STRING Use STRING in place of NULL values
.once ?OPTIONS? ?FILE? Output for the next SQL command only to FILE
.open ?OPTIONS? ?FILE? Close existing database and reopen FILE
.output ?FILE? Send output to FILE or stdout if FILE is omitted
.parameter CMD ... Manage SQL parameter bindings
.print STRING... Print literal STRING
.prompt MAIN CONTINUE Replace the standard prompts
.quit Exit this program
.read FILE Read input from FILE
.rows Row‑wise rendering of query results (default)
.schema ?PATTERN? Show the CREATE statements matching PATTERN
.separator COL ?ROW? Change the column and row separators
.sha3sum ... Compute a SHA3 hash of database content
.shell CMD ARGS... Run CMD ARGS... in a system shell
.show Show the current values for various settings
.system CMD ARGS... Run CMD ARGS... in a system shell
.tables ?TABLE? List names of tables matching LIKE pattern TABLE


Command Description

.testcase NAME Begin redirecting output to NAME


.timer on|off Turn SQL timer on or off
.width NUM1 NUM2 ... Set minimum column widths for columnar output

Using the .help Command

The .help text may be filtered by passing in a text string as the second argument.

.help m

.maxrows COUNT Sets the maximum number of rows for display (default: 40). Only for duckbox mode.
.maxwidth COUNT Sets the maximum width in characters. 0 defaults to terminal width. Only for duckbox mode.
.mode MODE ?TABLE? Set output mode

.output: Writing Results to a File By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be
modified using either the .output or .once commands. Pass in the desired output file location as a parameter. The .once command will
only output the next set of results and then revert to standard out, but .output will redirect all subsequent output to that file location.
Note that each result will overwrite the entire file at that destination. To revert back to standard output, enter .output with no file
parameter.

In this example, the output format is changed to markdown, the destination is identified as a Markdown file, and then DuckDB will write
the output of the SQL statement to that file. Output is then reverted to standard output using .output with no parameter.

.mode markdown
.output my_results.md
SELECT 'taking flight' AS output_column;
.output
SELECT 'back to the terminal' AS displayed_column;

The file my_results.md will then contain:

| output_column |
|---------------|
| taking flight |

The terminal will then display:

| displayed_column |
|----------------------|
| back to the terminal |

A common output format is CSV, or comma separated values. DuckDB supports SQL syntax to export data as CSV or Parquet, but the CLI‑
specific commands may be used to write a CSV instead if desired.

.mode csv
.once my_output_file.csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;

The file my_output_file.csv will then contain:

col_1,col_2
1,2
10,20


By passing special options (flags) to the .once command, query results can also be sent to a temporary file and automatically opened in
the user's default program. Use either the -e flag for a text file (opened in the default text editor), or the -x flag for a CSV file (opened in
the default spreadsheet editor). This is useful for more detailed inspection of query results, especially if there is a relatively large result set.
The .excel command is equivalent to .once -x.

.once -e
SELECT 'quack' AS hello;

The results then open in the system's default text editor.

Querying the Database Schema

All DuckDB clients support querying the database schema with SQL, but the CLI has additional dot commands that can make it easier to
understand the contents of a database. The .tables command will return a list of tables in the database. It has an optional argument
that will filter the results according to a LIKE pattern.

CREATE TABLE swimmers AS SELECT 'duck' AS animal;


CREATE TABLE fliers AS SELECT 'duck' AS animal;
CREATE TABLE walkers AS SELECT 'duck' AS animal;
.tables

fliers swimmers walkers

For example, to filter to only the tables that contain an "l", use the LIKE pattern %l%.

.tables %l%

fliers walkers

The .schema command will show all of the SQL statements used to define the schema of the database.

.schema

CREATE TABLE fliers (animal VARCHAR);


CREATE TABLE swimmers (animal VARCHAR);
CREATE TABLE walkers (animal VARCHAR);

Configuring the Syntax Highlighter

By default the shell includes support for syntax highlighting. The CLI's syntax highlighter can be configured using the following com‑
mands.

To turn on the highlighter:

.highlight on

To turn off the highlighter:

.highlight off

To configure the color used to highlight constants:

.constant [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]

.constantcode [terminal_code]

To configure the color used to highlight keywords:

.keyword [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]

.keywordcode [terminal_code]


Importing Data from CSV

Note. Deprecated This feature is only included for compatibility reasons and may be removed in the future. Use the read_csv
function or the COPY statement to load CSV files.

DuckDB supports SQL syntax to directly query or import CSV files, but the CLI‑specific commands may be used to import a CSV instead if
desired. The .import command takes two arguments and also supports several options. The first argument is the path to the CSV file,
and the second is the name of the DuckDB table to create. Since DuckDB requires stricter typing than SQLite (upon which the DuckDB CLI
is based), the destination table must be created before using the .import command. To automatically detect the schema and create a
table from a CSV, see the read_csv examples in the import docs.

In this example, a CSV file is generated by changing to CSV mode and setting an output file location:

.mode csv
.output import_example.csv
SELECT 1 AS col_1, 2 AS col_2 UNION ALL SELECT 10 AS col_1, 20 AS col_2;

Now that the CSV has been written, a table can be created with the desired schema and the CSV can be imported. The output is reset to the
terminal to avoid continuing to edit the output file specified above. The --skip N option is used to ignore the first row of data since it is
a header row and the table has already been created with the correct column names.

.mode csv
.output
CREATE TABLE test_table (col_1 INT, col_2 INT);
.import import_example.csv test_table --skip 1

Note that the .import command utilizes the current .mode and .separator settings when identifying the structure of the data to
import. The --csv option can be used to override that behavior.

.import import_example.csv test_table --skip 1 --csv

Output Formats

The .mode dot command changes the appearance of the tables returned in the terminal output. Beyond customizing the appearance, several
modes are useful for presenting DuckDB output elsewhere, for example by redirecting the terminal output to a file. The insert mode builds
a series of SQL statements that can be used to insert the data at a later point, the markdown mode is particularly useful for building
documentation, and the latex mode is useful for writing academic papers.

Mode Description

ascii Columns/rows delimited by 0x1F and 0x1E


box Tables using unicode box‑drawing characters
csv Comma‑separated values
column Output in columns. (See .width)
duckbox Tables with extensive features
html HTML <table> code
insert SQL insert statements for TABLE
json Results in a JSON array
jsonlines Results in NDJSON (newline‑delimited JSON)
latex LaTeX tabular environment code
line One value per line
list Values delimited by "|"
markdown Markdown table format

quote Escape answers as for SQL


table ASCII‑art table
tabs Tab‑separated values
tcl TCL list elements
trash No output

.mode markdown
SELECT 'quacking intensifies' AS incoming_ducks;

| incoming_ducks |
|----------------------|
| quacking intensifies |

The output appearance can also be adjusted with the .separator command. If using an export mode that relies on a separator (csv or
tabs for example), the separator will be reset when the mode is changed. For example, .mode csv will set the separator to a comma (,).
Using .separator "|" will then convert the output to be pipe‑separated.

.mode csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;

col_1,col_2
1,2
10,20

.separator "|"
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;

col_1|col_2
1|2
10|20

Editing
Note. The linenoise‑based CLI editor is currently only available for macOS and Linux.

DuckDB's CLI uses a line‑editing library based on linenoise, with shortcuts based on readline's Emacs mode. Below is a list of the
available commands.

Moving

Key Action

Left Move back a character


Right Move forward a character
Up Move up a line. When on the first line, move to previous history entry
Down Move down a line. When on last line, move to next history entry
Home Move to beginning of buffer

End Move to end of buffer


Ctrl+Left Move back a word
Ctrl+Right Move forward a word
Ctrl+A Move to beginning of buffer
Ctrl+B Move back a character
Ctrl+E Move to end of buffer
Ctrl+F Move forward a character
Alt+Left Move back a word
Alt+Right Move forward a word

History

Key Action

Ctrl+P Move to previous history entry


Ctrl+N Move to next history entry
Ctrl+R Search the history
Ctrl+S Search the history
Alt+< Move to first history entry
Alt+> Move to last history entry
Alt+N Search the history
Alt+P Search the history

Changing Text

Key Action

Backspace Delete previous character


Delete Delete next character
Ctrl+D Delete next character. When buffer is empty, end editing
Ctrl+H Delete previous character
Ctrl+K Delete everything after the cursor
Ctrl+T Swap current and next character
Ctrl+U Delete all text
Ctrl+W Delete previous word
Alt+C Convert next word to titlecase
Alt+D Delete next word
Alt+L Convert next word to lowercase
Alt+R Delete all text
Alt+T Swap current and next word

Alt+U Convert next word to uppercase


Alt+Backspace Delete previous word
Alt+\ Delete spaces around cursor

Completing

Key Action

Tab Autocomplete. When autocompleting, cycle to next entry


Shift+Tab When autocompleting, cycle to previous entry
ESC+ESC When autocompleting, revert autocompletion

Miscellaneous

Key Action

Enter Execute query. If query is not complete, insert a newline at the end of the buffer
Ctrl+J Execute query. If query is not complete, insert a newline at the end of the buffer
Ctrl+C Cancel editing of current query
Ctrl+G Cancel editing of current query
Ctrl+L Clear screen
Ctrl+O Cancel editing of current query
Ctrl+X Insert a newline after the cursor
Ctrl+Z Suspend CLI and return to shell, use fg to re‑open

Using Read‑Line

If you prefer, you can use rlwrap to use read‑line directly with the shell. Then, use Shift+Enter to insert a newline and Enter to execute
the query:

rlwrap --substitute-prompt="D " duckdb -batch

Autocomplete

The shell offers context‑aware autocomplete of SQL queries through the autocomplete extension. autocomplete is triggered by pressing
Tab.

Multiple autocomplete suggestions can be present. You can cycle forwards through the suggestions by repeatedly pressing Tab, or
Shift+Tab to cycle backwards. autocompletion can be reverted by pressing ESC twice.

The shell autocompletes four different groups:

• Keywords
• Table names and table functions
• Column names and scalar functions

• File names

The shell looks at the position in the SQL statement to determine which of these autocompletions to trigger. For example:

SELECT s -> student_id

SELECT student_id F -> FROM

SELECT student_id FROM g -> grades

SELECT student_id FROM 'd -> data/

SELECT student_id FROM 'data/ -> data/grades.csv

Syntax Highlighting
Note. Syntax highlighting in the CLI is currently only available for macOS and Linux.

SQL queries that are written in the shell are automatically syntax‑highlighted.

There are several components of a query that are highlighted in different colors. The colors can be configured using dot commands. Syntax
highlighting can also be disabled entirely using the .highlight off command.

Below is a list of components that can be configured.

Type Command Default Color

Keywords .keyword green


Constants and literals .constant yellow
Comments .comment brightblack
Errors .error red

Continuation .cont brightblack


Continuation (Selected) .cont_sel green

The components can be configured using either a supported color name (e.g., .keyword red), or by directly providing a terminal code to
use for rendering (e.g., .keywordcode \033[31m). Below is a list of supported color names and their corresponding terminal codes.

Color Terminal Code

red \033[31m
green \033[32m
yellow \033[33m
blue \033[34m
magenta \033[35m
cyan \033[36m
white \033[37m
brightblack \033[90m
brightred \033[91m
brightgreen \033[92m
brightyellow \033[93m
brightblue \033[94m
brightmagenta \033[95m
brightcyan \033[96m
brightwhite \033[97m

For example, here is an alternative set of syntax highlighting colors:

.keyword brightred
.constant brightwhite
.comment cyan
.error yellow
.cont blue
.cont_sel brightblue

If you wish to start up the CLI with a different set of colors every time, you can place these commands in the ~/.duckdbrc file that is
loaded on start‑up of the CLI.

Error Highlighting

The shell has support for highlighting certain errors. In particular, mismatched brackets and unclosed quotes are highlighted in red (or
another color if specified). This highlighting is automatically disabled for large queries. In addition, it can be disabled manually using the
.render_errors off command.

Go

The DuckDB Go driver, go-duckdb, allows using DuckDB via the database/sql interface. For examples on how to use this interface,
see the official documentation and tutorial.


Note. The Go client is provided as a third‑party library.

Installation

To install the go-duckdb client, run:

go get github.com/marcboeker/go-duckdb

Importing

To import the DuckDB Go package, add the following entries to your imports:

import (
"database/sql"
_ "github.com/marcboeker/go-duckdb"
)

Appender

The DuckDB Go client supports the DuckDB Appender API for bulk inserts. You can obtain a new Appender by supplying a DuckDB connec‑
tion to NewAppenderFromConn(). For example:

connector, err := duckdb.NewConnector("test.db", nil)


if err != nil {
...
}
conn, err := connector.Connect(context.Background())
if err != nil {
...
}
defer conn.Close()

// Retrieve appender from connection (note that you have to create the table 'test' beforehand).
appender, err := duckdb.NewAppenderFromConn(conn, "", "test")
if err != nil {
...
}
defer appender.Close()

err = appender.AppendRow(...)
if err != nil {
...
}

// Optional, if you want to access the appended rows immediately.


err = appender.Flush()
if err != nil {
...
}

Examples

Simple Example An example for using the Go API is as follows:


package main

import (
"database/sql"
"errors"
"fmt"
"log"

_ "github.com/marcboeker/go-duckdb"
)

func main() {
db, err := sql.Open("duckdb", "")
if err != nil {
log.Fatal(err)
}
defer db.Close()

_, err = db.Exec(` CREATE TABLE people (id INTEGER, name VARCHAR)`)


if err != nil {
log.Fatal(err)
}
_, err = db.Exec(` INSERT INTO people VALUES (42, 'John')`)
if err != nil {
log.Fatal(err)
}

var (
id int
name string
)
row := db.QueryRow(` SELECT id, name FROM people`)
err = row.Scan(&id, &name)
if errors.Is(err, sql.ErrNoRows) {
log.Println("no rows")
} else if err != nil {
log.Fatal(err)
}

fmt.Printf("id: %d, name: %s\n", id, name)


}

More Examples For more examples, see the examples in the go-duckdb repository.

Java JDBC API

Installation

The DuckDB Java JDBC API can be installed from Maven Central. Please see the installation page for details.

Basic API Usage

DuckDB's JDBC API implements the main parts of the standard Java Database Connectivity (JDBC) API, version 4.1. Describing JDBC is
beyond the scope of this page, see the official documentation for details. Below we focus on the DuckDB‑specific parts.


Refer to the externally hosted API Reference for more information about our extensions to the JDBC specification, or the below Arrow
Methods.

Startup & Shutdown In JDBC, database connections are created through the standard java.sql.DriverManager class. The driver
should auto‑register in the DriverManager, if that does not work for some reason, you can enforce registration like so:

Class.forName("org.duckdb.DuckDBDriver");

To create a DuckDB connection, call DriverManager with the jdbc:duckdb: JDBC URL prefix, like so:

import java.sql.Connection;
import java.sql.DriverManager;

Connection conn = DriverManager.getConnection("jdbc:duckdb:");

To use DuckDB‑specific features such as the Appender, cast the object to a DuckDBConnection:

import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");

When using the jdbc:duckdb: URL alone, an in‑memory database is created. Note that for an in‑memory database no data is persisted
to disk (i.e., all data is lost when you exit the Java program). If you would like to access or create a persistent database, append its file
path to the URL prefix. For example, if your database is stored in /tmp/my_database, use the JDBC URL jdbc:duckdb:/tmp/my_database
to create a connection to it.

It is possible to open a DuckDB database file in read‑only mode. This is for example useful if multiple Java processes want to read the same
database file at the same time. To open an existing database file in read‑only mode, set the connection property duckdb.read_only
like so:

Properties ro_prop = new Properties();


ro_prop.setProperty("duckdb.read_only", "true");
Connection conn_ro = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", ro_prop);

Additional connections can be created using the DriverManager. A more efficient mechanism is to call the DuckDBConnec-
tion#duplicate() method like so:

Connection conn2 = ((DuckDBConnection) conn).duplicate();

Multiple connections are allowed, but mixing read‑write and read‑only connections is unsupported.

Configuring Connections Configuration options can be provided to change different settings of the database system. Note that many
of these settings can be changed later on using PRAGMA statements as well.

Properties connectionProperties = new Properties();


connectionProperties.setProperty("temp_directory", "/path/to/temp/dir/");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", connectionProperties);

Querying DuckDB supports the standard JDBC methods to send queries and retrieve result sets. First, a Statement object has to be
created from the Connection; this object can then be used to send queries using execute and executeQuery. execute() is meant
for queries where no results are expected, such as CREATE TABLE or UPDATE, while executeQuery() is meant for queries
that produce results (e.g., SELECT). Two examples are shown below. See also the JDBC Statement and ResultSet documentation.

// create a table
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
// insert two items into the table
stmt.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");

// query the table; the statement is closed only after the query, since it is reused here
try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
    while (rs.next()) {
        System.out.println(rs.getString(1));
        System.out.println(rs.getInt(3));
    }
}
stmt.close();
// jeans
// 1
// hammer
// 2
DuckDB also supports prepared statements as per the JDBC API:

try (PreparedStatement p_stmt = conn.prepareStatement("INSERT INTO items VALUES (?, ?, ?);")) {


p_stmt.setString(1, "chainsaw");
p_stmt.setDouble(2, 500.0);
p_stmt.setInt(3, 42);
p_stmt.execute();
// more calls to execute() possible
}

Note. Warning Do not use prepared statements to insert large amounts of data into DuckDB. See the data import documentation
for better options.

Arrow Methods Refer to the API Reference for type signatures

Arrow Export The following demonstrates exporting an Arrow stream and consuming it using the Java Arrow bindings:

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.duckdb.DuckDBResultSet;

try (var conn = DriverManager.getConnection("jdbc:duckdb:");


var p_stmt = conn.prepareStatement("SELECT * FROM generate_series(2000)");
var resultset = (DuckDBResultSet) p_stmt.executeQuery();
var allocator = new RootAllocator()) {
try (var reader = (ArrowReader) resultset.arrowExportStream(allocator, 256)) {
while (reader.loadNextBatch()) {
System.out.println(reader.getVectorSchemaRoot().getVector("generate_series"));
}
}
p_stmt.close();
}

Arrow Import The following demonstrates consuming an Arrow stream from the Java Arrow bindings:

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.duckdb.DuckDBConnection;

// Arrow stuff
try (var allocator = new RootAllocator();
ArrowStreamReader reader = null; // should not be null of course
var arrow_array_stream = ArrowArrayStream.allocateNew(allocator)) {
Data.exportArrayStream(allocator, reader, arrow_array_stream);

// DuckDB stuff


try (var conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) {


conn.registerArrowStream("asdf", arrow_array_stream);

// run a query
try (var stmt = conn.createStatement();
var rs = (DuckDBResultSet) stmt.executeQuery("SELECT count(*) FROM asdf")) {
while (rs.next()) {
System.out.println(rs.getInt(1));
}
}
}
}

Streaming Results Result streaming is opt‑in in the JDBC driver: set the jdbc_stream_results config to true before running a query.
The easiest way to do that is to pass it in the Properties object.

Properties props = new Properties();


props.setProperty(DuckDBDriver.JDBC_STREAM_RESULTS, String.valueOf(true));

Connection conn = DriverManager.getConnection("jdbc:duckdb:", props);

Appender The Appender is available in the DuckDB JDBC driver via the org.duckdb.DuckDBAppender class. The constructor of the
class requires the schema name and the table name it is applied to. The Appender is flushed when the close() method is called.

Example:

import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");


Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE tbl (x BIGINT, y FLOAT, s VARCHAR)");

// using try-with-resources to automatically close the appender at the end of the scope
try (var appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "tbl")) {
appender.beginRow();
appender.append(10);
appender.append(3.2);
appender.append("hello");
appender.endRow();
appender.beginRow();
appender.append(20);
appender.append(-8.1);
appender.append("world");
appender.endRow();
}
stmt.close();

Batch Writer The DuckDB JDBC driver offers batch write functionality. The batch writer supports prepared statements to mitigate the
overhead of query parsing.

Note. The preferred method for bulk inserts is to use the Appender due to its higher performance. However, when using the Appender
is not possible, the batch writer is available as an alternative.

Batch Writer with Prepared Statements


import org.duckdb.DuckDBConnection;


DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");


PreparedStatement stmt = conn.prepareStatement("INSERT INTO test (x, y, z) VALUES (?, ?, ?);");

stmt.setObject(1, 1);
stmt.setObject(2, 2);
stmt.setObject(3, 3);
stmt.addBatch();

stmt.setObject(1, 4);
stmt.setObject(2, 5);
stmt.setObject(3, 6);
stmt.addBatch();

stmt.executeBatch();
stmt.close();

Batch Writer with Vanilla Statements The batch writer also supports vanilla SQL statements:

import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");


Statement stmt = conn.createStatement();

stmt.execute("CREATE TABLE test (x INT, y INT, z INT)");

stmt.addBatch("INSERT INTO test (x, y, z) VALUES (1, 2, 3);");


stmt.addBatch("INSERT INTO test (x, y, z) VALUES (4, 5, 6);");

stmt.executeBatch();
stmt.close();

Julia Package

The DuckDB Julia package provides a high‑performance front‑end for DuckDB. Much like SQLite, DuckDB runs in‑process within the Julia
client, and provides a DBInterface front‑end.

The package also supports multi‑threaded execution. It uses Julia threads/tasks for this purpose. If you wish to run queries in parallel, you
must launch Julia with multi‑threading support (by e.g., setting the JULIA_NUM_THREADS environment variable).
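
For example, either of the following launches Julia with four threads (the environment variable and the --threads flag are equivalent ways to enable multi‑threading):

JULIA_NUM_THREADS=4 julia
julia --threads=4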

Installation

Install DuckDB as follows:

using Pkg
Pkg.add("DuckDB")

Alternatively, enter the package manager using the ] key, and issue the following command:

pkg> add DuckDB

Basics

using DuckDB

# create a new in-memory database


con = DBInterface.connect(DuckDB.DB, ":memory:")


# create a table
DBInterface.execute(con, "CREATE TABLE integers (i INTEGER)")

# insert data using a prepared statement


stmt = DBInterface.prepare(con, "INSERT INTO integers VALUES(?)")
DBInterface.execute(stmt, [42])

# query the database


results = DBInterface.execute(con, "SELECT 42 a")
print(results)

Scanning DataFrames

The DuckDB Julia package also provides support for querying Julia DataFrames. Note that the DataFrames are directly read by DuckDB ‑
they are not inserted or copied into the database itself.

If you wish to load data from a DataFrame into a DuckDB table you can run a CREATE TABLE ... AS or INSERT INTO query.

using DuckDB
using DataFrames

# create a new in-memory database


con = DBInterface.connect(DuckDB.DB)

# create a DataFrame
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42])

# register it as a view in the database


DuckDB.register_data_frame(con, df, "my_df")

# run a SQL query over the DataFrame


results = DBInterface.execute(con, "SELECT * FROM my_df")
print(results)

Appender API

The DuckDB Julia package also supports the Appender API, which is much faster than using prepared statements or individual INSERT
INTO statements. Appends are made in row‑wise format. For every column, an append() call should be made, after which the row should
be finished by calling end_row(). After all rows have been appended, flush() should be called, and close() should be used to finalize the
appender and clean up the resulting memory.

using DuckDB, DataFrames, Dates


db = DuckDB.DB()
# create a table
DBInterface.execute(db, "CREATE OR REPLACE
TABLE data(id INT PRIMARY KEY, value FLOAT,
timestamp TIMESTAMP, date DATE)")
# create data to insert
len = 100
df = DataFrames.DataFrame(id=collect(1:len),
value=rand(len),
timestamp=Dates.now() + Dates.Second.(1:len),
date=Dates.today() + Dates.Day.(1:len))
# append data by row
appender = DuckDB.Appender(db, "data")
for i in eachrow(df)


for j in i
DuckDB.append(appender, j)
end
DuckDB.end_row(appender)
end
# flush the appender after all rows
DuckDB.flush(appender)
DuckDB.close(appender)

Concurrency

Within a julia process, tasks are able to concurrently read and write to the database, as long as each task maintains its own connection to
the database. In the example below, a single task is spawned to periodically read the database and many tasks are spawned to write to the
database using both INSERT statements as well as the appender api.

using Dates, DataFrames, DuckDB


db = DuckDB.DB()
DBInterface.connect(db)
DBInterface.execute(db, "CREATE OR REPLACE TABLE data (date TIMESTAMP, id INT)")

function run_reader(db)
# create a DuckDB connection specifically for this task
conn = DBInterface.connect(db)
while true
println(DBInterface.execute(conn,
"SELECT id, count(date) as count, max(date) as max_date
FROM data group by id order by id") |> DataFrames.DataFrame)
sleep(1)
end
DBInterface.close(conn)
end
# spawn one reader task
Threads.@spawn run_reader(db)

function run_inserter(db, id)


# create a DuckDB connection specifically for this task
conn = DBInterface.connect(db)
for i in 1:1000
sleep(0.01)
DuckDB.execute(conn, "INSERT INTO data VALUES (current_timestamp, ?)"; id);
end
DBInterface.close(conn)
end
# spawn many insert tasks
for i in 1:100
Threads.@spawn run_inserter(db, 1)
end

function run_appender(db, id)


# create a DuckDB connection specifically for this task
appender = DuckDB.Appender(db, "data")
for i in 1:1000
sleep(0.01)
row = (Dates.now(Dates.UTC), id)
for j in row
DuckDB.append(appender, j);
end
DuckDB.end_row(appender);

DuckDB.flush(appender);
end
DuckDB.close(appender);
end
# spawn many appender tasks
for i in 1:100
Threads.@spawn run_appender(db, 2)
end

Original Julia Connector

Credits to kimmolinna for the original DuckDB Julia connector.

Node.js

Node.js API

This package provides a Node.js API for DuckDB. The API for this client is somewhat compliant with the SQLite Node.js client to ease the
transition.

For TypeScript wrappers, see the duckdb‑async project.

Initializing

Load the package and create a database object:

const duckdb = require('duckdb');


const db = new duckdb.Database(':memory:'); // or a file name for a persistent DB

All options as described on Database configuration can be (optionally) supplied to the Database constructor as second argument. The
third argument can be optionally supplied to get feedback on the given options.

const db = new duckdb.Database(':memory:', {


"access_mode": "READ_WRITE",
"max_memory": "512MB",
"threads": "4"
}, (err) => {
if (err) {
console.error(err);
}
});

Running a Query

The following code snippet runs a simple query using the Database.all() method.

db.all('SELECT 42 AS fortytwo', function(err, res) {


if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
});


Other available methods are each, where the callback is invoked for each row; run, which executes a single statement without returning
results; and exec, which can execute several SQL commands at once but also does not return results. All those commands can work with
prepared statements, taking the values for the parameters as additional arguments. For example:

db.all('SELECT ?::INTEGER AS fortytwo, ?::STRING AS hello', 42, 'Hello, World', function(err, res) {
if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
console.log(res[0].hello)
});

Connections

A database can have multiple connections; these are created using db.connect().

const con = db.connect();

You can create multiple connections, each with their own transaction context.

Connection objects also contain shorthands to directly call run(), all() and each() with parameters and callbacks, respectively,
for example:

con.all('SELECT 42 AS fortytwo', function(err, res) {


if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
});

Prepared Statements

From connections, you can create prepared statements (and only that) using con.prepare():

const stmt = con.prepare('SELECT ?::INTEGER AS fortytwo');

To execute this statement, you can call for example all() on the stmt object:

stmt.all(42, function(err, res) {


if (err) {
console.warn(err);
} else {
console.log(res[0].fortytwo)
}
});

You can also execute the prepared statement multiple times. This is for example useful to fill a table with data:

con.run('CREATE TABLE a (i INTEGER)');


const stmt = con.prepare('INSERT INTO a VALUES (?)');
for (let i = 0; i < 10; i++) {
stmt.run(i);
}
stmt.finalize();
con.all('SELECT * FROM a', function(err, res) {
if (err) {
console.warn(err);
} else {

console.log(res)
}
});

prepare() can also take a callback which gets the prepared statement as an argument:

const stmt = con.prepare('SELECT ?::INTEGER AS fortytwo', function(err, stmt) {


stmt.all(42, function(err, res) {
if (err) {
console.warn(err);
} else {
console.log(res[0].fortytwo)
}
});
});

Inserting Data via Arrow

Apache Arrow can be used to insert data into DuckDB without making a copy:

const arrow = require('apache-arrow');


const db = new duckdb.Database(':memory:');

const jsonData = [
{"userId":1,"id":1,"title":"delectus aut autem","completed":false},
{"userId":1,"id":2,"title":"quis ut nam facilis et officia qui","completed":false}
];

// note; doesn't work on Windows yet


db.exec(` INSTALL arrow; LOAD arrow;`, (err) => {
if (err) {
console.warn(err);
return;
}

const arrowTable = arrow.tableFromJSON(jsonData);


db.register_buffer("jsonDataTable", [arrow.tableToIPC(arrowTable)], true, (err, res) => {
if (err) {
console.warn(err);
return;
}

// `SELECT * FROM jsonDataTable` would return the entries in `jsonData`


});
});

Loading Unsigned Extensions

To load unsigned extensions, instantiate the database as follows:

db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});


Node.js API

Modules

Typedefs

duckdb

Summary: DuckDB is an embeddable SQL OLAP Database Management System

• duckdb

– ~Connection

* .run(sql, ...params, callback) ⇒ void


* .all(sql, ...params, callback) ⇒ void
* .arrowIPCAll(sql, ...params, callback) ⇒ void
* .arrowIPCStream(sql, ...params, callback) ⇒
* .each(sql, ...params, callback) ⇒ void
* .stream(sql, ...params)
* .register_udf(name, return_type, fun) ⇒ void
* .prepare(sql, ...params, callback) ⇒ Statement
* .exec(sql, ...params, callback) ⇒ void
* .register_udf_bulk(name, return_type, callback) ⇒ void
* .unregister_udf(name, return_type, callback) ⇒ void
* .register_buffer(name, array, force, callback) ⇒ void
* .unregister_buffer(name, callback) ⇒ void
* .close(callback) ⇒ void
– ~Statement

* .sql ⇒
* .get()
* .run(sql, ...params, callback) ⇒ void
* .all(sql, ...params, callback) ⇒ void
* .arrowIPCAll(sql, ...params, callback) ⇒ void
* .each(sql, ...params, callback) ⇒ void
* .finalize(sql, ...params, callback) ⇒ void
* .stream(sql, ...params)
* .columns() ⇒ Array.<ColumnInfo>
– ~QueryResult

* .nextChunk() ⇒
* .nextIpcBuffer() ⇒
* .asyncIterator()
– ~Database

* .close(callback) ⇒ void
* .close_internal(callback) ⇒ void
* .wait(callback) ⇒ void
* .serialize(callback) ⇒ void
* .parallelize(callback) ⇒ void
* .connect(path) ⇒ Connection
* .interrupt(callback) ⇒ void
* .prepare(sql) ⇒ Statement
* .run(sql, ...params, callback) ⇒ void
* .scanArrowIpc(sql, ...params, callback) ⇒ void
* .each(sql, ...params, callback) ⇒ void
* .all(sql, ...params, callback) ⇒ void


* .arrowIPCAll(sql, ...params, callback) ⇒ void


* .arrowIPCStream(sql, ...params, callback) ⇒ void
* .exec(sql, ...params, callback) ⇒ void
* .register_udf(name, return_type, fun) ⇒ this
* .register_buffer(name) ⇒ this
* .unregister_buffer(name) ⇒ this
* .unregister_udf(name) ⇒ this
* .registerReplacementScan(fun) ⇒ this
* .tokenize(text) ⇒ ScriptTokens
* .get()
– ~TokenType
– ~ERROR : number
– ~OPEN_READONLY : number
– ~OPEN_READWRITE : number
– ~OPEN_CREATE : number
– ~OPEN_FULLMUTEX : number
– ~OPEN_SHAREDCACHE : number
– ~OPEN_PRIVATECACHE : number

duckdb~Connection Kind: inner class of duckdb

• ~Connection

– .run(sql, ...params, callback) ⇒ void


– .all(sql, ...params, callback) ⇒ void
– .arrowIPCAll(sql, ...params, callback) ⇒ void
– .arrowIPCStream(sql, ...params, callback) ⇒
– .each(sql, ...params, callback) ⇒ void
– .stream(sql, ...params)
– .register_udf(name, return_type, fun) ⇒ void
– .prepare(sql, ...params, callback) ⇒ Statement
– .exec(sql, ...params, callback) ⇒ void
– .register_udf_bulk(name, return_type, callback) ⇒ void
– .unregister_udf(name, return_type, callback) ⇒ void
– .register_buffer(name, array, force, callback) ⇒ void
– .unregister_buffer(name, callback) ⇒ void
– .close(callback) ⇒ void

connection.run(sql, ...params, callback) ⇒ void Run a SQL statement and trigger a callback when done

Kind: instance method of Connection

Param Type

sql
...params *
callback

connection.all(sql, ...params, callback) ⇒ void Run a SQL query and triggers the callback once for all result rows

Kind: instance method of Connection


Param Type

sql
...params *
callback

connection.arrowIPCAll(sql, ...params, callback) ⇒ void Run a SQL query and serialize the result into the Apache Arrow IPC format
(requires arrow extension to be loaded)

Kind: instance method of Connection

Param Type

sql
...params *
callback

connection.arrowIPCStream(sql, ...params, callback) ⇒ Run a SQL query, returns a IpcResultStreamIterator that allows streaming the
result into the Apache Arrow IPC format (requires arrow extension to be loaded)

Kind: instance method of Connection


Returns: Promise

Param Type

sql
...params *
callback

connection.each(sql, ...params, callback) ⇒ void Runs a SQL query and triggers the callback for each result row

Kind: instance method of Connection

Param Type

sql
...params *
callback

connection.stream(sql, ...params) Kind: instance method of Connection

Param Type

sql
...params *


connection.register_udf(name, return_type, fun) ⇒ void Register a User Defined Function

Kind: instance method of Connection


Note: this follows the wasm udfs somewhat but is simpler because we can pass data much more cleanly

Param

name
return_type
fun

connection.prepare(sql, ...params, callback) ⇒ Statement Prepare a SQL query for execution

Kind: instance method of Connection

Param Type

sql
...params *
callback

connection.exec(sql, ...params, callback) ⇒ void Execute a SQL query

Kind: instance method of Connection

Param Type

sql
...params *
callback

connection.register_udf_bulk(name, return_type, callback) ⇒ void Register a User Defined Function

Kind: instance method of Connection

Param

name
return_type
callback

connection.unregister_udf(name, return_type, callback) ⇒ void Unregister a User Defined Function

Kind: instance method of Connection

Param

name
return_type

callback

connection.register_buffer(name, array, force, callback) ⇒ void Register a Buffer to be scanned using the Apache Arrow IPC scanner
(requires arrow extension to be loaded)

Kind: instance method of Connection

Param

name
array
force
callback

connection.unregister_buffer(name, callback) ⇒ void Unregister the Buffer

Kind: instance method of Connection

Param

name
callback

connection.close(callback) ⇒ void Closes connection

Kind: instance method of Connection

Param

callback

duckdb~Statement Kind: inner class of duckdb

• ~Statement
– .sql ⇒
– .get()
– .run(sql, ...params, callback) ⇒ void
– .all(sql, ...params, callback) ⇒ void
– .arrowIPCAll(sql, ...params, callback) ⇒ void
– .each(sql, ...params, callback) ⇒ void
– .finalize(sql, ...params, callback) ⇒ void
– .stream(sql, ...params)
– .columns() ⇒ Array.<ColumnInfo>

statement.sql ⇒ Kind: instance property of Statement


Returns: sql contained in statement
Field:


statement.get() Not implemented

Kind: instance method of Statement

statement.run(sql, ...params, callback) ⇒ void Kind: instance method of Statement

Param Type

sql
...params *
callback

statement.all(sql, ...params, callback) ⇒ void Kind: instance method of Statement

Param Type

sql
...params *
callback

statement.arrowIPCAll(sql, ...params, callback) ⇒ void Kind: instance method of Statement

Param Type

sql
...params *
callback

statement.each(sql, ...params, callback) ⇒ void Kind: instance method of Statement

Param Type

sql
...params *
callback

statement.finalize(sql, ...params, callback) ⇒ void Kind: instance method of Statement

Param Type

sql
...params *
callback


statement.stream(sql, ...params) Kind: instance method of Statement

Param Type

sql
...params *

statement.columns() ⇒ Array.<ColumnInfo> Kind: instance method of Statement


Returns: Array.<ColumnInfo> ‑ ‑ Array of column names and types

duckdb~QueryResult Kind: inner class of duckdb

• ~QueryResult

– .nextChunk() ⇒
– .nextIpcBuffer() ⇒
– .asyncIterator()

queryResult.nextChunk() ⇒ Kind: instance method of QueryResult


Returns: data chunk

queryResult.nextIpcBuffer() ⇒ Function to fetch the next result blob of an Arrow IPC Stream in a zero‑copy way. (requires arrow exten‑
sion to be loaded)

Kind: instance method of QueryResult


Returns: data chunk

queryResult.asyncIterator() Kind: instance method of QueryResult

duckdb~Database Main database interface

Kind: inner property of duckdb

Param Description

path path to database file or :memory: for in‑memory database


access_mode access mode
config the configuration object
callback callback function

• ~Database

– .close(callback) ⇒ void
– .close_internal(callback) ⇒ void
– .wait(callback) ⇒ void
– .serialize(callback) ⇒ void
– .parallelize(callback) ⇒ void


– .connect(path) ⇒ Connection
– .interrupt(callback) ⇒ void
– .prepare(sql) ⇒ Statement
– .run(sql, ...params, callback) ⇒ void
– .scanArrowIpc(sql, ...params, callback) ⇒ void
– .each(sql, ...params, callback) ⇒ void
– .all(sql, ...params, callback) ⇒ void
– .arrowIPCAll(sql, ...params, callback) ⇒ void
– .arrowIPCStream(sql, ...params, callback) ⇒ void
– .exec(sql, ...params, callback) ⇒ void
– .register_udf(name, return_type, fun) ⇒ this
– .register_buffer(name) ⇒ this
– .unregister_buffer(name) ⇒ this
– .unregister_udf(name) ⇒ this
– .registerReplacementScan(fun) ⇒ this
– .tokenize(text) ⇒ ScriptTokens
– .get()

database.close(callback) ⇒ void Closes database instance

Kind: instance method of Database

Param

callback

database.close_internal(callback) ⇒ void Internal method. Do not use, call Connection#close instead

Kind: instance method of Database

Param

callback

database.wait(callback) ⇒ void Triggers callback when all scheduled database tasks have completed.

Kind: instance method of Database

Param

callback

database.serialize(callback) ⇒ void Currently a no‑op. Provided for SQLite compatibility

Kind: instance method of Database

Param

callback


database.parallelize(callback) ⇒ void Currently a no‑op. Provided for SQLite compatibility

Kind: instance method of Database

Param

callback

database.connect(path) ⇒ Connection Create a new database connection

Kind: instance method of Database

Param Description

path the database to connect to, either a file path, or :memory:

database.interrupt(callback) ⇒ void Supposedly interrupt queries, but currently does not do anything.

Kind: instance method of Database

Param

callback

database.prepare(sql) ⇒ Statement Prepare a SQL query for execution

Kind: instance method of Database

Param

sql

database.run(sql, ...params, callback) ⇒ void Convenience method for Connection#run using a built‑in default connection

Kind: instance method of Database

Param Type

sql
...params *
callback

database.scanArrowIpc(sql, ...params, callback) ⇒ void Convenience method for Connection#scanArrowIpc using a built‑in default
connection

Kind: instance method of Database

Param Type

sql
...params *

callback

database.each(sql, ...params, callback) ⇒ void Kind: instance method of Database

Param Type

sql
...params *
callback

database.all(sql, ...params, callback) ⇒ void Convenience method for Connection#all using a built‑in default connection

Kind: instance method of Database

Param Type

sql
...params *
callback

database.arrowIPCAll(sql, ...params, callback) ⇒ void Convenience method for Connection#arrowIPCAll using a built‑in default con‑
nection

Kind: instance method of Database

Param Type

sql
...params *
callback

database.arrowIPCStream(sql, ...params, callback) ⇒ void Convenience method for Connection#arrowIPCStream using a built‑in de‑
fault connection

Kind: instance method of Database

Param Type

sql
...params *
callback

database.exec(sql, ...params, callback) ⇒ void Kind: instance method of Database


Param Type

sql
...params *
callback

database.register_udf(name, return_type, fun) ⇒ this Register a User Defined Function

Convenience method for Connection#register_udf

Kind: instance method of Database

Param

name
return_type
fun

database.register_buffer(name) ⇒ this Register a buffer containing serialized data to be scanned from DuckDB.

Convenience method for Connection#unregister_buffer

Kind: instance method of Database

Param

name

database.unregister_buffer(name) ⇒ this Unregister a Buffer

Convenience method for Connection#unregister_buffer

Kind: instance method of Database

Param

name

database.unregister_udf(name) ⇒ this Unregister a UDF

Convenience method for Connection#unregister_udf

Kind: instance method of Database

Param

name

database.registerReplacementScan(fun) ⇒ this Register a table replace scan function

Kind: instance method of Database


Param Description

fun Replacement scan function

database.tokenize(text) ⇒ ScriptTokens Return positions and types of tokens in given text

Kind: instance method of Database

Param

text

database.get() Not implemented

Kind: instance method of Database

duckdb~TokenType Types of tokens returned by tokenize.

Kind: inner property of duckdb

duckdb~ERROR : number Check that errno attribute equals this to check for a duckdb error

Kind: inner constant of duckdb

duckdb~OPEN_READONLY : number Open database in readonly mode

Kind: inner constant of duckdb

duckdb~OPEN_READWRITE : number Currently ignored

Kind: inner constant of duckdb

duckdb~OPEN_CREATE : number Currently ignored

Kind: inner constant of duckdb

duckdb~OPEN_FULLMUTEX : number Currently ignored

Kind: inner constant of duckdb

duckdb~OPEN_SHAREDCACHE : number Currently ignored

Kind: inner constant of duckdb


duckdb~OPEN_PRIVATECACHE : number Currently ignored

Kind: inner constant of duckdb

ColumnInfo : object

Kind: global typedef


Properties

Name Type Description

name string Column name


type TypeInfo Column type

TypeInfo : object

Kind: global typedef


Properties

Name Type Description

id string Type ID
[alias] string SQL type alias
sql_type string SQL type name

DuckDbError : object

Kind: global typedef


Properties

Name Type Description

errno number -1 for DuckDB errors
message string Error message
code string 'DUCKDB_NODEJS_ERROR' for DuckDB errors
errorType string DuckDB error type code (e.g., HTTP, IO, Catalog)

HTTPError : object

Kind: global typedef


Extends: DuckDbError
Properties


Name Type Description

statusCode number HTTP response status code


reason string HTTP response reason
response string HTTP response body
headers object HTTP headers

Python

Python API

Installation

The DuckDB Python API can be installed using pip: pip install duckdb. Please see the installation page for details. It is also possible
to install DuckDB using conda: conda install python-duckdb -c conda-forge.

Python version: DuckDB requires Python 3.7 or newer.

Basic API Usage

The most straightforward way to run SQL queries using DuckDB is to use the duckdb.sql command.

import duckdb
duckdb.sql("SELECT 42").show()

This will run queries using an in‑memory database that is stored globally inside the Python module. The result of the query is returned
as a Relation. A relation is a symbolic representation of the query. The query is not executed until the result is fetched or requested to be
printed to the screen.

Relations can be referenced in subsequent queries by storing them inside variables, and using them as tables. This way queries can be
constructed incrementally.

import duckdb
r1 = duckdb.sql("SELECT 42 AS i")
duckdb.sql("SELECT i * 2 AS k FROM r1").show()

Data Input

DuckDB can ingest data from a wide variety of formats – both on‑disk and in‑memory. See the data ingestion page for more information.

import duckdb
duckdb.read_csv("example.csv") # read a CSV file into a Relation
duckdb.read_parquet("example.parquet") # read a Parquet file into a Relation
duckdb.read_json("example.json") # read a JSON file into a Relation

duckdb.sql("SELECT * FROM 'example.csv'") # directly query a CSV file


duckdb.sql("SELECT * FROM 'example.parquet'") # directly query a Parquet file
duckdb.sql("SELECT * FROM 'example.json'") # directly query a JSON file

DataFrames DuckDB can also directly query Pandas DataFrames, Polars DataFrames and Arrow tables.


import duckdb

# directly query a Pandas DataFrame


import pandas as pd
pandas_df = pd.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM pandas_df")

# directly query a Polars DataFrame


import polars as pl
polars_df = pl.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM polars_df")

# directly query a pyarrow table


import pyarrow as pa
arrow_table = pa.Table.from_pydict({"a": [42]})
duckdb.sql("SELECT * FROM arrow_table")

Result Conversion

DuckDB supports converting query results efficiently to a variety of formats. See the result conversion page for more information.

import duckdb
duckdb.sql("SELECT 42").fetchall() # Python objects
duckdb.sql("SELECT 42").df() # Pandas DataFrame
duckdb.sql("SELECT 42").pl() # Polars DataFrame
duckdb.sql("SELECT 42").arrow() # Arrow Table
duckdb.sql("SELECT 42").fetchnumpy() # NumPy Arrays

Writing Data to Disk

DuckDB supports writing Relation objects directly to disk in a variety of formats. The COPY statement can be used to write data to disk
using SQL as an alternative.

import duckdb
duckdb.sql("SELECT 42").write_parquet("out.parquet") # Write to a Parquet file
duckdb.sql("SELECT 42").write_csv("out.csv") # Write to a CSV file
duckdb.sql("COPY (SELECT 42) TO 'out.parquet'") # Copy to a Parquet file

Using an In‑Memory Database

When using DuckDB through duckdb.sql(), it operates on an in‑memory database, i.e., no tables are persisted on disk. Invoking the
duckdb.connect() method without arguments returns a connection, which also uses an in‑memory database:

import duckdb

con = duckdb.connect()
con.sql("SELECT 42 AS x").show()

Persistent Storage

The duckdb.connect(dbname) method creates a connection to a persistent database. Any data written to that connection will be persisted,
and can be reloaded by re‑connecting to the same file, both from Python and from other DuckDB clients.

import duckdb

# create a connection to a file called 'file.db'


con = duckdb.connect("file.db")
# create a table and load data into it
con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
# query the table
con.table("test").show()
# explicitly close the connection
con.close()
# Note: connections also closed implicitly when they go out of scope

You can also use a context manager to ensure that the connection is closed:

import duckdb

with duckdb.connect("file.db") as con:


con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
con.table("test").show()
# the context manager closes the connection automatically

Connection Object and Module

The connection object and the duckdb module can be used interchangeably – they support the same methods. The only difference is that
when using the duckdb module a global in‑memory database is used.

Note that if you are developing a package designed for others to use, and use DuckDB in the package, it is recommended that you create
connection objects instead of using the methods on the duckdb module. That is because the duckdb module uses a shared global database
– which can cause hard‑to‑debug issues if used from within multiple different packages.
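
As a small sketch of this interchangeability, the following runs the same query through the module (shared global database) and through an explicit connection (its own database):

import duckdb

# the module-level API operates on the shared global in-memory database
duckdb.sql("SELECT 42 AS answer").show()

# an explicit connection object supports the same methods,
# but owns its own (here: in-memory) database
con = duckdb.connect()
con.sql("SELECT 42 AS answer").show()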

Using Connections in Parallel Python Programs

The DuckDBPyConnection object is not thread‑safe. If you would like to write to the same database from multiple threads, create a
cursor for each thread with the DuckDBPyConnection.cursor() method.
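
A minimal sketch of this pattern is shown below (the table name and inserted values are placeholders chosen for illustration):

import duckdb
import threading

con = duckdb.connect("file.db")
con.sql("CREATE TABLE IF NOT EXISTS items_from_threads (i INTEGER)")

def insert_from_thread(n):
    # each thread uses its own cursor derived from the shared connection
    local_con = con.cursor()
    local_con.execute("INSERT INTO items_from_threads VALUES (?)", [n])

threads = [threading.Thread(target=insert_from_thread, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()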

Loading and Installing Extensions

DuckDB's Python API provides functions for installing and loading extensions, which perform the equivalent operations to running the
INSTALL and LOAD SQL commands, respectively. An example that installs and loads the spatial extension looks like follows:

import duckdb

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

Note. To load unsigned extensions, add the config = {"allow_unsigned_extensions": "true"} argument to the
duckdb.connect() method.

Data Ingestion

CSV Files

CSV files can be read using the read_csv function, called either from within Python or directly from within SQL. By default, the read_
csv function attempts to auto‑detect the CSV settings by sampling from the provided file.


import duckdb
# read from a file using fully auto-detected settings
duckdb.read_csv("example.csv")
# read multiple CSV files from a folder
duckdb.read_csv("folder/*.csv")
# specify options on how the CSV is formatted internally
duckdb.read_csv("example.csv", header = False, sep = ",")
# override types of the first two columns
duckdb.read_csv("example.csv", dtype = ["int", "varchar"])
# use the (experimental) parallel CSV reader
duckdb.read_csv("example.csv", parallel = True)
# directly read a CSV file from within SQL
duckdb.sql("SELECT * FROM 'example.csv'")
# call read_csv from within SQL
duckdb.sql("SELECT * FROM read_csv('example.csv')")

See the CSV Import page for more information.

Parquet Files

Parquet files can be read using the read_parquet function, called either from within Python or directly from within SQL.
import duckdb
# read from a single Parquet file
duckdb.read_parquet("example.parquet")
# read multiple Parquet files from a folder
duckdb.read_parquet("folder/*.parquet")
# read a Parquet over https
duckdb.read_parquet("https://some.url/some_file.parquet")
# read a list of Parquet files
duckdb.read_parquet(["file1.parquet", "file2.parquet", "file3.parquet"])
# directly read a Parquet file from within SQL
duckdb.sql("SELECT * FROM 'example.parquet'")
# call read_parquet from within SQL
duckdb.sql("SELECT * FROM read_parquet('example.parquet')")

See the Parquet Loading page for more information.

JSON Files

JSON files can be read using the read_json function, called either from within Python or directly from within SQL. By default, the read_
json function will automatically detect if a file contains newline‑delimited JSON or regular JSON, and will detect the schema of the objects
stored within the JSON file.
import duckdb
# read from a single JSON file
duckdb.read_json("example.json")
# read multiple JSON files from a folder
duckdb.read_json("folder/*.json")
# directly read a JSON file from within SQL
duckdb.sql("SELECT * FROM 'example.json'")
# call read_json from within SQL
duckdb.sql("SELECT * FROM read_json_auto('example.json')")

DataFrames & Arrow Tables

DuckDB is automatically able to query a Pandas DataFrame, Polars DataFrame, or Arrow object that is stored in a Python variable by name.
Accessing these is made possible by replacement scans.


DuckDB supports querying multiple types of Apache Arrow objects including tables, datasets, RecordBatchReaders, and scanners. See the
Python guides for more examples.

import duckdb
import pandas as pd
test_df = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
duckdb.sql("SELECT * FROM test_df").fetchall()
# [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

DuckDB also supports "registering" a DataFrame or Arrow object as a virtual table, comparable to a SQL VIEW. This is useful when querying
a DataFrame/Arrow object that is stored in another way (as a class variable, or a value in a dictionary). Below is a Pandas example of
manually registering such a DataFrame:

import duckdb
import pandas as pd
my_dictionary = {}
my_dictionary["test_df"] = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three",
"four"]})
duckdb.register("test_df_view", my_dictionary["test_df"])
duckdb.sql("SELECT * FROM test_df_view").fetchall()
# [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

You can also create a persistent table in DuckDB from the contents of the DataFrame (or the view):

# create a new table from the contents of a DataFrame


con.execute("CREATE TABLE test_df_table AS SELECT * FROM test_df")
# insert into an existing table from the contents of a DataFrame
con.execute("INSERT INTO test_df_table SELECT * FROM test_df")

Pandas DataFrames – object Columns pandas.DataFrame columns of an object dtype require some special care, since this
stores values of arbitrary type. To convert these columns to DuckDB, we first go through an analyze phase before converting the values. In
this analyze phase a sample of all the rows of the column are analyzed to determine the target type. This sample size is by default set to
1000. If the type picked during the analyze step is incorrect, this will result in a "Failed to cast value:" error, in which case you will need to
increase the sample size. The sample size can be changed by setting the pandas_analyze_sample config option.

# example setting the sample size to 100k


duckdb.default_connection.execute("SET GLOBAL pandas_analyze_sample = 100_000")

Object Conversion

This is a mapping of Python object types to DuckDB Logical Types:

• None ‑> NULL


• bool ‑> BOOLEAN
• datetime.timedelta ‑> INTERVAL
• str ‑> VARCHAR
• bytearray ‑> BLOB
• memoryview ‑> BLOB
• decimal.Decimal ‑> DECIMAL / DOUBLE
• uuid.UUID ‑> UUID

The rest of the conversion rules are as follows.

int Since integers can be of arbitrary size in Python, there is no one‑to‑one conversion possible for ints. Instead we perform these casts in order until one succeeds:

• BIGINT


• INTEGER
• UBIGINT
• UINTEGER
• DOUBLE

When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion.

float These casts are tried in order until one succeeds:

• DOUBLE
• FLOAT

datetime.datetime For datetime we will check pandas.isnull if it's available and return NULL if it returns true. We check
against datetime.datetime.min and datetime.datetime.max to convert to -inf and +inf respectively.

If the datetime has tzinfo, we will use TIMESTAMPTZ, otherwise it becomes TIMESTAMP.

datetime.time If the time has tzinfo, we will use TIMETZ, otherwise it becomes TIME.

datetime.date date converts to the DATE type. We check against datetime.date.min and datetime.date.max to convert
to -inf and +inf respectively.

bytes bytes converts to BLOB by default; when it's used to construct a Value object of type BITSTRING, it maps to BITSTRING instead.

list list becomes a LIST type of the ”most permissive” type of its children, for example:

my_list_value = [
12345,
"test"
]

Will become VARCHAR[] because 12345 can convert to VARCHAR but test cannot convert to INTEGER.

[12345, test]

dict The dict object can convert to either STRUCT(...) or MAP(..., ...) depending on its structure. If the dict has a structure
similar to:

my_map_dict = {
"key": [
1, 2, 3
],
"value": [
"one", "two", "three"
]
}

Then we'll convert it to a MAP of key‑value pairs of the two lists zipped together. The example above becomes a MAP(INTEGER, VARCHAR):

{1=one, 2=two, 3=three}


Note. The names of the fields matter and the two lists need to have the same size.

Otherwise we'll try to convert it to a STRUCT.

my_struct_dict = {
1: "one",
"2": 2,
"three": [1, 2, 3],
False: True
}

Becomes:

{'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}

Note. Every key of the dictionary is converted to string.

tuple tuple converts to LIST by default; when it's used to construct a Value object of type STRUCT, it will convert to STRUCT instead.

numpy.ndarray and numpy.datetime64 ndarray and datetime64 are converted by calling tolist() and converting the
result of that.

Result Conversion

DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.

NumPy

• fetchnumpy() fetches the data as a dictionary of NumPy arrays

Pandas

• df() fetches the data as a Pandas DataFrame


• fetchdf() is an alias of df()
• fetch_df() is an alias of df()
• fetch_df_chunk(vector_multiple) fetches a portion of the results into a DataFrame. The number of rows returned in each
chunk is the vector size (2048 by default) * vector_multiple (1 by default).

Apache Arrow

• arrow() fetches the data as an Arrow table


• fetch_arrow_table() is an alias of arrow()
• fetch_record_batch(chunk_size) returns an Arrow record batch reader with chunk_size rows per batch

Polars

• pl() fetches the data as a Polars DataFrame

Below are some examples using this functionality. See the Python guides for more examples.


# fetch as Pandas DataFrame


df = con.execute("SELECT * FROM items").fetchdf()
print(df)
# item value count
# 0 jeans 20.0 1
# 1 hammer 42.2 2
# 2 laptop 2000.0 1
# 3 chainsaw 500.0 10
# 4 iphone 300.0 2

# fetch as dictionary of numpy arrays


arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)
# {'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
# mask=[False, False, False, False, False],
# fill_value='?',
# dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
# mask=[False, False, False, False, False],
# fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
# mask=[False, False, False, False, False],
# fill_value=999999,
# dtype=int32)}

# fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing
tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())
# item value count
# 0 jeans 20.00 1
# 1 hammer 42.20 2
# 2 laptop 2000.00 1
# 3 chainsaw 500.00 10
# 4 iphone 300.00 2
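The Polars and record batch methods listed above work the same way. Below is a minimal sketch, reusing the items table from these examples; it assumes the polars package is installed.

# fetch as a Polars DataFrame (requires the polars package)
pl_df = con.execute("SELECT * FROM items").pl()
print(pl_df)

# fetch as an Arrow record batch reader with 2 rows per batch
reader = con.execute("SELECT * FROM items").fetch_record_batch(2)
for batch in reader:
    print(batch.num_rows)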

Python DB API

The standard DuckDB Python API provides a SQL interface compliant with the DB‑API 2.0 specification described by PEP 249, similar to the SQLite Python API.

Connection

To use the module, you must first create a DuckDBPyConnection object that represents the database. The connection object takes as
a parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be .db,
.duckdb, or anything else). The special value :memory: (the default) can be used to create an in‑memory database. Note that for an
in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the Python process). If you would like to connect to an
existing database in read‑only mode, you can set the read_only flag to True. Read‑only mode is required if multiple Python processes
want to access the same database file at the same time.

By default, we create an in‑memory database that lives inside the duckdb module. Every method of DuckDBPyConnection is also available on the duckdb module; these methods use this default connection. You can also get a reference to this connection by providing the special value :default: to connect.

import duckdb

duckdb.execute("CREATE TABLE tbl AS SELECT 42 a")


con = duckdb.connect(":default:")
con.sql("SELECT * FROM tbl")


┌───────┐
│ a │
│ int32 │
├───────┤
│ 42 │
└───────┘

import duckdb
# to start an in-memory database
con = duckdb.connect(database = ":memory:")
# to use a database file (not shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = False)
# to use a database file (shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = True)
# to explicitly get the default connection
con = duckdb.connect(database = ":default:")

If you want to create a second connection to an existing database, you can use the cursor() method. This might be useful for example
to allow parallel threads running queries independently. A single connection is thread‑safe but is locked for the duration of the queries,
effectively serializing database access in this case.
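As a minimal sketch of this approach (using an in‑memory database for illustration):

import duckdb

con = duckdb.connect()
# create a second connection to the same database, e.g., for use in another thread
con2 = con.cursor()
con2.execute("CREATE TABLE IF NOT EXISTS t (i INTEGER)")
con2.close()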

Connections are closed implicitly when they go out of scope or if they are explicitly closed using close(). Once the last connection to a
database instance is closed, the database instance is closed as well.

Querying

SQL queries can be sent to DuckDB using the execute() method of connections. Once a query has been executed, results can be re‑
trieved using the fetchone and fetchall methods on the connection. fetchall will retrieve all results and complete the transaction.
fetchone will retrieve a single row of results each time that it is invoked until no more results are available. The transaction will only close
once fetchone is called and there are no more results remaining (the return value will be None). As an example, in the case of a query
only returning a single row, fetchone should be called once to retrieve the results and a second time to close the transaction. Below are
some short examples:

# create a table
con.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again


con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', Decimal('20.00'), 1), ('hammer', Decimal('42.20'), 2)]

# retrieve the items one at a time


con.execute("SELECT * FROM items")
print(con.fetchone())
# ('jeans', Decimal('20.00'), 1)
print(con.fetchone())
# ('hammer', Decimal('42.20'), 2)
print(con.fetchone()) # This closes the transaction. Any subsequent calls to .fetchone will return None
# None

The description property of the connection object contains the column names as per the standard.
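For example, the column names can be read from description after executing a query; a small sketch reusing the items table from above:

con.execute("SELECT * FROM items")
print([col[0] for col in con.description])
# ['item', 'value', 'count']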

Prepared Statements DuckDB also supports prepared statements in the API with the execute and executemany methods. The val‑
ues may be passed as an additional parameter after a query that contains ? or $1 (dollar symbol and a number) placeholders. Using the ?
notation adds the values in the same sequence as passed within the Python parameter. Using the $ notation allows for values to be reused
within the SQL statement based on the number and index of the value found within the Python parameter.


Here are some examples:

# insert a row using prepared statements


con.execute("INSERT INTO items VALUES (?, ?, ?)", ["laptop", 2000, 1])

# insert several rows using prepared statements


con.executemany("INSERT INTO items VALUES (?, ?, ?)", [["chainsaw", 500, 10], ["iphone", 300, 2]] )

# query the database using a prepared statement


con.execute("SELECT item FROM items WHERE value > ?", [400])
print(con.fetchall())
# [('laptop',), ('chainsaw',)]

# query using $ notation for prepared statement and reused values


con.execute("SELECT $1, $1, $2", ["duck", "goose"])
print(con.fetchall())
# [('duck', 'duck', 'goose')]

Note. Warning Do not use executemany to insert large amounts of data into DuckDB. See the data ingestion page for better
options.

Named Parameters

Besides the standard unnamed parameters, like $1, $2, etc., it's also possible to supply named parameters, like $my_parameter. When using named parameters, you have to provide a dictionary mapping of str to value in the parameters argument. An example use:

import duckdb

res = duckdb.execute("""
SELECT
$my_param,
$other_param,
$also_param
""",
{
"my_param": 5,
"other_param": "DuckDB",
"also_param": [42]
}
).fetchall()
print(res)
# [(5, 'DuckDB', [42])]

Relational API

The Relational API is an alternative API that can be used to incrementally construct queries. The API is centered around DuckDBPyRelation nodes. The relations can be seen as symbolic representations of SQL queries. They do not hold any data ‑ and nothing is executed ‑ until a method that triggers execution is called.

Constructing Relations

Relations can be created from SQL queries using the duckdb.sql method. Alternatively, they can be created from the various data ingestion methods (read_parquet, read_csv, read_json).

For example, here we create a relation from a SQL query:


import duckdb
rel = duckdb.sql("SELECT * FROM range(10_000_000_000) tbl(id)")
rel.show()

┌────────────────────────┐
│ id │
│ int64 │
├────────────────────────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
│ · │
│ · │
│ · │
│ 9990 │
│ 9991 │
│ 9992 │
│ 9993 │
│ 9994 │
│ 9995 │
│ 9996 │
│ 9997 │
│ 9998 │
│ 9999 │
├────────────────────────┤
│ ? rows │
│ (>9999 rows, 20 shown) │
└────────────────────────┘

Note how we are constructing a relation that computes an immense amount of data (10B rows, or 74GB of data). The relation is constructed
instantly ‑ and we can even print the relation instantly.

When printing a relation using show or displaying it in the terminal, the first 10K rows are fetched. If there are more than 10K rows, the
output window will show >9999 rows (as the amount of rows in the relation is unknown).

Data Ingestion

Outside of SQL queries, the following methods are provided to construct relation objects from external data.

• from_arrow
• from_df
• read_csv
• read_json
• read_parquet
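A minimal sketch of constructing relations through these methods (the file names and the id column are hypothetical):

import duckdb

# each call returns a DuckDBPyRelation
csv_rel = duckdb.read_csv("example.csv")
parquet_rel = duckdb.read_parquet("example.parquet")
json_rel = duckdb.read_json("example.json")

# relations can be chained further before anything is executed
filtered = parquet_rel.filter("id > 5").limit(3)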

SQL Queries

Relation objects can be queried through SQL through so‑called replacement scans. If you have a relation object stored in a variable, you
can refer to that variable as if it was a SQL table (in the FROM clause). This allows you to incrementally build queries using relation objects.


import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
duckdb.sql("SELECT sum(id) FROM rel").show()

┌──────────────┐
│ sum(id) │
│ int128 │
├──────────────┤
│ 499999500000 │
└──────────────┘

Operations

There are a number of operations that can be performed on relations. These are all short‑hand for running the SQL queries ‑ and will return
relations again themselves.

aggregate(expr, groups = {}) Apply an (optionally grouped) aggregate over the relation. The system will automatically group
by any columns that are not aggregates.

import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.aggregate("id % 2 AS g, sum(id), min(id), max(id)")

┌───────┬──────────────┬─────────┬─────────┐
│ g │ sum(id) │ min(id) │ max(id) │
│ int64 │ int128 │ int64 │ int64 │
├───────┼──────────────┼─────────┼─────────┤
│ 0 │ 249999500000 │ 0 │ 999998 │
│ 1 │ 250000000000 │ 1 │ 999999 │
└───────┴──────────────┴─────────┴─────────┘

except_(rel) Select all rows in the first relation that do not occur in the second relation. The relations must have the same number of columns.

import duckdb
r1 = duckdb.sql("SELECT * FROM range(10) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r1.except_(r2).show()

┌───────┐
│ id │
│ int64 │
├───────┤
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───────┘

filter(condition) Apply the given condition to the relation, filtering any rows that do not satisfy the condition.

import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.filter("id > 5").limit(3).show()


┌───────┐
│ id │
│ int64 │
├───────┤
│ 6 │
│ 7 │
│ 8 │
└───────┘

intersect(rel) Select the intersection of two relations ‑ returning all rows that occur in both relations. The relations must have the
same number of columns.

import duckdb
r1 = duckdb.sql("SELECT * FROM range(10) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r1.intersect(r2).show()

┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
└───────┘

join(rel, condition, type = "inner") Combine two relations, joining them based on the provided condition.

import duckdb
r1 = duckdb.sql("SELECT * FROM range(5) tbl(id)").set_alias("r1")
r2 = duckdb.sql("SELECT * FROM range(10, 15) tbl(id)").set_alias("r2")
r1.join(r2, "r1.id + 10 = r2.id").show()

┌───────┬───────┐
│ id │ id │
│ int64 │ int64 │
├───────┼───────┤
│ 0 │ 10 │
│ 1 │ 11 │
│ 2 │ 12 │
│ 3 │ 13 │
│ 4 │ 14 │
└───────┴───────┘

limit(n, offset = 0) Select the first n rows, optionally offset by offset.

import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.limit(3).show()

┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │


│ 2 │
└───────┘

order(expr) Sort the relation by the given set of expressions.

import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.order("id DESC").limit(3).show()

┌────────┐
│ id │
│ int64 │
├────────┤
│ 999999 │
│ 999998 │
│ 999997 │
└────────┘

project(expr) Apply the given expression to each row in the relation.

import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.project("id + 10 AS id_plus_ten").limit(3).show()

┌─────────────┐
│ id_plus_ten │
│ int64 │
├─────────────┤
│ 10 │
│ 11 │
│ 12 │
└─────────────┘

union(rel) Combine two relations, returning all rows in r1 followed by all rows in r2. The relations must have the same number of
columns.

import duckdb
r1 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(10, 15) tbl(id)")
r1.union(r2).show()

┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 10 │
│ 11 │
│ 12 │
│ 13 │
│ 14 │
└───────┘


Result Output

The result of relations can be converted to various types of Python structures, see the result conversion page for more information.

The result of relations can also be directly written to files using the below methods.

• write_csv
• write_parquet
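A brief sketch of both methods (the output paths are hypothetical):

import duckdb

rel = duckdb.sql("SELECT * FROM range(100) tbl(id)")
rel.write_csv("out.csv")          # hypothetical output path
rel.write_parquet("out.parquet")  # hypothetical output path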

Python Function API

You can create a DuckDB user‑defined function (UDF) out of a Python function so it can be used in SQL queries. Similarly to regular functions,
they need to have a name, a return type and parameter types.

Here is an example using a Python function that calls a third‑party library.

import duckdb
from duckdb.typing import *
from faker import Faker

def random_name():
    fake = Faker()
    return fake.name()

duckdb.create_function("random_name", random_name, [], VARCHAR)


res = duckdb.sql("SELECT random_name()").fetchall()
print(res)
# [('Gerald Ashley',)]

Creating Functions

To register a Python UDF, simply use the create_function method from a DuckDB connection. Here is the syntax:

import duckdb
con = duckdb.connect()
con.create_function(name, function, argument_type_list, return_type, type, null_handling)

The create_function method requires the following parameters:

1. name: A string representing the unique name of the UDF within the connection catalog.
2. function: The Python function you wish to register as a UDF.
3. return_type: Scalar functions return one element per row. This parameter specifies the return type of the function.
4. parameters: Scalar functions can operate on one or more columns. This parameter takes a list of column types used as input.
5. type (Optional): DuckDB supports both built‑in Python types and PyArrow Tables. By default, built‑in types are assumed, but you
can specify type = 'arrow' to use PyArrow Tables.
6. null_handling (Optional): By default, null values are automatically handled as Null‑In Null‑Out. Users can specify a desired behavior
for null values by setting null_handling = 'special'.
7. exception_handling (Optional): By default, when an exception is thrown from the Python function, it will be re‑thrown in Python. Users can disable this behavior, and instead return null, by setting this parameter to 'return_null'.
8. side_effects (Optional): By default, functions are expected to produce the same result for the same input. If the result of a function
is impacted by any type of randomness, side_effects must be set to True.

To unregister a UDF, you can call the remove_function method with the UDF name:

con.remove_function(name)


Type Annotation

When the function has type annotations it's often possible to leave out all of the optional parameters. Using DuckDBPyType we can implicitly convert many known types to DuckDB's type system. For example:

import duckdb

def my_function(x: int) -> str:
    return x

duckdb.create_function("my_func", my_function)
duckdb.sql("SELECT my_func(42)")

┌─────────────┐
│ my_func(42) │
│ varchar │
├─────────────┤
│ 42 │
└─────────────┘

If only the parameter list types can be inferred, you'll need to pass in None as argument_type_list.

Null Handling

By default, when a function receives a NULL value, it instantly returns NULL as part of the default NULL handling. When this is not desired, you need to explicitly set the null_handling parameter to "special".

import duckdb
from duckdb.typing import *

def dont_intercept_null(x):
    return 5

duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT)


res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
# [(None,)]

duckdb.remove_function("dont_intercept")
duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT, null_handling="special")
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
# [(5,)]

Exception Handling

By default, when an exception is thrown from the Python function, we'll forward (re‑throw) the exception. If you want to disable this behavior and instead return null, you'll need to set the exception_handling parameter to "return_null".

import duckdb
from duckdb.typing import *

def will_throw():
    raise ValueError("ERROR")

duckdb.create_function("throws", will_throw, [], BIGINT)


try:
    res = duckdb.sql("SELECT throws()").fetchall()


except duckdb.InvalidInputException as e:
    print(e)

duckdb.create_function("doesnt_throw", will_throw, [], BIGINT, exception_handling="return_null")


res = duckdb.sql("SELECT doesnt_throw()").fetchall()
print(res)
# [(None,)]

Side Effects

By default DuckDB will assume the created function is a pure function, meaning it will produce the same output when given the same
input. If your function does not follow that rule, for example when your function makes use of randomness, then you will need to mark this
function as having side_effects.

For example, this function will produce a new count for every invocation:

def count() -> int:
    old = count.counter
    count.counter += 1
    return old

count.counter = 0

If we create this function without marking it as having side effects, the result will be the following:

con = duckdb.connect()
con.create_function("my_counter", count, side_effects = False)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
# [(0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,)]

This is obviously not the desired result. When we add side_effects = True, the result is as we would expect:

con.remove_function("my_counter")
count.counter = 0
con.create_function("my_counter", count, side_effects = True)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
# [(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]

Python Function Types

Currently, two function types are supported: native (the default) and arrow.

Arrow If the function is expected to receive arrow arrays, set the type parameter to 'arrow'.

This will let the system know to provide Arrow arrays of up to STANDARD_VECTOR_SIZE tuples to the function, and to expect an array with the same number of tuples to be returned from the function.
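For illustration, here is a minimal sketch of an arrow-typed UDF that operates on whole PyArrow arrays; it assumes pyarrow is installed, and the function name is chosen for the example.

import duckdb
import pyarrow.compute as pc
from duckdb.typing import BIGINT

def add_one(col):
    # col is a PyArrow array (or chunked array) holding up to STANDARD_VECTOR_SIZE values
    return pc.add(col, 1)

duckdb.create_function("add_one", add_one, [BIGINT], BIGINT, type="arrow")
print(duckdb.sql("SELECT add_one(41)").fetchall())
# [(42,)]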

Native When the function type is set to native, the function will be provided with a single tuple at a time, and is expected to return only a single value. This can be useful to interact with Python libraries that don't operate on Arrow, such as faker:

import duckdb

from duckdb.typing import *


from faker import Faker


def random_date():
    fake = Faker()
    return fake.date_between()

duckdb.create_function("random_date", random_date, [], DATE, type="native")


res = duckdb.sql("SELECT random_date()").fetchall()
print(res)
# [(datetime.date(2019, 5, 15),)]

Types API

The DuckDBPyType class represents a type instance of our data types.

Converting from Other Types

To make the API as easy to use as possible, we have added implicit conversions from existing type objects to a DuckDBPyType instance.
This means that wherever a DuckDBPyType object is expected, it is also possible to provide any of the options listed below.

Python Built‑ins The table below shows the mapping of Python Built‑in types to DuckDB type.

Built‑in types DuckDB type

bool BOOLEAN
bytearray BLOB
bytes BLOB
float DOUBLE
int BIGINT
str VARCHAR

Numpy DTypes The table below shows the mapping of Numpy DType to DuckDB type.

Type DuckDB type

bool BOOLEAN
float32 FLOAT
float64 DOUBLE
int16 SMALLINT
int32 INTEGER
int64 BIGINT
int8 TINYINT
uint16 USMALLINT
uint32 UINTEGER
uint64 UBIGINT
uint8 UTINYINT

Nested Types


list[child_type] list type objects map to a LIST type of the child type, which can also be arbitrarily nested.

import duckdb
from typing import Union

duckdb.typing.DuckDBPyType(list[dict[Union[str, int], str]])


# MAP(UNION(u1 VARCHAR, u2 BIGINT), VARCHAR)[]

dict[key_type, value_type] dict type objects map to a MAP type of the key type and the value type.

import duckdb

duckdb.typing.DuckDBPyType(dict[str, int])
# MAP(VARCHAR, BIGINT)

{'a': field_one, 'b': field_two, .., 'n': field_n} dict objects map to a STRUCT composed of the keys and
values of the dict.

import duckdb

duckdb.typing.DuckDBPyType({'a': str, 'b': int})


# STRUCT(a VARCHAR, b BIGINT)

Union[ type_1 , ... type_n ] typing.Union objects map to a UNION type of the provided types.

import duckdb
from typing import Union

duckdb.typing.DuckDBPyType(Union[int, str, bool, bytearray])


# UNION(u1 BIGINT, u2 VARCHAR, u3 BOOLEAN, u4 BLOB)

Creation Functions For the built‑in types, you can use the constants defined in duckdb.typing:

DuckDB type

BIGINT
BIT
BLOB
BOOLEAN
DATE
DOUBLE
FLOAT
HUGEINT
INTEGER
INTERVAL
SMALLINT
SQLNULL
TIME_TZ
TIME
TIMESTAMP_MS

TIMESTAMP_NS
TIMESTAMP_S
TIMESTAMP_TZ
TIMESTAMP
TINYINT
UBIGINT
UHUGEINT
UINTEGER
USMALLINT
UTINYINT
UUID
VARCHAR

For the complex types there are methods available on the DuckDBPyConnection object or the duckdb module. Anywhere a DuckDBPyType is accepted, we will also accept one of the type objects that can implicitly convert to a DuckDBPyType.

list_type | array_type Parameters:

• child_type: DuckDBPyType

struct_type | row_type Parameters:

• fields: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]

map_type Parameters:

• key_type: DuckDBPyType
• value_type: DuckDBPyType

decimal_type Parameters:

• width: int
• scale: int

union_type Parameters:

• members: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]

string_type Parameters:

• collation: Optional[str]
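A small sketch of these creation functions on the duckdb module (the printed type strings are illustrative):

import duckdb
from duckdb.typing import BIGINT, VARCHAR

print(duckdb.list_type(BIGINT))                              # e.g., BIGINT[]
print(duckdb.struct_type({"name": VARCHAR, "age": BIGINT}))  # e.g., STRUCT(name VARCHAR, age BIGINT)
print(duckdb.map_type(VARCHAR, BIGINT))                      # e.g., MAP(VARCHAR, BIGINT)
print(duckdb.decimal_type(18, 3))                            # e.g., DECIMAL(18,3)
print(duckdb.union_type({"num": BIGINT, "text": VARCHAR}))   # e.g., UNION(num BIGINT, text VARCHAR)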

Expression API

The Expression class represents an instance of an expression.


Why Would I Use the Expression API?

Using this API makes it possible to dynamically build up expressions, which are typically created by the parser from the query string. This
allows you to skip that and have more fine‑grained control over the used expressions.

Below is a list of currently supported expressions that can be created through the API.

Column Expression

This expression references a column by name.

import duckdb
import pandas as pd

df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})

# selecting a single column


col = duckdb.ColumnExpression('a')
res = duckdb.df(df).select(col).fetchall()
print(res)
# [(1,), (2,), (3,), (4,)]

# selecting multiple columns


col_list = [duckdb.ColumnExpression('a'), duckdb.ColumnExpression('c')]
res = duckdb.df(df).select(*col_list).fetchall()
print(res)
# [(1, 42), (2, 21), (3, 13), (4, 14)]

Star Expression

This expression selects all columns of the input source.

Optionally it's possible to provide an exclude list to filter out columns of the table. This exclude list can contain either strings or
Expressions.

import duckdb
import pandas as pd

df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})

star = duckdb.StarExpression(exclude = ['b'])


res = duckdb.df(df).select(star).fetchall()
print(res)
# [(1, 42), (2, 21), (3, 13), (4, 14)]

Constant Expression

This expression contains a single value.


import duckdb
import pandas as pd

df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})

const = duckdb.ConstantExpression('hello')
res = duckdb.df(df).select(const).fetchall()
print(res)
# [('hello',), ('hello',), ('hello',), ('hello',)]

Case Expression

This expression contains a CASE WHEN (...) THEN (...) ELSE (...) END expression. By default ELSE is NULL and it can be set using .otherwise(value = ...). Additional WHEN (...) THEN (...) blocks can be added with .when(condition = ..., value = ...).

import duckdb
import pandas as pd
from duckdb import (
ConstantExpression,
ColumnExpression,
CaseExpression
)

df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})

hello = ConstantExpression('hello')
world = ConstantExpression('world')

case = \
CaseExpression(condition = ColumnExpression('b') == False, value = world) \
.otherwise(hello)
res = duckdb.df(df).select(case).fetchall()
print(res)
# [('hello',), ('hello',), ('world',), ('hello',)]

Function Expression

This expression contains a function call. It can be constructed by providing the function name and an arbitrary amount of Expressions as
arguments.

import duckdb
import pandas as pd
from duckdb import (
ConstantExpression,
ColumnExpression,
FunctionExpression
)


df = pd.DataFrame({
'a': [
'test',
'pest',
'text',
'rest',
]
})

ends_with = FunctionExpression('ends_with', ColumnExpression('a'), ConstantExpression('est'))


res = duckdb.df(df).select(ends_with).fetchall()
print(res)
# [(True,), (True,), (False,), (True,)]

Common Operations

The Expression class also contains many operations that can be applied to any Expression type.

.cast(type: DuckDBPyType)
Applies a cast to the provided type on the expression.

.alias(name: str)
Apply an alias to the expression.

.isin(*exprs: Expression)
Create a IN expression against the provided expressions as the list.

.isnotin(*exprs: Expression)
Create a NOT IN expression against the provided expressions as the list.
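A minimal sketch of these operations, reusing the DataFrame pattern from the examples above:

import duckdb
import pandas as pd
from duckdb import ColumnExpression, ConstantExpression
from duckdb.typing import VARCHAR

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# cast column a to VARCHAR and alias the result
expr = ColumnExpression('a').cast(VARCHAR).alias('a_str')
print(duckdb.df(df).select(expr).fetchall())
# [('1',), ('2',), ('3',), ('4',)]

# IN expression against constant expressions
expr = ColumnExpression('a').isin(ConstantExpression(1), ConstantExpression(3))
print(duckdb.df(df).select(expr).fetchall())
# [(True,), (False,), (True,), (False,)]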

Order Operations When expressions are provided to DuckDBPyRelation.order() these take effect:

.asc()
Indicates that this expression should be sorted in ascending order.

.desc()
Indicates that this expression should be sorted in descending order.

.nulls_first()
Indicates that the nulls in this expression should precede the non‑null values.

.nulls_last()
Indicates that the nulls in this expression should come after the non‑null values.

Spark API

The DuckDB Spark API implements the PySpark API, allowing you to use the familiar Spark API to interact with DuckDB. All statements are
translated to DuckDB's internal plans using our relational API and executed using DuckDB's query engine.

Note. Warning The DuckDB Spark API is currently experimental and features are still missing. We are very interested in feedback.
Please report any functionality that you are missing, either through Discord or on GitHub.

Example

from duckdb.experimental.spark.sql import SparkSession as session


from duckdb.experimental.spark.sql.functions import lit, col
import pandas as pd


spark = session.builder.getOrCreate()

pandas_df = pd.DataFrame({
'age': [34, 45, 23, 56],
'name': ['Joan', 'Peter', 'John', 'Bob']
})

df = spark.createDataFrame(pandas_df)
df = df.withColumn(
'location', lit('Seattle')
)
res = df.select(
col('age'),
col('location')
).collect()

print(res)

[
Row(age=34, location='Seattle'),
Row(age=45, location='Seattle'),
Row(age=23, location='Seattle'),
Row(age=56, location='Seattle')
]

Contribution Guidelines

Contributions to the experimental Spark API are welcome. When making a contribution, please follow these guidelines:

• Instead of using temporary files, use our pytest testing framework.


• When adding new functions, ensure that method signatures comply with those in the PySpark API.

Python Client API

Known Python Issues

Unfortunately, there are some issues that are either beyond our control or are very elusive / hard to track down. Below is a list of issues that you might need to be aware of, depending on your workflow.

Numpy Import Multithreading

When making use of multithreading and fetching results either directly as NumPy arrays or indirectly through a Pandas DataFrame, it might be necessary to ensure that numpy.core.multiarray is imported. If this module has not been imported from the main thread, and a different thread attempts to import it during execution, this causes either a deadlock or a crash.

To avoid this, it's recommended to import numpy.core.multiarray before starting up threads.
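For example, a one-line guard placed in the main module before any worker threads are started:

# import from the main thread before spawning threads that fetch DuckDB results
import numpy.core.multiarray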

Running EXPLAIN Renders Newlines in Jupyter and IPython

When DuckDB is run in Jupyter notebooks or in the IPython shell, the output of the EXPLAIN statement contains hard line breaks (\n):

In [1]: import duckdb


...: duckdb.sql("EXPLAIN SELECT 42 AS x")


Out[1]:
┌───────────────┬────────────────────────────────────────────────────────────────────────────────────────────────
│ explain_key │ explain_value

│ varchar │ varchar

├───────────────┼────────────────────────────────────────────────────────────────────────────────────────────────
│ physical_plan │ ┌───────────────────────────┐\n│ PROJECTION │\n│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│\n│ x … │
└───────────────┴────────────────────────────────────────────────────────────────────────────────────────────────

To work around this, print the output of the explain() function:

In [2]: print(duckdb.sql("SELECT 42 AS x").explain())

Out[2]:
┌───────────────────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ x │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ DUMMY_SCAN │
└───────────────────────────┘

Please also check out the Jupyter guide for tips on using Jupyter with JupySQL.

Error When Importing the DuckDB Python Package on Windows

When importing DuckDB on Windows, the Python runtime may return the following error:

import duckdb

ImportError: DLL load failed while importing duckdb: The specified module could not be found.

The solution is to install the Microsoft Visual C++ Redistributable package.

R API

Installation

The DuckDB R API can be installed using install.packages("duckdb"). Please see the installation page for details.

Reference Manual

The reference manual for the DuckDB R API is available at R.duckdb.org.

Basic API Usage

The standard DuckDB R API implements the DBI interface for R. If you are not familiar with DBI yet, see here for an introduction.


Startup & Shutdown To use DuckDB, you must first create a connection object that represents the database. The connection object
takes as parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be
.db, .duckdb, or anything else). The special value :memory: (the default) can be used to create an in‑memory database. Note that
for an in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the R process). If you would like to connect to an
existing database in read‑only mode, set the read_only flag to TRUE. Read‑only mode is required if multiple R processes want to access
the same database file at the same time.

library("duckdb")
# to start an in-memory database
con <- dbConnect(duckdb())
# or
con <- dbConnect(duckdb(), dbdir = ":memory:")
# to use a database file (not shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)
# to use a database file (shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = TRUE)

Connections are closed implicitly when they go out of scope or if they are explicitly closed using dbDisconnect(). To shut down the database instance associated with the connection, use dbDisconnect(con, shutdown = TRUE).

Querying DuckDB supports the standard DBI methods to send queries and retrieve result sets. dbExecute() is meant for queries where no results are expected, like CREATE TABLE or UPDATE, and dbGetQuery() is meant to be used for queries that produce results (e.g., SELECT). Below is an example.

# create a table
dbExecute(con, "CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
dbExecute(con, "INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again


res <- dbGetQuery(con, "SELECT * FROM items")
print(res)
# item value count
# 1 jeans 20.0 1
# 2 hammer 42.2 2

DuckDB also supports prepared statements in the R API with the dbExecute and dbGetQuery methods. Here is an example:

# prepared statement parameters are given as a list


dbExecute(con, "INSERT INTO items VALUES (?, ?, ?)", list('laptop', 2000, 1))

# if you want to reuse a prepared statement multiple times, use dbSendStatement() and dbBind()
stmt <- dbSendStatement(con, "INSERT INTO items VALUES (?, ?, ?)")
dbBind(stmt, list('iphone', 300, 2))
dbBind(stmt, list('android', 3.5, 1))
dbClearResult(stmt)

# query the database using a prepared statement


res <- dbGetQuery(con, "SELECT item FROM items WHERE value > ?", list(400))
print(res)
# item
# 1 laptop

Note. Warning Do not use prepared statements to insert large amounts of data into DuckDB. See below for better options.


Efficient Transfer

To write a R data frame into DuckDB, use the standard DBI function dbWriteTable(). This creates a table in DuckDB and populates it
with the data frame contents. For example:

dbWriteTable(con, "iris_table", iris)


res <- dbGetQuery(con, "SELECT * FROM iris_table LIMIT 1")
print(res)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa

It is also possible to ”register” an R data frame as a virtual table, comparable to a SQL VIEW. This does not actually transfer data into DuckDB yet. Below is an example:

duckdb_register(con, "iris_view", iris)


res <- dbGetQuery(con, "SELECT * FROM iris_view LIMIT 1")
print(res)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa

Note. DuckDB keeps a reference to the R data frame after registration. This prevents the data frame from being garbage‑collected.
The reference is cleared when the connection is closed, but can also be cleared manually using the duckdb_unregister()
method.

Also refer to the data import documentation for more options of efficiently importing data.

dbplyr

DuckDB also plays well with the dbplyr / dplyr packages for programmatic query construction from R. Here is an example:

library("duckdb")
library("dplyr")
con <- dbConnect(duckdb())
duckdb_register(con, "flights", nycflights13::flights)

tbl(con, "flights") |>


group_by(dest) |>
summarise(delay = mean(dep_time, na.rm = TRUE)) |>
collect()

When using dbplyr, CSV and Parquet files can be read using the dplyr::tbl function.

# Establish a CSV for the sake of this example


write.csv(mtcars, "mtcars.csv")

# Summarize the dataset in DuckDB to avoid reading the entire CSV into R's memory
tbl(con, "mtcars.csv") |>
group_by(cyl) |>
summarise(across(disp:wt, .fns = mean)) |>
collect()

# Establish a set of Parquet files


dbExecute(con, "COPY flights TO 'dataset' (FORMAT PARQUET, PARTITION_BY (year, month))")

# Summarize the dataset in DuckDB to avoid reading 12 Parquet files into R's memory
tbl(con, "read_parquet('dataset/**/*.parquet', hive_partitioning = true)") |>
filter(month == "3") |>
summarise(delay = mean(dep_time, na.rm = TRUE)) |>
collect()


Rust API

Installation

The DuckDB Rust API can be installed from crates.io. Please see the docs.rs for details.

Basic API Usage

duckdb‑rs is an ergonomic wrapper based on the DuckDB C API, please refer to the README for details.

Startup & Shutdown To use duckdb, you must first initialize a Connection handle using Connection::open(). Connec-
tion::open() takes as parameter the database file to read and write from. If the database file does not exist, it will be created (the file
extension may be .db, .duckdb, or anything else). You can also use Connection::open_in_memory() to create an in‑memory
database. Note that for an in‑memory database no data is persisted to disk (i.e., all data is lost when you exit the process).

use duckdb::{params, Connection, Result};


let conn = Connection::open_in_memory()?;

You can close the Connection manually using conn.close(), or just let it go out of scope; we have implemented the Drop trait, which will automatically close the underlying database connection for you.

Querying SQL queries can be sent to DuckDB using the execute() method of connections, or you can prepare a statement and then query with it.

#[derive(Debug)]
struct Person {
id: i32,
name: String,
data: Option<Vec<u8>>,
}

conn.execute(
"INSERT INTO person (name, data) VALUES (?, ?)",
params![me.name, me.data],
)?;

let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;


let person_iter = stmt.query_map([], |row| {
Ok(Person {
id: row.get(0)?,
name: row.get(1)?,
data: row.get(2)?,
})
})?;

for person in person_iter {


println!("Found person {:?}", person.unwrap());
}

Appender

The Rust client supports the DuckDB Appender API for bulk inserts. For example:


fn insert_rows(conn: &Connection) -> Result<()> {


let mut app = conn.appender("foo")?;
app.append_rows([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])?;
Ok(())
}

Swift API

DuckDB offers a Swift API. See the announcement post for details.

Instantiating DuckDB

DuckDB supports both in‑memory and persistent databases. To work with an in‑memory database, run:

let database = try Database(store: .inMemory)

To work with a persistent database, run:

let database = try Database(store: .file(at: "test.db"))

Queries can be issued through a database connection.

let connection = try database.connect()

DuckDB supports multiple connections per database.

Application Example

The rest of the page is based on the example of our announcement post, which uses raw data from NASA's Exoplanet Archive loaded directly
into DuckDB.

Creating an Application‑Specific Type We first create an application‑specific type that we'll use to house our database and connection
and through which we'll eventually define our app‑specific queries.

import DuckDB

final class ExoplanetStore {

let database: Database


let connection: Connection

init(database: Database, connection: Connection) {


self.database = database
self.connection = connection
}
}

Loading a CSV File We load the data from NASA's Exoplanet Archive:

wget "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv" -O downloaded_exoplanets.csv

Once we have our CSV downloaded locally, we can use the following SQL command to load it as a new table to DuckDB:

CREATE TABLE exoplanets AS


SELECT * FROM read_csv('downloaded_exoplanets.csv');


Let's package this up as a new asynchronous factory method on our ExoplanetStore type:

import DuckDB
import Foundation

final class ExoplanetStore {

// Factory method to create and prepare a new ExoplanetStore


static func create() async throws -> ExoplanetStore {

// Create our database and connection as described above


let database = try Database(store: .inMemory)
let connection = try database.connect()

// Download the CSV from the exoplanet archive


let (csvFileURL, _) = try await URLSession.shared.download(
from: URL(string: "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv")!)

// Issue our first query to DuckDB


try connection.execute("""
CREATE TABLE exoplanets AS
SELECT * FROM read_csv('\(csvFileURL.path)');
""")

// Create our pre-populated ExoplanetStore instance


return ExoplanetStore(
database: database,
connection: connection
)
}

// Let's make the initializer we defined previously


// private. This prevents anyone accidentally instantiating
// the store without having pre-loaded our Exoplanet CSV
// into the database
private init(database: Database, connection: Connection) {
...
}
}

Querying the Database The following example queries DuckDB from within Swift via an async function. This means the caller won't be blocked while the query is executing. We'll then cast the result columns to Swift native types using DuckDB's ResultSet cast(to:) family of methods, before finally wrapping them up in a DataFrame from the TabularData framework.

...

import TabularData

extension ExoplanetStore {

// Retrieves the number of exoplanets discovered by year


func groupedByDiscoveryYear() async throws -> DataFrame {

// Issue the query we described above


let result = try connection.query("""
SELECT disc_year, count(disc_year) AS Count
FROM exoplanets


GROUP BY disc_year
ORDER BY disc_year
""")

// Cast our DuckDB columns to their native Swift


// equivalent types
let discoveryYearColumn = result[0].cast(to: Int.self)
let countColumn = result[1].cast(to: Int.self)

// Use our DuckDB columns to instantiate TabularData


// columns and populate a TabularData DataFrame
return DataFrame(columns: [
TabularData.Column(discoveryYearColumn).eraseToAnyColumn(),
TabularData.Column(countColumn).eraseToAnyColumn(),
])
}
}

Complete Project For the complete example project, clone the DuckDB Swift repo and open up the runnable app project located in
Examples/SwiftUI/ExoplanetExplorer.xcodeproj.

Wasm

DuckDB Wasm

DuckDB has been compiled to WebAssembly, so it can run inside any browser on any device.


DuckDB‑Wasm offers a layered API; it can be embedded as a JavaScript + WebAssembly library, as a Web shell, or built from source according to your needs.

Getting Started with DuckDB‑Wasm

A great starting point is to read the DuckDB‑Wasm launch blog post!

Another great resource is the GitHub repository.

For details, see the full DuckDB‑Wasm API Documentation.

Instantiation

Instantiation

DuckDB‑Wasm has multiple ways to be instantiated depending on the use case.

cdn(jsdelivr)

import * as duckdb from '@duckdb/duckdb-wasm';

const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();

// Select a bundle based on browser checks


const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);


const worker_url = URL.createObjectURL(


new Blob([`importScripts("${bundle.mainWorker!}");`], {type: 'text/javascript'})
);

// Instantiate the asynchronous version of DuckDB-Wasm


const worker = new Worker(worker_url);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(worker_url);

webpack

import * as duckdb from '@duckdb/duckdb-wasm';


import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm';
import duckdb_wasm_next from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
mvp: {
mainModule: duckdb_wasm,
mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js',
import.meta.url).toString(),
},
eh: {
mainModule: duckdb_wasm_next,
mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js',
import.meta.url).toString(),
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

vite

import * as duckdb from '@duckdb/duckdb-wasm';


import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm?url';
import mvp_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js?url';
import duckdb_wasm_eh from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm?url';
import eh_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js?url';

const MANUAL_BUNDLES: duckdb.DuckDBBundles = {


mvp: {
mainModule: duckdb_wasm,
mainWorker: mvp_worker,
},
eh: {
mainModule: duckdb_wasm_eh,
mainWorker: eh_worker,
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm


const worker = new Worker(bundle.mainWorker!);


const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

Statically Served

It is possible to manually download the files from https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm/dist/.

import * as duckdb from '@duckdb/duckdb-wasm';

const MANUAL_BUNDLES: duckdb.DuckDBBundles = {


mvp: {
mainModule: 'change/me/../duckdb-mvp.wasm',
mainWorker: 'change/me/../duckdb-browser-mvp.worker.js',
},
eh: {
mainModule: 'change/m/../duckdb-eh.wasm',
mainWorker: 'change/m/../duckdb-browser-eh.worker.js',
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

Data Ingestion

DuckDB‑Wasm has multiple ways to import data, depending on the format of the data.

There are two steps to import data into DuckDB.

First, the data file is imported into a local file system using register functions (registerEmptyFileBuffer, registerFileBuffer, registerFileHandle,
registerFileText, registerFileURL).

Then, the data file is imported into DuckDB using insert functions (insertArrowFromIPCStream, insertArrowTable, insertCSVFromPath, insertJSONFromPath) or directly using a FROM SQL query (using extensions like Parquet or the Wasm‑flavored httpfs).

Insert statements can also be used to import data.

Data Import

Open & Close Connection

// Create a new connection


const c = await db.connect();

// ... import data

// Close the connection to release memory


await c.close();


Apache Arrow

// Data can be inserted from an existing arrow.Table


// More examples: https://arrow.apache.org/docs/js/
import { tableFromArrays } from 'apache-arrow';

// EOS signal according to the Arrow IPC streaming format


// See https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
const EOS = new Uint8Array([255, 255, 255, 255, 0, 0, 0, 0]);

const arrowTable = tableFromArrays({


id: [1, 2, 3],
name: ['John', 'Jane', 'Jack'],
age: [20, 21, 22],
});

await c.insertArrowTable(arrowTable, { name: 'arrow_table' });


// Write EOS
await c.insertArrowTable(EOS, { name: 'arrow_table' });

// ..., from a raw Arrow IPC stream


const streamResponse = await fetch(` someapi`);
const streamReader = streamResponse.body.getReader();
const streamInserts = [];
while (true) {
const { value, done } = await streamReader.read();
if (done) break;
streamInserts.push(c.insertArrowFromIPCStream(value, { name: 'streamed' }));
}

// Write EOS
streamInserts.push(c.insertArrowFromIPCStream(EOS, { name: 'streamed' }));

await Promise.all(streamInserts);

CSV

// ..., from CSV files


// (interchangeable: registerFile{Text,Buffer,URL,Handle})
const csvContent = '1|foo\n2|bar\n';
await db.registerFileText(`data.csv`, csvContent);
// ... with typed insert options
await c.insertCSVFromPath('data.csv', {
schema: 'main',
name: 'foo',
detect: false,
header: false,
delimiter: '|',
columns: {
col1: new arrow.Int32(),
col2: new arrow.Utf8(),
},
});

JSON

// ..., from JSON documents in row-major format


const jsonRowContent = [


{ "col1": 1, "col2": "foo" },


{ "col1": 2, "col2": "bar" },
];
await db.registerFileText(
'rows.json',
JSON.stringify(jsonRowContent),
);
await c.insertJSONFromPath('rows.json', { name: 'rows' });

// ... or column-major format


const jsonColContent = {
"col1": [1, 2],
"col2": ["foo", "bar"]
};
await db.registerFileText(
'columns.json',
JSON.stringify(jsonColContent),
);
await c.insertJSONFromPath('columns.json', { name: 'columns' });

// From API
const streamResponse = await fetch(` someapi/content.json`);
await db.registerFileBuffer('file.json', new Uint8Array(await streamResponse.arrayBuffer()))
await c.insertJSONFromPath('file.json', { name: 'JSONContent' });

Parquet

// from Parquet files


// ...Local
const pickedFile: File = letUserPickFile();
await db.registerFileHandle('local.parquet', pickedFile, DuckDBDataProtocol.BROWSER_FILEREADER, true);
// ...Remote
await db.registerFileURL('remote.parquet', 'https://origin/remote.parquet', DuckDBDataProtocol.HTTP, false);
// ... Using Fetch
const res = await fetch('https://origin/remote.parquet');
await db.registerFileBuffer('buffer.parquet', new Uint8Array(await res.arrayBuffer()));

// ..., by specifying URLs in the SQL text


await c.query(`
CREATE TABLE direct AS
SELECT * FROM 'https://origin/remote.parquet'
`);
// ..., or by executing raw insert statements
await c.query(`
INSERT INTO existing_table
VALUES (1, 'foo'), (2, 'bar')`);

httpfs (Wasm‑flavored)

// ..., by specifying URLs in the SQL text


await c.query(`
CREATE TABLE direct AS
SELECT * FROM 'https://origin/remote.parquet'
`);

Insert Statement


// ..., or by executing raw insert statements


await c.query(`
INSERT INTO existing_table
VALUES (1, 'foo'), (2, 'bar')`);

Query

DuckDB‑Wasm provides functions for querying data. Queries are run sequentially.

First, a connection needs to be created by calling connect. Then, queries can be run by calling query or send.

Query Execution

// Create a new connection


const conn = await db.connect();

// Either materialize the query result


await conn.query<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`);
// ..., or fetch the result chunks lazily
for await (const batch of await conn.send<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`)) {
// ...
}

// Close the connection to release memory


await conn.close();

Prepared Statements

// Create a new connection


const conn = await db.connect();
// Prepare query
const stmt = await conn.prepare(` SELECT v + ? FROM generate_series(0, 10000) AS t(v);`);
// ... and run the query with materialized results
await stmt.query(234);
// ... or result chunks
for await (const batch of await stmt.send(234)) {
// ...
}
// Close the statement to release memory
await stmt.close();
// Closing the connection will release statements as well
await conn.close();

Arrow Table to JSON

// Create a new connection


const conn = await db.connect();

// Query
const arrowResult = await conn.query<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)


`);

// Convert arrow table to json


const result = arrowResult.toArray().map((row) => row.toJSON());

// Close the connection to release memory


await conn.close();

Export Parquet

// Create a new connection


const conn = await db.connect();

// Export Parquet
conn.send(` COPY (SELECT * FROM tbl) TO 'result-snappy.parquet' (FORMAT 'parquet');`);
const parquet_buffer = await this._db.copyFileToBuffer('result-snappy.parquet');

// Generate a download link


const link = URL.createObjectURL(new Blob([parquet_buffer]));

// Close the connection to release memory


await conn.close();

Extensions

DuckDB‑Wasm's (dynamic) extension loading is modeled after regular DuckDB's extension loading, with a few relevant differences due to the difference in platform.

Format

Extensions in DuckDB are binaries to be dynamically loaded via dlopen. A cryptographic signature is appended to the binary. An extension in DuckDB‑Wasm is a regular Wasm file to be dynamically loaded via Emscripten's dlopen. A cryptographic signature is appended to the Wasm file as a WebAssembly custom section called duckdb_signature. This ensures the file remains a valid WebAssembly file.

Note. Currently, we require this custom section to be the last one, but this can be potentially relaxed in the future.

INSTALL and LOAD

The INSTALL semantic in native embeddings of DuckDB is to fetch, decompress from gzip, and store the data on local disk. The LOAD semantic in native embeddings of DuckDB is to (optionally) perform signature checks and dynamically load the binary with the main DuckDB binary.

In DuckDB‑Wasm, INSTALL is a no‑op given there is no durable cross‑session storage. The LOAD operation will fetch (and decompress on
the fly), perform signature checks and dynamically load via the Emscripten implementation of dlopen.

Autoloading

Autoloading, i.e., the possibility for DuckDB to add extension functionality on‑the‑fly, is enabled by default in DuckDB‑Wasm.

List of Officially Available Extensions


Extension name Description Aliases

autocomplete Adds support for autocomplete in the shell


excel Adds support for Excel‑like format strings
fts Adds support for Full‑Text Search Indexes
icu Adds support for time zones and collations using the ICU library
inet Adds support for IP‑related data types and functions
json Adds support for JSON operations
parquet Adds support for reading and writing Parquet files
sqlite GitHub Adds support for reading SQLite database files sqlite, sqlite3
sqlsmith
substrait GitHub Adds support for the Substrait integration
tpcds Adds TPC‑DS data generation and query support
tpch Adds TPC‑H data generation and query support

WebAssembly is basically an additional platform, and there might be platform‑specific limitations that make some extensions not able
to match their native capabilities or to perform them in a different way. We will document here relevant differences for DuckDB‑hosted
extensions.

HTTPFS The HTTPFS extension is, at the moment, not available in DuckDB‑Wasm. HTTPS protocol capabilities need to go through an additional layer, the browser, which adds both differences and some restrictions compared to what is doable natively.

Instead, DuckDB‑Wasm has a separate implementation that for most purposes is interchangable, but does not support all use cases (as
it must follow security rules imposed by the browser, such as CORS). Due to this CORS restriction, any requests for data made using the
HTTPFS extension must be to websites that allow (using CORS headers) the website hosting the DuckDB‑Wasm instance to access that data.
The MDN website is a great resource for more information regarding CORS.
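
As an illustrative sketch (the URL below is hypothetical), reading a remote Parquet file from DuckDB-Wasm only works if the serving host returns CORS headers that allow the page hosting DuckDB-Wasm to fetch it:

-- https://example.org/data/file.parquet is a hypothetical, CORS-enabled URL
SELECT count(*) FROM 'https://example.org/data/file.parquet';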

Extension Signing

As with regular DuckDB extensions, DuckDB-Wasm extensions are checked by default on LOAD to verify that the signature confirms the extension has not been tampered with. Extension signature verification can be disabled via a configuration option. Signing is a property of the binary itself, so copying a DuckDB extension (say, to serve it from a different location) will still keep a valid signature (e.g., for local development).

Fetching DuckDB‑Wasm Extensions

Official DuckDB extensions are served at extensions.duckdb.org, and this is also the default value for the default_extension_repository option. When installing extensions, a relevant URL will be built that will look like extensions.duckdb.org/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.gz.

DuckDB-Wasm extensions are fetched only on LOAD, and the URL will look like: extensions.duckdb.org/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm.

Note that an additional duckdb-wasm is added to the folder structure, and the file is served as a .wasm file.

DuckDB-Wasm extensions are served pre-compressed using Brotli compression. When fetched from a browser, extensions will be transparently uncompressed. If you want to fetch the duckdb-wasm extension manually, you can use curl --compressed extensions.duckdb.org/<...>/icu.duckdb_extension.wasm.


Serving Extensions from a Third‑Party Repository

As with regular DuckDB, if you use SET custom_extension_repository = some.url.com, subsequent loads will be attempted
at some.url.com/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm.
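
A minimal sketch, using the placeholder URL from the sentence above (the actual repository URL and extension name depend on your setup):

-- some.url.com is the placeholder repository from the text above
SET custom_extension_repository = 'https://some.url.com';
-- subsequent loads are then fetched from
-- some.url.com/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/icu.duckdb_extension.wasm
LOAD icu;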

Note that GET requests on the extensions need to be CORS-enabled for a browser to allow the connection.

Tooling

Both DuckDB-Wasm and its extensions have been compiled using the latest packaged Emscripten toolchain.


ADBC API

Arrow Database Connectivity (ADBC), similarly to ODBC and JDBC, is a C‑style API that enables code portability between different database
systems. This allows developers to effortlessly build applications that communicate with database systems without using code specific to
that system. The main difference between ADBC and ODBC/JDBC is that ADBC uses Arrow to transfer data between the database system and
the application. DuckDB has an ADBC driver, which takes advantage of the zero‑copy integration between DuckDB and Arrow to efficiently
transfer data.

DuckDB's ADBC driver currently supports version 0.7 of ADBC.

Please refer to the ADBC documentation page for a more extensive discussion on ADBC and a detailed API explanation.

Implemented Functionality

The DuckDB-ADBC driver implements the full ADBC specification, with the exception of the ConnectionReadPartition and StatementExecutePartitions functions. Both of these functions exist to support systems that internally partition the query results, which does not apply to DuckDB. In this section, we will describe the main functions that exist in ADBC, along with the arguments they take, and provide examples for each function.

Database Set of functions that operate on a database.

Function name     | Description                                          | Arguments                                                                        | Example
DatabaseNew       | Allocate a new (but uninitialized) database.         | (AdbcDatabase *database, AdbcError *error)                                       | AdbcDatabaseNew(&adbc_database, &adbc_error)
DatabaseSetOption | Set a char* option.                                  | (AdbcDatabase *database, const char *key, const char *value, AdbcError *error)  | AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error)
DatabaseInit      | Finish setting options and initialize the database.  | (AdbcDatabase *database, AdbcError *error)                                       | AdbcDatabaseInit(&adbc_database, &adbc_error)
DatabaseRelease   | Destroy the database.                                | (AdbcDatabase *database, AdbcError *error)                                       | AdbcDatabaseRelease(&adbc_database, &adbc_error)


Connection A set of functions that create and destroy a connection to interact with a database.

Function name       | Description                                            | Arguments                                                | Example
ConnectionNew       | Allocate a new (but uninitialized) connection.         | (AdbcConnection*, AdbcError*)                            | AdbcConnectionNew(&adbc_connection, &adbc_error)
ConnectionSetOption | Options may be set before ConnectionInit.              | (AdbcConnection*, const char*, const char*, AdbcError*) | AdbcConnectionSetOption(&adbc_connection, ADBC_CONNECTION_OPTION_AUTOCOMMIT, ADBC_OPTION_VALUE_DISABLED, &adbc_error)
ConnectionInit      | Finish setting options and initialize the connection.  | (AdbcConnection*, AdbcDatabase*, AdbcError*)             | AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error)
ConnectionRelease   | Destroy this connection.                               | (AdbcConnection*, AdbcError*)                            | AdbcConnectionRelease(&adbc_connection, &adbc_error)

A set of functions that retrieve metadata about the database. In general, these functions will return Arrow objects, specifically an ArrowArrayStream.

Function name            | Description                                                                      | Arguments                                                                                                                | Example
ConnectionGetObjects     | Get a hierarchical view of all catalogs, database schemas, tables, and columns. | (AdbcConnection*, int, const char*, const char*, const char*, const char**, const char*, ArrowArrayStream*, AdbcError*) | AdbcDatabaseInit(&adbc_database, &adbc_error)
ConnectionGetTableSchema | Get the Arrow schema of a table.                                                 | (AdbcConnection*, const char*, const char*, const char*, ArrowSchema*, AdbcError*)                                      | AdbcDatabaseRelease(&adbc_database, &adbc_error)
ConnectionGetTableTypes  | Get a list of table types in the database.                                       | (AdbcConnection*, ArrowArrayStream*, AdbcError*)                                                                        | AdbcDatabaseNew(&adbc_database, &adbc_error)

A set of functions with transaction semantics for the connection. By default, all connections start with auto‑commit mode on, but this can
be turned off via the ConnectionSetOption function.

Function name      | Description                        | Arguments                     | Example
ConnectionCommit   | Commit any pending transactions.   | (AdbcConnection*, AdbcError*) | AdbcConnectionCommit(&adbc_connection, &adbc_error)
ConnectionRollback | Rollback any pending transactions. | (AdbcConnection*, AdbcError*) | AdbcConnectionRollback(&adbc_connection, &adbc_error)

Statement Statements hold state related to query execution. They represent both one‑off queries and prepared statements. They can
be reused; however, doing so will invalidate prior result sets from that statement.

The functions used to create, destroy, and set options for a statement:

Function name      | Description                                    | Arguments                                               | Example
StatementNew       | Create a new statement for a given connection. | (AdbcConnection*, AdbcStatement*, AdbcError*)           | AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error)
StatementRelease   | Destroy a statement.                           | (AdbcStatement*, AdbcError*)                            | AdbcStatementRelease(&adbc_statement, &adbc_error)
StatementSetOption | Set a string option on a statement.            | (AdbcStatement*, const char*, const char*, AdbcError*)  | StatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "TABLE_NAME", &adbc_error)

Functions related to query execution:

Function name             | Description                                                                                  | Arguments                                                  | Example
StatementSetSqlQuery      | Set the SQL query to execute. The query can then be executed with StatementExecuteQuery.    | (AdbcStatement*, const char*, AdbcError*)                  | AdbcStatementSetSqlQuery(&adbc_statement, "SELECT * FROM TABLE", &adbc_error)
StatementSetSubstraitPlan | Set a Substrait plan to execute. The query can then be executed with StatementExecuteQuery. | (AdbcStatement*, const uint8_t*, size_t, AdbcError*)       | AdbcStatementSetSubstraitPlan(&adbc_statement, substrait_plan, length, &adbc_error)
StatementExecuteQuery     | Execute a statement and get the results.                                                    | (AdbcStatement*, ArrowArrayStream*, int64_t*, AdbcError*)  | AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error)
StatementPrepare          | Turn this statement into a prepared statement to be executed multiple times.                | (AdbcStatement*, AdbcError*)                               | AdbcStatementPrepare(&adbc_statement, &adbc_error)

Functions related to binding, used for bulk insertion or in prepared statements.

Function name       | Description                                                                      | Arguments                                        | Example
StatementBindStream | Bind an Arrow stream. This can be used for bulk inserts or prepared statements.  | (AdbcStatement*, ArrowArrayStream*, AdbcError*)  | StatementBindStream(&adbc_statement, &input_data, &adbc_error)

Examples

Regardless of the programming language being used, there are two database options which will be required to utilize ADBC with DuckDB.
The first one is the driver, which takes a path to the DuckDB library. The second option is the entrypoint, which is an exported function
from the DuckDB‑ADBC driver that initializes all the ADBC functions. Once we have configured these two options, we can optionally set the
path option, providing a path on disk to store our DuckDB database. If not set, an in‑memory database is created. After configuring all the
necessary options, we can proceed to initialize our database. Below is how you can do so in various language environments.

C++ We begin our C++ example by declaring the essential variables for querying data through ADBC. These variables include Error,
Database, Connection, Statement handling, and an Arrow Stream to transfer data between DuckDB and the application.

AdbcError adbc_error;
AdbcDatabase adbc_database;
AdbcConnection adbc_connection;
AdbcStatement adbc_statement;
ArrowArrayStream arrow_stream;

We can then initialize our database variable. Before initializing the database, we need to set the driver and entrypoint
options as mentioned above. Then we set the path option and initialize the database. With the example below, the string
"path/to/libduckdb.dylib" should be the path to the dynamic library for DuckDB. This will be .dylib on macOS, and
.so on Linux.

AdbcDatabaseNew(&adbc_database, &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "driver", "path/to/libduckdb.dylib", &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "entrypoint", "duckdb_adbc_init", &adbc_error);
// By default, we start an in-memory database, but you can optionally define a path to store it on disk.
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error);
AdbcDatabaseInit(&adbc_database, &adbc_error);

After initializing the database, we must create and initialize a connection to it.

AdbcConnectionNew(&adbc_connection, &adbc_error);
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error);


We can now initialize our statement and run queries through our connection. After calling AdbcStatementExecuteQuery, the arrow_stream is populated with the result.

AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error);
AdbcStatementSetSqlQuery(&adbc_statement, "SELECT 42", &adbc_error);
int64_t rows_affected;
AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error);
arrow_stream.release(&arrow_stream);

Besides running queries, we can also ingest data via arrow_streams. For this we need to set an option with the table name we want to
insert to, bind the stream and then execute the query.

AdbcStatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "AnswerToEverything", &adbc_error);
AdbcStatementBindStream(&adbc_statement, &arrow_stream, &adbc_error);
AdbcStatementExecuteQuery(&adbc_statement, nullptr, nullptr, &adbc_error);

Python The first thing to do is to use pip and install the ADBC Driver manager. You will also need to install pyarrow to directly access Apache Arrow-formatted result sets (for example, when using fetch_arrow_table).

pip install adbc_driver_manager pyarrow

Note. For details on the adbc_driver_manager package, see the adbc_driver_manager package documentation.

As with C++, we need to provide initialization options consisting of the location of the libduckdb shared object and entrypoint function.
Notice that the path argument for DuckDB is passed in through the db_kwargs dictionary.

import adbc_driver_duckdb.dbapi

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.execute("SELECT 42")
    # fetch a pyarrow table
    tbl = cur.fetch_arrow_table()
    print(tbl)

Alongside fetch_arrow_table, other methods from DBApi are also implemented on the cursor, such as fetchone and fetchall.
Data can also be ingested via arrow_streams. We just need to set options on the statement to bind the stream of data and execute the
query.

import adbc_driver_duckdb.dbapi
import pyarrow

data = pyarrow.record_batch(
    [[1, 2, 3, 4], ["a", "b", "c", "d"]],
    names = ["ints", "strs"],
)

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.adbc_ingest("AnswerToEverything", data)

ODBC

ODBC API ‑ Overview

ODBC (Open Database Connectivity) is a C-style API that provides access to different flavors of Database Management Systems (DBMSs).
The ODBC API consists of the Driver Manager (DM) and the ODBC drivers.

The DM is part of the system library, e.g., unixODBC, which manages the communications between the user applications and the ODBC
drivers. Typically, applications are linked against the DM, which uses Data Source Name (DSN) to look up the correct ODBC driver.


The ODBC driver is a DBMS implementation of the ODBC API, which handles all the internals of that DBMS.

The DM maps user application calls of ODBC functions to the correct ODBC driver that performs the specified function and returns the
proper values.

DuckDB ODBC Driver

DuckDB supports ODBC version 3.0 according to the Core Interface Conformance.

We release the ODBC driver as assets for Linux and Windows. Users can download them from the Latest Release of DuckDB.

Operating Systems

Operating system  | Supported versions
Linux             | Ubuntu 20.04 or later
Microsoft Windows | Microsoft Windows 10 or later

ODBC API ‑ Linux

A driver manager is required to manage communication between applications and the ODBC driver. We tested and support unixODBC, a complete ODBC driver manager for Linux. Users can install it from the command line:

Debian Flavors

sudo apt-get install unixodbc odbcinst

Fedora Flavors

sudo yum install unixODBC

Step 1: Download ODBC Driver

DuckDB releases the ODBC driver as an asset. For Linux, download it from the ODBC Linux Asset, which contains the following artifacts:

libduckdb_odbc.so: the DuckDB driver compiled for Ubuntu 16.04.

unixodbc_setup.sh: a setup script to aid the configuration on Linux.

Step 2: Extracting ODBC Artifacts

Run unzip to extract the files to a permanent directory:

mkdir duckdb_odbc
unzip duckdb_odbc-linux-amd64.zip -d duckdb_odbc

Step 3: Configuring with unixODBC

The unixodbc_setup.sh script aids the configuration of the DuckDB ODBC Driver. It is based on the unixODBC package, which provides commands such as odbcinst and isql to handle the ODBC setup and testing.

In a terminal window, change to the duckdb_odbc permanent directory and run the following command with either the -u or -s level option to configure the DuckDB ODBC driver.


User-Level ODBC Setup (-u) The -u option sets up the ODBC init files in the user's home directory.

./unixodbc_setup.sh -u

The default configuration consists of a database :memory:.

System-Level ODBC Setup (-s) The -s option changes the system-level files, which are visible to all users; because of this, it requires root privileges.

sudo unixodbc_setup.sh -s

The default configuration consists of a database :memory:.

Show Usage (--help) The --help option shows the usage of unixodbc_setup.sh, which provides alternative options for a custom configuration, like -db and -D.

unixodbc_setup.sh --help

Usage: ./unixodbc_setup.sh <level> [options]

Example: ./unixodbc_setup.sh -u -db ~/database_path -D ~/driver_path/libduckdb_odbc.so

Level:
-s: System-level, using 'sudo' to configure DuckDB ODBC at the system-level, changing the files:
/etc/odbc[inst].ini
-u: User-level, configuring the DuckDB ODBC at the user-level, changing the files: ~/.odbc[inst].ini.

Options:
-db <database_path>: the DuckDB database file path, the default is ':memory:' if not provided.
-D driver_path: the driver file path (i.e., the path for libduckdb_odbc.so), the default is using the
base script directory

Step 4 (Optional): Configure the ODBC Driver

The ODBC setup on Linux is based on files, the well-known .odbc.ini and .odbcinst.ini. These files can be placed in the system /etc directory or in the user's home directory /home/user (shortcut ~/). The DM prioritizes the user configuration files over the system files.

The .odbc.ini File The .odbc.ini contains the DSNs for the drivers, which can have specific knobs.

An example of .odbc.ini with DuckDB would be:

[DuckDB]
Driver = DuckDB Driver
Database = :memory:

[DuckDB]: the name between the brackets is the DSN for DuckDB.

Driver: describes the driver's name; other configurations will be placed in the .odbcinst.ini.

Database: describes the database name used by DuckDB; it can also be a file path to a .db file on the system.

The .odbcinst.ini File The .odbcinst.ini contains general configurations for the ODBC installed drivers in the system. A driver
section starts with the driver name between brackets, and then it follows specific configuration knobs belonging to that driver.

An example of .odbcinst.ini with the DuckDB driver would be:


[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace

[DuckDB Driver]
Driver = /home/user/duckdb_odbc/libduckdb_odbc.so

[ODBC]: it is the DM configuration section.

Trace: it enables the ODBC trace file using the option yes.

TraceFile: the absolute system file path for the ODBC trace file.

[DuckDB Driver]: the section of the DuckDB installed driver.

Driver: the absolute system file path of the DuckDB driver.

ODBC API ‑ Windows

Microsoft Windows requires an ODBC Driver Manager to manage communication between applications and the ODBC drivers. The DM on Windows is provided in a DLL file, odbccp32.dll, along with other files and tools. For detailed information, check out the Common ODBC Component Files.

Step 1: Download ODBC Driver

DuckDB releases the ODBC driver as an asset. For Windows, download it from the Windows Asset, which contains the following artifacts:

duckdb_odbc.dll: the DuckDB driver compiled for Windows.

duckdb_odbc_setup.dll: a setup DLL used by the Windows ODBC Data Source Administrator tool.

odbc_install.exe: an installation script to aid the configuration on Windows.

Step 2: Extracting ODBC Artifacts

Unzip the file to a permanent directory (e.g., duckdb_odbc).

An example with PowerShell and the unzip command would be:

mkdir duckdb_odbc
unzip duckdb_odbc-windows-amd64.zip -d duckdb_odbc

Step 3: ODBC Windows Installer

The odbc_install.exe aids the configuration of the DuckDB ODBC Driver on Windows. It depends on the Odbccp32.dll that pro‑
vides functions to configure the ODBC registry entries.

Inside the permanent directory (e.g., duckdb_odbc), double‑click on the odbc_install.exe.

Windows administrator privileges are required; for a non-administrator user, a User Account Control prompt will be displayed:

Step 4: Configure the ODBC Driver

The odbc_install.exe adds a default DSN configuration into the ODBC registries with a default database :memory:.

DSN Windows Setup After the installation, it is possible to change the default DSN configuration or add a new one using the Windows ODBC Data Source Administrator tool odbcad32.exe.

It can also be launched through the Windows Start menu:


Default DuckDB DSN The newly installed DSN is visible on the System DSN in the Windows ODBC Data Source Administrator tool:

Changing DuckDB DSN When selecting the default DSN (i.e., DuckDB) or adding a new configuration, the following setup window will
display:


This window allows you to set the DSN and the database file path associated with that DSN.

More Detailed Windows Setup

There are two ways to configure the ODBC driver, either by altering the registry keys as detailed below, or by connecting with SQLDriverConnect. A combination of the two is also possible.

Furthermore, the ODBC driver supports all the configuration options included in DuckDB.

Note. If a configuration is set in both the connection string passed to SQLDriverConnect and in the odbc.ini file, the one
passed to SQLDriverConnect will take precedence.

Registry Keys The ODBC setup on Windows is based on registry keys (see Registry Entries for ODBC Components). The ODBC entries can
be placed at the current user registry key (HKCU) or the system registry key (HKLM).

We have tested and used the system entries based on HKLM->SOFTWARE->ODBC. The odbc_install.exe changes this entry, which has two subkeys: ODBC.INI and ODBCINST.INI.

The ODBC.INI is where users usually insert DSN registry entries for the drivers.

For example, the DSN registry for DuckDB would look like this:


The ODBCINST.INI contains one entry for each ODBC driver and other keys predefined for Windows ODBC configuration.

ODBC API ‑ macOS

A driver manager is required to manage communication between applications and the ODBC driver. We tested and support unixODBC, a complete ODBC driver manager for macOS (and Linux). Users can install it from the command line:

Brew

brew install unixodbc

Step 1: Download ODBC Driver

DuckDB releases the ODBC driver as an asset. For macOS, download it from the ODBC macOS asset, which contains the following artifact:

libduckdb_odbc.dylib: the DuckDB ODBC driver compiled for macOS (with Intel and Apple Silicon support).

Step 2: Extracting ODBC Artifacts

Run unzip to extract the files to a permanent directory:

mkdir duckdb_odbc
unzip duckdb_odbc-osx-universal.zip -d duckdb_odbc

Step 3: Configure the ODBC Driver

There are two ways to configure the ODBC driver, either by initializing the configuration files listed below, or by connecting with
SQLDriverConnect. A combination of the two is also possible.

Furthermore, the ODBC driver supports all the configuration options included in DuckDB.

Note. If a configuration is set in both the connection string passed to SQLDriverConnect and in the odbc.ini file, the one
passed to SQLDriverConnect will take precedence.


The odbc.ini or .odbc.ini File The .odbc.ini contains the DSNs for the drivers, which can have specific knobs.

Example of .odbc.ini with DuckDB:

[DuckDB]
Driver = DuckDB Driver
Database=:memory:
access_mode=read_only
allow_unsigned_extensions=true

• [DuckDB]: between the brackets is a DSN for the DuckDB.


• Driver: Describes the driver's name, as well as where to find the configurations in the .odbcinst.ini.
• Database: Describes the database name used by DuckDB, can also be a file path to a .db in the system.
• access_mode: The mode in which to connect to the database
• allow_unsigned_extensions: Allow the use of unsigned extensions

The .odbcinst.ini File The .odbcinst.ini contains general configurations for the ODBC installed drivers in the system. A driver
section starts with the driver name between brackets, and then it follows specific configuration knobs belonging to that driver.

Example of .odbcinst.ini with the DuckDB:

[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace

[DuckDB Driver]
Driver = /Users/user/duckdb_odbc/libduckdb_odbc.dylib

• [ODBC]: it is the DM configuration section.


• Trace: it enables the ODBC trace file using the option yes.
• TraceFile: the absolute system file path for the ODBC trace file.
• [DuckDB Driver]: the section of the DuckDB installed driver.
• Driver: the absolute system file path of the DuckDB driver.

Step 4 (Optional): Test the ODBC Driver

After the configuration, to validate the installation, you can use an ODBC client. unixODBC provides a command-line tool called isql.

Use the DSN defined in odbc.ini as a parameter of isql.

isql DuckDB

+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| echo [string] |
| quit |
| |
+---------------------------------------+

SQL> SELECT 42;

+------------+
| 42 |
+------------+
| 42 |
+------------+


SQLRowCount returns -1
1 rows fetched

Configuration

Configuration

DuckDB has a number of configuration options that can be used to change the behavior of the system.

The configuration options can be set using either the SET statement or the PRAGMA statement. They can be reset to their original values
using the RESET statement. The values of configuration options can be queried via the current_setting() scalar function or using
the duckdb_settings() table function.

Examples

-- set the memory limit of the system to 10GB
SET memory_limit = '10GB';
-- configure the system to use 1 thread
SET threads TO 1;
-- enable printing of a progress bar during long-running queries
SET enable_progress_bar = true;
-- set the default null order to NULLS LAST
SET default_null_order = 'nulls_last';

-- return the current value of a specific setting
SELECT current_setting('threads') AS threads;

┌─────────┐
│ threads │
│ int64 │
├─────────┤
│ 10 │
└─────────┘

-- query a specific setting
SELECT *
FROM duckdb_settings()
WHERE name = 'threads';

┌─────────┬─────────┬─────────────────────────────────────────────────┬────────────┐
│ name │ value │ description │ input_type │
│ varchar │ varchar │ varchar │ varchar │
├─────────┼─────────┼─────────────────────────────────────────────────┼────────────┤
│ threads │ 10 │ The number of total threads used by the system. │ BIGINT │
└─────────┴─────────┴─────────────────────────────────────────────────┴────────────┘

-- show a list of all available settings
SELECT *
FROM duckdb_settings();

-- reset the memory limit of the system back to the default
RESET memory_limit;

Secrets Manager

DuckDB has a Secrets manager, which provides a unified user interface for secrets across all backends (e.g., AWS S3) that use them.


Configuration Reference

Below is a list of all available settings.

Name | Description | Input type | Default value
Calendar | The current calendar | VARCHAR | System (locale) calendar
TimeZone | The current time zone | VARCHAR | System (locale) timezone
access_mode | Access mode of the database (AUTOMATIC, READ_ONLY or READ_WRITE) | VARCHAR | automatic
allocator_flush_threshold | Peak allocation threshold at which to flush the allocator after completing a task | VARCHAR | 128.0 MiB
allow_persistent_secrets | Allow the creation of persistent secrets, that are stored and loaded on restarts | BOOLEAN | 1
allow_unsigned_extensions | Allow to load extensions with invalid or missing signatures | BOOLEAN | false
arrow_large_buffer_size | If arrow buffers for strings, blobs, uuids and bits should be exported using large buffers | BOOLEAN | false
autoinstall_extension_repository | Overrides the custom endpoint for extension installation on autoloading | VARCHAR |
autoinstall_known_extensions | Whether known extensions are allowed to be automatically installed when a query depends on them | BOOLEAN | true
autoload_known_extensions | Whether known extensions are allowed to be automatically loaded when a query depends on them | BOOLEAN | true
binary_as_string | In Parquet files, interpret binary data as a string | BOOLEAN |
ca_cert_file | Path to a custom certificate file for self-signed certificates. By default not set | VARCHAR |
checkpoint_threshold, wal_autocheckpoint | The WAL size threshold at which to automatically trigger a checkpoint (e.g., 1GB) | VARCHAR | 16.0 MiB
custom_extension_repository | Overrides the custom endpoint for remote extension installation | VARCHAR |
custom_user_agent | Metadata from DuckDB callers | VARCHAR |
default_collation | The collation setting used when none is specified | VARCHAR |
default_null_order, null_order | Null ordering used when none is specified (NULLS_FIRST or NULLS_LAST) | VARCHAR | NULLS_LAST
default_order | The order type used when none is specified (ASC or DESC) | VARCHAR | ASC
default_secret_storage | Allows switching the default storage for secrets | VARCHAR | local_file
disabled_filesystems | Disable specific file systems preventing access (e.g., LocalFileSystem) | VARCHAR |
duckdb_api | DuckDB API surface | VARCHAR | cli
enable_external_access | Allow the database to access external state (through e.g., loading/installing modules, COPY TO/FROM, CSV readers, pandas replacement scans, etc.) | BOOLEAN | true
enable_fsst_vectors | Allow scans on FSST compressed segments to emit compressed vectors to utilize late decompression | BOOLEAN | false
enable_http_metadata_cache | Whether or not the global http metadata is used to cache HTTP metadata | BOOLEAN | false
enable_object_cache | Whether or not object cache is used to cache e.g., Parquet metadata | BOOLEAN | false
enable_profiling | Enables profiling, and sets the output format (JSON, QUERY_TREE, QUERY_TREE_OPTIMIZER) | VARCHAR | NULL
enable_progress_bar_print | Controls the printing of the progress bar, when 'enable_progress_bar' is true | BOOLEAN | true
enable_progress_bar | Enables the progress bar, printing progress to the terminal for long queries | BOOLEAN | false
enable_server_cert_verification | Enable server side certificate verification, defaults to False | BOOLEAN | 0
errors_as_json | Output error messages as structured JSON instead of as a raw string | BOOLEAN | false
explain_output | Output of EXPLAIN statements (ALL, OPTIMIZED_ONLY, PHYSICAL_ONLY) | VARCHAR | physical_only
extension_directory | Set the directory to store extensions in | VARCHAR |
external_threads | The number of external threads that work on DuckDB tasks | BIGINT | 1
file_search_path | A comma separated list of directories to search for input files | VARCHAR |
force_download | Forces upfront download of file | BOOLEAN | 0
home_directory | Sets the home directory used by the system | VARCHAR |
http_keep_alive | Keep alive connections. Setting this to false can help when running into connection failures | BOOLEAN | 1
http_retries | HTTP retries on I/O error (default 3) | UBIGINT | 3
http_retry_backoff | Backoff factor for exponentially increasing retry wait time (default 4) | FLOAT | 4
http_retry_wait_ms | Time between retries (default 100ms) | UBIGINT | 100
http_timeout | HTTP timeout read/write/connection/retry (default 30000ms) | UBIGINT | 30000
immediate_transaction_mode | Whether transactions should be started lazily when needed, or immediately when BEGIN TRANSACTION is called | BOOLEAN | false
integer_division | Whether or not the / operator defaults to integer division, or to floating point division | BOOLEAN | 0
lock_configuration | Whether or not the configuration can be altered | BOOLEAN | false
log_query_path | Specifies the path to which queries should be logged (default: empty string, queries are not logged) | VARCHAR | NULL
max_expression_depth | The maximum expression depth limit in the parser. WARNING: increasing this setting and using very deep expressions might lead to stack overflow errors | UBIGINT | 1000
max_memory, memory_limit | The maximum memory of the system (e.g., 1GB) | VARCHAR | 80% of RAM
old_implicit_casting | Allow implicit casting to/from VARCHAR | BOOLEAN | false
ordered_aggregate_threshold | The number of rows to accumulate before sorting, used for tuning | UBIGINT | 262144
password | The password to use. Ignored for legacy compatibility | VARCHAR | NULL
perfect_ht_threshold | Threshold in bytes for when to use a perfect hash table (default: 12) | BIGINT | 12
pivot_filter_threshold | The threshold to switch from using filtered aggregates to LIST with a dedicated pivot operator | BIGINT | 10
pivot_limit | The maximum number of pivot columns in a pivot statement (default: 100000) | BIGINT | 100000
prefer_range_joins | Force use of range joins with mixed predicates | BOOLEAN | false
preserve_identifier_case | Whether or not to preserve the identifier case, instead of always lowercasing all non-quoted identifiers | BOOLEAN | true
preserve_insertion_order | Whether or not to preserve insertion order. If set to false the system is allowed to re-order any results that do not contain ORDER BY clauses | BOOLEAN | true
profile_output, profiling_output | The file to which profile output should be saved, or empty to print to the terminal | VARCHAR |
profiling_mode | The profiling mode (STANDARD or DETAILED) | VARCHAR | NULL
progress_bar_time | Sets the time (in milliseconds) how long a query needs to take before we start printing a progress bar | BIGINT | 2000
s3_access_key_id | S3 Access Key ID | VARCHAR |
s3_endpoint | S3 Endpoint (empty for default endpoint) | VARCHAR |
s3_region | S3 Region (default us-east-1) | VARCHAR | us-east-1
s3_secret_access_key | S3 Access Key | VARCHAR |
s3_session_token | S3 Session Token | VARCHAR |
s3_uploader_max_filesize | S3 Uploader max filesize (between 50GB and 5TB, default 800GB) | VARCHAR | 800GB
s3_uploader_max_parts_per_file | S3 Uploader max parts per file (between 1 and 10000, default 10000) | UBIGINT | 10000
s3_uploader_thread_limit | S3 Uploader global thread limit (default 50) | UBIGINT | 50
s3_url_compatibility_mode | Disable Globs and Query Parameters on S3 URLs | BOOLEAN | 0
s3_url_style | S3 URL style ('vhost' (default) or 'path') | VARCHAR | vhost
s3_use_ssl | S3 use SSL (default true) | BOOLEAN | 1
schema | Sets the default search schema. Equivalent to setting search_path to a single value | VARCHAR | main
search_path | Sets the default catalog search path as a comma-separated list of values | VARCHAR |
secret_directory | Set the directory to which persistent secrets are stored | VARCHAR | ~/.duckdb/stored_secrets
temp_directory | Set the directory to which to write temp files | VARCHAR |
threads, worker_threads | The number of total threads used by the system | BIGINT | # Cores
username, user | The username to use. Ignored for legacy compatibility | VARCHAR | NULL

Pragmas

The PRAGMA statement is an SQL extension adopted by DuckDB from SQLite. PRAGMA statements can be issued in a similar manner to reg‑
ular SQL statements. PRAGMA commands may alter the internal state of the database engine, and can influence the subsequent execution
or behavior of the engine.

PRAGMA statements that assign a value to an option can also be issued using the SET statement and the value of an option can be retrieved
using SELECT current_setting(option_name).
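
For example, a minimal sketch using the memory_limit option:

-- the following two statements are equivalent
PRAGMA memory_limit = '1GB';
SET memory_limit = '1GB';
-- retrieve the current value of the option
SELECT current_setting('memory_limit');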

For DuckDB's built-in configuration options, see the Configuration Reference.

List of Supported PRAGMA Statements

Below is a list of supported PRAGMA statements.

Schema Information List all databases:

PRAGMA database_list;

List all tables:

PRAGMA show_tables;

List all tables, with extra information, similarly to DESCRIBE:

PRAGMA show_tables_expanded;

To list all functions:

PRAGMA functions;

Table Information Get info for a specific table:

PRAGMA table_info('table_name');
CALL pragma_table_info('table_name');

table_info returns information about the columns of the table with name table_name. The exact format of the table returned is given
below:

cid INTEGER,        -- cid of the column
name VARCHAR,       -- name of the column
type VARCHAR,       -- type of the column
notnull BOOLEAN,    -- if the column is marked as NOT NULL
dflt_value VARCHAR, -- default value of the column, or NULL if not specified
pk BOOLEAN          -- part of the primary key or not

To also show table structure, but in a slightly different format (included for compatibility):

PRAGMA show('table_name');

305
DuckDB Documentation

Memory Limit Set the memory limit for the buffer manager:

SET memory_limit = '1GB';


SET max_memory = '1GB';

Note. Warning The specified memory limit is only applied to the buffer manager. For most queries, the buffer manager handles
the majority of the data processed. However, certain in‑memory data structures such as vectors and query results are allocated
outside of the buffer manager. Additionally, aggregate functions with complex state (e.g., list, mode, quantile, string_agg,
and approx functions) use memory outside of the buffer manager. Therefore, the actual memory consumption can be higher than
the specified memory limit.

Threads Set the amount of threads for parallel query execution:

SET threads = 4;

Database Size Get the file and memory size of each database:

PRAGMA database_size;
CALL pragma_database_size();

database_size returns information about the file and memory size of each database. The column types of the returned results are
given below:

database_name VARCHAR, -- database name
database_size VARCHAR, -- total block count times the block size
block_size BIGINT,     -- database block size
total_blocks BIGINT,   -- total blocks in the database
used_blocks BIGINT,    -- used blocks in the database
free_blocks BIGINT,    -- free blocks in the database
wal_size VARCHAR,      -- write ahead log size
memory_usage VARCHAR,  -- memory used by the database buffer manager
memory_limit VARCHAR   -- maximum memory allowed for the database

Collations List all available collations:

PRAGMA collations;

Set the default collation to one of the available ones:

SET default_collation = 'nocase';

Implicit Casting to VARCHAR Prior to version 0.10.0, DuckDB would automatically allow any type to be implicitly cast to VARCHAR during function binding. As a result, it was possible to e.g., compute the substring of an integer without an explicit cast. For version v0.10.0 and later, an explicit cast is needed instead. To revert to the old behaviour that performs implicit casting, set the old_implicit_casting variable to true.

SET old_implicit_casting = true;
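
As a short sketch of the behavior difference (the literal 42 is just an illustrative value): in v0.10.0 and later the first statement errors unless old_implicit_casting is enabled, while the second always works because the cast is explicit.

SELECT substring(42, 1, 1);                   -- requires old_implicit_casting = true
SELECT substring(CAST(42 AS VARCHAR), 1, 1);  -- explicit cast, always valid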

Default Ordering for NULLs Set the default ordering for NULLs to be either NULLS FIRST or NULLS LAST:

SET default_null_order = 'NULLS FIRST';


SET default_null_order = 'NULLS LAST';

Set the default result set ordering direction to ASCENDING or DESCENDING:

SET default_order = 'ASCENDING';


SET default_order = 'DESCENDING';


Version Show DuckDB version:

PRAGMA version;
CALL pragma_version();

Platform platform returns an identifier for the platform the current DuckDB executable has been compiled for, e.g., osx_arm64. The
format of this identifier matches the platform name as described on the extension loading explainer.

PRAGMA platform;
CALL pragma_platform();

Progress Bar Show progress bar when running queries:

PRAGMA enable_progress_bar;

Don't show a progress bar for running queries:

PRAGMA disable_progress_bar;

Profiling

Enable Profiling To enable profiling:

PRAGMA enable_profiling;
PRAGMA enable_profile;

Profiling Format The format of the resulting profiling information can be specified as either json, query_tree, or query_tree_optimizer. The default format is query_tree, which prints the physical operator tree together with the timings and cardinalities of each operator in the tree to the screen.

To return the logical query plan as JSON:

SET enable_profiling = 'json';

To return the logical query plan:

SET enable_profiling = 'query_tree';

To return the physical query plan:

SET enable_profiling = 'query_tree_optimizer';

Disable Profiling To disable profiling:

PRAGMA disable_profiling;
PRAGMA disable_profile;

Profiling Output By default, profiling information is printed to the console. However, if you prefer to write the profiling information to a file, the PRAGMA profiling_output can be used to write to a specified file. Note that the file contents will be overwritten for every
new query that is issued, hence the file will only contain the profiling information of the last query that is run.

SET profiling_output = '/path/to/file.json';


SET profile_output = '/path/to/file.json';


Profiling Mode By default, a limited amount of profiling information is provided (standard). For more details, use the detailed profiling
mode by setting profiling_mode to detailed. The output of this mode shows how long it takes to apply certain optimizers on the
query tree and how long physical planning takes.

SET profiling_mode = 'detailed';

Optimizer To disable the query optimizer:

PRAGMA disable_optimizer;

To enable the query optimizer:

PRAGMA enable_optimizer;

Logging Set a path for query logging:

SET log_query_path = '/tmp/duckdb_log/';

Disable query logging:

SET log_query_path = '';

Explain Plan Output The output of EXPLAIN can be configured to show only the physical plan. This is the default configuration.

SET explain_output = 'physical_only';

To only show the optimized query plan:

SET explain_output = 'optimized_only';

To show all query plans:

SET explain_output = 'all';

Full‑Text Search Indexes The create_fts_index and drop_fts_index options are only available when the fts extension is
loaded. Their usage is documented on the Full‑Text Search extension page.
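
A minimal sketch, assuming the fts extension is loaded and a hypothetical table documents(id, body) exists:

-- documents(id, body) is a hypothetical table
PRAGMA create_fts_index('documents', 'id', 'body');
PRAGMA drop_fts_index('documents');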

Verification of External Operators Enable verification of external operators:

PRAGMA verify_external;

Disable verification of external operators:

PRAGMA disable_verify_external;

Verification of Round‑Trip Capabilities Enable verification of round‑trip capabilities for supported logical plans:

PRAGMA verify_serializer;

Disable verification of round‑trip capabilities:

PRAGMA disable_verify_serializer;

Object Cache Enable caching of objects for e.g., Parquet metadata:

PRAGMA enable_object_cache;

Disable caching of objects:

PRAGMA disable_object_cache;


Checkpoint

Force Checkpoint When CHECKPOINT is called but no changes have been made, force a checkpoint regardless.

PRAGMA force_checkpoint;

Checkpoint on Shutdown Run a CHECKPOINT on successful shutdown and delete the WAL, to leave only a single database file behind:

PRAGMA enable_checkpoint_on_shutdown;

Don't run a CHECKPOINT on shutdown:

PRAGMA disable_checkpoint_on_shutdown;

Progress Bar Enable printing of the progress bar (if it's possible):

PRAGMA enable_print_progress_bar;

Disable printing of the progress bar:

PRAGMA disable_print_progress_bar;

Temp Directory for Spilling Data to Disk By default, DuckDB uses a temporary directory named database_file_name.tmp to spill to disk, located in the same directory as the database file. To change this, use:

SET temp_directory = '/path/to/temp_dir.tmp/';

Storage Information To get storage information:

PRAGMA storage_info('table_name');
CALL pragma_storage_info('table_name');

This call returns the following information for the given table:

Name         | Type    | Description
row_group_id | BIGINT  |
column_name  | VARCHAR |
column_id    | BIGINT  |
column_path  | VARCHAR |
segment_id   | BIGINT  |
segment_type | VARCHAR |
start        | BIGINT  | The start row id of this chunk
count        | BIGINT  | The amount of entries in this storage chunk
compression  | VARCHAR | Compression type used for this column - see blog post
stats        | VARCHAR |
has_updates  | BOOLEAN |
persistent   | BOOLEAN | false if temporary table
block_id     | BIGINT  | empty unless persistent
block_offset | BIGINT  | empty unless persistent

See Storage for more information.


Show Databases The following statement is equivalent to the SHOW DATABASES statement:

PRAGMA show_databases;

User Agent The following statement returns the user agent information, e.g., duckdb/v0.10.0(osx_arm64).

PRAGMA user_agent;

Metadata Information The following statement returns information on the metadata store (block_id, total_blocks, free_blocks, and free_list).

PRAGMA metadata_info;

Selectively Disabling Optimizers The disabled_optimizers option allows selectively disabling optimization steps. For example,
to disable filter_pushdown and statistics_propagation, run:

SET disabled_optimizers = 'filter_pushdown,statistics_propagation';

The available optimizations can be queried using the duckdb_optimizers() table function.
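
For example, to list the optimizer names that can be passed to disabled_optimizers:

SELECT * FROM duckdb_optimizers();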

Note. Warning The disabled_optimizers option should only be used for debugging performance issues and should be
avoided in production.

Returning Errors as JSON The errors_as_json option can be set to obtain error information in raw JSON format. For certain errors,
extra information or decomposed information is provided for easier machine processing. For example:

SET errors_as_json = true;

Then, running a query that results in an error produces a JSON output:

SELECT * FROM nonexistent_tbl;

{
"exception_type":"Catalog",
"exception_message":"Table with name nonexistent_tbl does not exist!\nDid you mean
\"temp.information_schema.tables\"?",
"name":"nonexistent_tbl",
"candidates":"temp.information_schema.tables",
"position":"14",
"type":"Table",
"error_subtype":"MISSING_ENTRY"
}

Query Verification (for Development) The following PRAGMAs are mostly used for development and internal testing.

Enable query verification:

PRAGMA enable_verification;

Disable query verification:

PRAGMA disable_verification;

Enable force parallel query processing:

PRAGMA verify_parallelism;

Disable force parallel query processing:

PRAGMA disable_verify_parallelism;


Secrets Manager

The Secrets manager provides a unified user interface for secrets across all backends that use them. Secrets can be scoped, so different storage prefixes can have different secrets, allowing, for example, joining data across organizations in a single query. Secrets can also be persisted, so that they do not need to be specified every time DuckDB is launched.

Note. Secrets were introduced with DuckDB version 0.10.

Note. Warning Persistent secrets are stored in unencrypted binary format on the disk.

Secrets

Types of Secrets Secrets are typed; their type identifies which service they are for. Currently, the following cloud services are available:

• AWS S3 (S3), through the httpfs extension


• Google Cloud Storage (GCS), through the httpfs extension
• Cloudflare R2 (R2), through the httpfs extension
• Azure Blob Storage (AZURE), through the azure extension

For each type, there are one or more ”secret providers” that specify how the secret is created. Secrets can also have an optional scope,
which is a file path prefix that the secret applies to. When fetching a secret for a path, the secret scopes are compared to the path, returning
the matching secret for the path. In the case of multiple matching secrets, the longest prefix is chosen.

Creating a Secret Secrets can be created using the CREATE SECRET SQL statement. Secrets can be temporary or persistent. Tem‑
porary secrets are used by default – and are stored in‑memory for the life span of the DuckDB instance similar to how settings worked
previously. Persistent secrets are stored in unencrypted binary format in the ~/.duckdb/stored_secrets directory. On startup of
DuckDB, persistent secrets are read from this directory and automatically loaded.

Secret Providers To create a secret, a Secret Provider needs to be used. A Secret Provider is a mechanism through which a secret is generated. To illustrate this, for the S3, GCS, R2, and AZURE secret types, DuckDB currently supports two providers: CONFIG and CREDENTIAL_CHAIN. The CONFIG provider requires the user to pass all configuration information into the CREATE SECRET statement, whereas the CREDENTIAL_CHAIN provider will automatically try to fetch credentials. When no Secret Provider is specified, the CONFIG provider is used. For more details on how to create secrets using different providers, check out the respective pages on httpfs and azure.
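
A minimal sketch of a secret that uses the CREDENTIAL_CHAIN provider instead of explicit configuration values (the provider then tries to discover credentials automatically, e.g., from the environment):

CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);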

Temporary Secrets To create a temporary unscoped secret to access S3, we can now use the following:

CREATE SECRET (
TYPE S3,
KEY_ID 'mykey',
SECRET 'mysecret',
REGION 'myregion'
);

Note that we implicitly use the default CONFIG secret provider here.

Persistent Secrets In order to persist secrets between DuckDB database instances, we can now use the CREATE PERSISTENT SECRET
command, e.g.:

CREATE PERSISTENT SECRET my_persistent_secret (
    TYPE S3,
    KEY_ID 'key',
    SECRET 'secret'
);

This will write the secret (unencrypted) to the ~/.duckdb/stored_secrets directory.


Deleting Secrets Secrets can be deleted using the DROP SECRET statement, e.g.:

DROP PERSISTENT SECRET my_persistent_secret;
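
Temporary secrets are dropped the same way, without the PERSISTENT keyword (the secret name below is hypothetical):

DROP SECRET my_temporary_secret;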

Creating Multiple Secrets for the Same Service Type If two secrets exist for a service type, the scope can be used to decide which one
should be used. For example:

CREATE SECRET secret1 (
    TYPE S3,
    KEY_ID 'my_key1',
    SECRET 'my_secret1',
    SCOPE 's3://my-bucket'
);

CREATE SECRET secret2 (
    TYPE S3,
    KEY_ID 'my_key2',
    SECRET 'my_secret2',
    SCOPE 's3://my-other-bucket'
);

Now, if the user queries something from s3://my-other-bucket/something, secret secret2 will be chosen automatically for
that request. To see which secret is being used, the which_secret scalar function can be used, which takes a path and a secret type as
parameters:

SELECT which_secret('s3://my-other-bucket/file.parquet', 'S3');

Listing Secrets Secrets can be listed using the built‑in table‑producing function, e.g., by using the duckdb_secrets() table func‑
tion:

FROM duckdb_secrets();

Sensitive information will be redacted.

SQL

SQL Introduction

Here we provide an overview of how to perform simple operations in SQL. This tutorial is only intended to give you an introduction and is
in no way a complete tutorial on SQL. This tutorial is adapted from the PostgreSQL tutorial.

In the examples that follow, we assume that you have installed the DuckDB Command Line Interface (CLI) shell. See the installation page
for information on how to install the CLI.

Concepts

DuckDB is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. A relation
is essentially a mathematical term for a table.

Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific
data type. Tables themselves are stored inside schemas, and a collection of schemas constitutes the entire database that you can access.

Creating a New Table

You can create a new table by specifying the table name, along with all column names and their types:

CREATE TABLE weather (
    city    VARCHAR,
    temp_lo INTEGER, -- minimum temperature on a day
    temp_hi INTEGER, -- maximum temperature on a day
    prcp    REAL,
    date    DATE
);

You can enter this into the shell with the line breaks. The command is not terminated until the semicolon.

White space (i.e., spaces, tabs, and newlines) can be used freely in SQL commands. That means you can type the command aligned differ‑
ently than above, or even all on one line. Two dash characters (--) introduce comments. Whatever follows them is ignored up to the end
of the line. SQL is case insensitive about key words and identifiers.

In the SQL command, we first specify the type of command that we want to perform: CREATE TABLE. After that follows the parameters
for the command. First, the table name, weather, is given. Then the column names and column types follow.

city VARCHAR specifies that the table has a column called city that is of type VARCHAR. VARCHAR specifies a data type that can store
text of arbitrary length. The temperature fields are stored in an INTEGER type, a type that stores integer numbers (i.e., whole numbers
without a decimal point). REAL columns store single precision floating‑point numbers (i.e., numbers with a decimal point). DATE stores a
date (i.e., year, month, day combination). DATE only stores the specific day, not a time associated with that day.

DuckDB supports the standard SQL types INTEGER, SMALLINT, REAL, DOUBLE, DECIMAL, CHAR(n), VARCHAR(n), DATE, TIME and
TIMESTAMP.

The second example will store cities and their associated geographical location:

CREATE TABLE cities (
    name VARCHAR,
    lat  DECIMAL,
    lon  DECIMAL
);

Finally, it should be mentioned that if you don't need a table any longer or want to recreate it differently you can remove it using the
following command:

DROP TABLE [tablename];

Populating a Table with Rows

The insert statement is used to populate a table with rows:

INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');

Constants that are not numeric values (e.g., text and dates) must be surrounded by single quotes (''), as in the example. Input dates for
the date type must be formatted as 'YYYY-MM-DD'.

We can insert into the cities table in the same manner.

INSERT INTO cities
VALUES ('San Francisco', -194.0, 53.0);

The syntax used so far requires you to remember the order of the columns. An alternative syntax allows you to list the columns explicitly:

INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');

You can list the columns in a different order if you wish or even omit some columns, e.g., if the prcp is unknown:

INSERT INTO weather (date, city, temp_hi, temp_lo)
VALUES ('1994-11-29', 'Hayward', 54, 37);

Many developers consider explicitly listing the columns better style than relying on the order implicitly.

Please enter all the commands shown above so you have some data to work with in the following sections.

You could also have used COPY to load large amounts of data from CSV files. This is usually faster because the COPY command is optimized
for this application while allowing less flexibility than INSERT. An example with weather.csv would be:

COPY weather
FROM 'weather.csv';

Where the file name for the source file must be available on the machine running the process. There are many other ways of loading data
into DuckDB, see the corresponding documentation section for more information.

Querying a Table

To retrieve data from a table, the table is queried. A SQL SELECT statement is used to do this. The statement is divided into a select list
(the part that lists the columns to be returned), a table list (the part that lists the tables from which to retrieve the data), and an optional
qualification (the part that specifies any restrictions). For example, to retrieve all the rows of table weather, type:

SELECT *
FROM weather;

Here * is a shorthand for ”all columns”. So the same result would be had with:

SELECT city, temp_lo, temp_hi, prcp, date
FROM weather;

The output should be:


┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │
└───────────────┴─────────┴─────────┴───────┴────────────┘

You can write expressions, not just simple column references, in the select list. For example, you can do:

SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date
FROM weather;

This should give:

┌───────────────┬──────────┬────────────┐
│ city │ temp_avg │ date │
│ varchar │ double │ date │
├───────────────┼──────────┼────────────┤
│ San Francisco │ 48.0 │ 1994-11-27 │
│ San Francisco │ 50.0 │ 1994-11-29 │
│ Hayward │ 45.5 │ 1994-11-29 │
└───────────────┴──────────┴────────────┘

Notice how the AS clause is used to relabel the output column. (The AS clause is optional.)

A query can be ”qualified” by adding a WHERE clause that specifies which rows are wanted. The WHERE clause contains a Boolean (truth
value) expression, and only rows for which the Boolean expression is true are returned. The usual Boolean operators (AND, OR, and NOT)
are allowed in the qualification. For example, the following retrieves the weather of San Francisco on rainy days:

SELECT *
FROM weather
WHERE city = 'San Francisco' AND prcp > 0.0;

Result:

┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
└───────────────┴─────────┴─────────┴───────┴────────────┘

You can request that the results of a query be returned in sorted order:

SELECT *
FROM weather
ORDER BY city;

┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │
└───────────────┴─────────┴─────────┴───────┴────────────┘

In this example, the sort order isn't fully specified, and so you might get the San Francisco rows in either order. But you'd always get the
results shown above if you do:


SELECT *
FROM weather
ORDER BY city, temp_lo;

You can request that duplicate rows be removed from the result of a query:

SELECT DISTINCT city
FROM weather;

┌───────────────┐
│ city │
│ varchar │
├───────────────┤
│ Hayward │
│ San Francisco │
└───────────────┘

Here again, the result row ordering might vary. You can ensure consistent results by using DISTINCT and ORDER BY together:

SELECT DISTINCT city
FROM weather
ORDER BY city;

Joins between Tables

Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a
way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables
at one time is called a join query. As an example, say you wish to list all the weather records together with the location of the associated
city. To do that, we need to compare the city column of each row of the weather table with the name column of all rows in the cities
table, and select the pairs of rows where these values match.

This would be accomplished by the following query:

SELECT *
FROM weather, cities
WHERE city = name;

┌───────────────┬─────────┬─────────┬───────┬────────────┬───────────────┬───────────────┬───────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ name │ lat │ lon │
│ varchar │ int32 │ int32 │ float │ date │ varchar │ decimal(18,3) │ decimal(18,3) │
├───────────────┼─────────┼─────────┼───────┼────────────┼───────────────┼───────────────┼───────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ San Francisco │ -194.000 │ 53.000 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │ San Francisco │ -194.000 │ 53.000 │
└───────────────┴─────────┴─────────┴───────┴────────────┴───────────────┴───────────────┴───────────────┘

Observe two things about the result set:

• There is no result row for the city of Hayward. This is because there is no matching entry in the cities table for Hayward, so the
join ignores the unmatched rows in the weather table. We will see shortly how this can be fixed.
• There are two columns containing the city name. This is correct because the lists of columns from the weather and cities tables
are concatenated. In practice this is undesirable, though, so you will probably want to list the output columns explicitly rather than
using *:

SELECT city, temp_lo, temp_hi, prcp, date, lon, lat
FROM weather, cities
WHERE city = name;

┌───────────────┬─────────┬─────────┬───────┬────────────┬───────────────┬───────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ lon │ lat │
│ varchar │ int32 │ int32 │ float │ date │ decimal(18,3) │ decimal(18,3) │
├───────────────┼─────────┼─────────┼───────┼────────────┼───────────────┼───────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ 53.000 │ -194.000 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │ 53.000 │ -194.000 │
└───────────────┴─────────┴─────────┴───────┴────────────┴───────────────┴───────────────┘

Since the columns all had different names, the parser automatically found which table they belong to. If there were duplicate column
names in the two tables you'd need to qualify the column names to show which one you meant, as in:

SELECT weather.city, weather.temp_lo, weather.temp_hi,
       weather.prcp, weather.date, cities.lon, cities.lat
FROM weather, cities
WHERE cities.name = weather.city;

It is widely considered good style to qualify all column names in a join query, so that the query won't fail if a duplicate column name is later
added to one of the tables.

Join queries of the kind seen thus far can also be written in this alternative form:

SELECT *
FROM weather
INNER JOIN cities ON weather.city = cities.name;

This syntax is not as commonly used as the one above, but we show it here to help you understand the following topics.

Now we will figure out how we can get the Hayward records back in. What we want the query to do is to scan the weather table and for
each row to find the matching cities row(s). If no matching row is found we want some "empty values" to be substituted for the cities
table's columns. This kind of query is called an outer join. (The joins we have seen so far are inner joins.) The command looks like this:

SELECT *
FROM weather
LEFT OUTER JOIN cities ON weather.city = cities.name;

┌───────────────┬─────────┬─────────┬───────┬────────────┬───────────────┬───────────────┬───────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │ name │ lat │ lon │
│ varchar │ int32 │ int32 │ float │ date │ varchar │ decimal(18,3) │ decimal(18,3) │
├───────────────┼─────────┼─────────┼───────┼────────────┼───────────────┼───────────────┼───────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │ San Francisco │ -194.000 │ 53.000 │
│ San Francisco │ 43 │ 57 │ 0.0 │ 1994-11-29 │ San Francisco │ -194.000 │ 53.000 │
│ Hayward │ 37 │ 54 │ │ 1994-11-29 │ │ │ │
└───────────────┴─────────┴─────────┴───────┴────────────┴───────────────┴───────────────┴───────────────┘

This query is called a left outer join because the table mentioned on the left of the join operator will have each of its rows in the output
at least once, whereas the table on the right will only have those rows output that match some row of the left table. When outputting a
left‑table row for which there is no right‑table match, empty (null) values are substituted for the right‑table columns.

Aggregate Functions

Like most other relational database products, DuckDB supports aggregate functions. An aggregate function computes a single result from
multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum) and min (minimum)
over a set of rows.

As an example, we can find the highest low‑temperature reading anywhere with:

SELECT max(temp_lo)
FROM weather;

┌──────────────┐
│ max(temp_lo) │
│ int32 │
├──────────────┤
│ 46 │
└──────────────┘


If we wanted to know what city (or cities) that reading occurred in, we might try:

SELECT city
FROM weather
WHERE temp_lo = max(temp_lo); -- WRONG

but this will not work since the aggregate max cannot be used in the WHERE clause. (This restriction exists because the WHERE clause
determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are
computed.) However, as is often the case the query can be restated to accomplish the desired result, here by using a subquery:

SELECT city
FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);

┌───────────────┐
│ city │
│ varchar │
├───────────────┤
│ San Francisco │
└───────────────┘

This is OK because the subquery is an independent computation that computes its own aggregate separately from what is happening in
the outer query.

Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the maximum low temperature observed
in each city with:

SELECT city, max(temp_lo)
FROM weather
GROUP BY city;

┌───────────────┬──────────────┐
│ city │ max(temp_lo) │
│ varchar │ int32 │
├───────────────┼──────────────┤
│ San Francisco │ 46 │
│ Hayward │ 37 │
└───────────────┴──────────────┘

Which gives us one output row per city. Each aggregate result is computed over the table rows matching that city. We can filter these
grouped rows using HAVING:

SELECT city, max(temp_lo)
FROM weather
GROUP BY city
HAVING max(temp_lo) < 40;

┌─────────┬──────────────┐
│ city │ max(temp_lo) │
│ varchar │ int32 │
├─────────┼──────────────┤
│ Hayward │ 37 │
└─────────┴──────────────┘

which gives us the same results for only the cities that have all temp_lo values below 40. Finally, if we only care about cities whose names
begin with "S", we can use the LIKE operator:

SELECT city, max(temp_lo)
FROM weather
WHERE city LIKE 'S%' -- (1)
GROUP BY city
HAVING max(temp_lo) < 40;


More information about the LIKE operator can be found in the pattern matching page.

It is important to understand the interaction between aggregates and SQL's WHERE and HAVING clauses. The fundamental difference
between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows
go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE
clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the
aggregates. On the other hand, the HAVING clause always contains aggregate functions.

In the previous example, we can apply the city name restriction in WHERE, since it needs no aggregate. This is more efficient than adding
the restriction to HAVING, because we avoid doing the grouping and aggregate calculations for all rows that fail the WHERE check.

Updates

You can update existing rows using the UPDATE command. Suppose you discover the temperature readings are all off by 2 degrees after
November 28. You can correct the data as follows:

UPDATE weather
SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
WHERE date > '1994-11-28';

Look at the new state of the data:

SELECT *
FROM weather;

┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 41 │ 55 │ 0.0 │ 1994-11-29 │
│ Hayward │ 35 │ 52 │ │ 1994-11-29 │
└───────────────┴─────────┴─────────┴───────┴────────────┘

Deletions

Rows can be removed from a table using the DELETE command. Suppose you are no longer interested in the weather of Hayward. Then
you can do the following to delete those rows from the table:

DELETE FROM weather
WHERE city = 'Hayward';

All weather records belonging to Hayward are removed.

SELECT *
FROM weather;

┌───────────────┬─────────┬─────────┬───────┬────────────┐
│ city │ temp_lo │ temp_hi │ prcp │ date │
│ varchar │ int32 │ int32 │ float │ date │
├───────────────┼─────────┼─────────┼───────┼────────────┤
│ San Francisco │ 46 │ 50 │ 0.25 │ 1994-11-27 │
│ San Francisco │ 41 │ 55 │ 0.0 │ 1994-11-29 │
└───────────────┴─────────┴─────────┴───────┴────────────┘

One should be wary of statements of the form

DELETE FROM tablename;

Without a qualification, DELETE will remove all rows from the given table, leaving it empty. The system will not request confirmation
before doing this!


Statements

Statements Overview

ALTER TABLE Statement

The ALTER TABLE statement changes the schema of an existing table in the catalog.

Examples

-- add a new column with name "k" to the table "integers", it will be filled with the default value NULL
ALTER TABLE integers ADD COLUMN k INTEGER;
-- add a new column with name "l" to the table integers, it will be filled with the default value 10
ALTER TABLE integers ADD COLUMN l INTEGER DEFAULT 10;

-- drop the column "k" from the table integers
ALTER TABLE integers DROP k;

-- change the type of the column "i" to the type "VARCHAR" using a standard cast
ALTER TABLE integers ALTER i TYPE VARCHAR;
-- change the type of the column "i" to the type "VARCHAR", using the specified expression to convert the data for each row
ALTER TABLE integers ALTER i SET DATA TYPE VARCHAR USING concat(i, '_', j);

-- set the default value of a column
ALTER TABLE integers ALTER COLUMN i SET DEFAULT 10;
-- drop the default value of a column
ALTER TABLE integers ALTER COLUMN i DROP DEFAULT;

-- make a column not nullable
ALTER TABLE t ALTER COLUMN x SET NOT NULL;
-- drop the not null constraint
ALTER TABLE t ALTER COLUMN x DROP NOT NULL;

-- rename a table
ALTER TABLE integers RENAME TO integers_old;

-- rename a column of a table
ALTER TABLE integers RENAME i TO j;

Syntax

ALTER TABLE changes the schema of an existing table. All the changes made by ALTER TABLE fully respect the transactional semantics,
i.e., they will not be visible to other transactions until committed, and can be fully reverted through a rollback.

RENAME TABLE

-- rename a table
ALTER TABLE integers RENAME TO integers_old;

The RENAME TO clause renames an entire table, changing its name in the schema. Note that any views that rely on the table are not
automatically updated.


RENAME COLUMN

-- rename a column of a table
ALTER TABLE integers RENAME i TO j;
ALTER TABLE integers RENAME COLUMN j TO k;

The RENAME COLUMN clause renames a single column within a table. Any constraints that rely on this name (e.g., CHECK constraints) are
automatically updated. However, note that any views that rely on this column name are not automatically updated.

ADD COLUMN

-- add a new column with name "k" to the table "integers", it will be filled with the default value NULL
ALTER TABLE integers ADD COLUMN k INTEGER;
-- add a new column with name "l" to the table integers, it will be filled with the default value 10
ALTER TABLE integers ADD COLUMN l INTEGER DEFAULT 10;

The ADD COLUMN clause can be used to add a new column of a specified type to a table. The new column will be filled with the specified
default value, or NULL if none is specified.

DROP COLUMN

-- drop the column "k" from the table integers
ALTER TABLE integers DROP k;

The DROP COLUMN clause can be used to remove a column from a table. Note that columns can only be removed if they do not have any
indexes that rely on them. This includes any indexes created as part of a PRIMARY KEY or UNIQUE constraint. Columns that are part of
multi‑column check constraints cannot be dropped either.

ALTER TYPE

-- change the type of the column "i" to the type "VARCHAR" using a standard cast
ALTER TABLE integers ALTER i TYPE VARCHAR;
-- change the type of the column "i" to the type "VARCHAR", using the specified expression to convert the data for each row
ALTER TABLE integers ALTER i SET DATA TYPE VARCHAR USING concat(i, '_', j);

The SET DATA TYPE clause changes the type of a column in a table. Any data present in the column is converted according to the
provided expression in the USING clause, or, if the USING clause is absent, cast to the new data type. Note that columns can only have
their type changed if they do not have any indexes that rely on them and are not part of any CHECK constraints.

SET / DROP DEFAULT

-- set the default value of a column
ALTER TABLE integers ALTER COLUMN i SET DEFAULT 10;
-- drop the default value of a column
ALTER TABLE integers ALTER COLUMN i DROP DEFAULT;

The SET/DROP DEFAULT clause modifies the DEFAULT value of an existing column. Note that this does not modify any existing data in
the column. Dropping the default is equivalent to setting the default value to NULL.

Note. Warning At the moment DuckDB will not allow you to alter a table if there are any dependencies. That means that if you
have an index on a column you will first need to drop the index, alter the table, and then recreate the index. Otherwise you will get a
"Dependency Error".


ADD / DROP CONSTRAINT

Note. The ADD CONSTRAINT and DROP CONSTRAINT clauses are not yet supported in DuckDB.

ALTER VIEW Statement

The ALTER VIEW statement changes the schema of an existing view in the catalog.

Examples

-- rename a view
ALTER VIEW v1 RENAME TO v2;

ALTER VIEW changes the schema of an existing view. All the changes made by ALTER VIEW fully respect the transactional semantics,
i.e., they will not be visible to other transactions until committed, and can be fully reverted through a rollback. Note that other views that
rely on the altered view are not automatically updated.

ATTACH/DETACH Statement

The ATTACH statement adds a new database file to the catalog that can be read from and written to.

Examples

-- attach the database "file.db" with the alias inferred from the name ("file")
ATTACH 'file.db';
-- attach the database "file.db" with an explicit alias ("file_db")
ATTACH 'file.db' AS file_db;
-- attach the database "file.db" in read only mode
ATTACH 'file.db' (READ_ONLY);
-- attach a SQLite database for reading and writing (see the sqlite extension for more information)
ATTACH 'sqlite_file.db' AS sqlite_db (TYPE SQLITE);
-- attach the database "file.db" if inferred database alias "file_db" does not yet exist
ATTACH IF NOT EXISTS 'file.db';
-- attach the database "file.db" if explicit database alias "file_db" does not yet exist
ATTACH IF NOT EXISTS 'file.db' AS file_db;
-- create a table in the attached database with alias "file"
CREATE TABLE file.new_table (i INTEGER);
-- detach the database with alias "file"
DETACH file;
-- show a list of all attached databases
SHOW DATABASES;
-- change the default database that is used to the database "file"
USE file;

Attach

Attach Syntax ATTACH allows DuckDB to operate on multiple database files, and allows for transfer of data between different database
files.
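
For illustration, a minimal sketch of such a transfer between two attached files; the file and table names here are hypothetical:

-- attach a source and a target database file (illustrative names)
ATTACH 'source.db' AS src;
ATTACH 'target.db' AS tgt;
-- copy a table from one attached database into the other
CREATE TABLE tgt.events AS SELECT * FROM src.events;
-- or append into an existing table in the target database
INSERT INTO tgt.events SELECT * FROM src.events;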

Detach

The DETACH statement allows previously attached database files to be closed and detached, releasing any locks held on the database file.
It is not possible to detach from the default database: if you would like to do so, issue the USE statement to change the default database
to another one.


Note. Warning Closing the connection, e.g., invoking the close() function in Python, does not release the locks held on the
database files as the file handles are held by the main DuckDB instance (in Python's case, the duckdb module).

Detach Syntax

Name Qualification

The fully qualified name of catalog objects contains the catalog, the schema and the name of the object. For example:

-- attach the database "new_db"
ATTACH 'new_db.db';
-- create the schema "my_schema" in the database "new_db"
CREATE SCHEMA new_db.my_schema;
-- create the table "my_table" in the schema "my_schema"
CREATE TABLE new_db.my_schema.my_table (col INTEGER);
-- refer to the column "col" inside the table "my_table"
SELECT new_db.my_schema.my_table.col FROM new_db.my_schema.my_table;

Note that often the fully qualified name is not required. When a name is not fully qualified, the system looks for which entries to reference
using the catalog search path. The default catalog search path includes the system catalog, the temporary catalog and the initially attached
database together with the main schema.

Also note the rules on identifiers and database names in particular.

Default Database and Schema When a table is created without any qualifications, the table is created in the default schema of the default
database. The default database is the database that is launched when the system is created ‑ and the default schema is main.

-- create the table "my_table" in the default database
CREATE TABLE my_table (col INTEGER);

Changing the Default Database and Schema The default database and schema can be changed using the USE command.

-- set the default database schema to `new_db.main`
USE new_db;
-- set the default database schema to `new_db.my_schema`
USE new_db.my_schema;

Resolving Conflicts When providing only a single qualification, the system can interpret this as either a catalog or a schema, as long as
there are no conflicts. For example:

ATTACH 'new_db.db';
CREATE SCHEMA my_schema;
-- creates the table "new_db.main.tbl"
CREATE TABLE new_db.tbl (i INTEGER);
-- creates the table "default_db.my_schema.tbl"
CREATE TABLE my_schema.tbl (i INTEGER);

If we create a conflict (i.e., we have both a schema and a catalog with the same name) the system requests that a fully qualified path is used
instead:

CREATE SCHEMA new_db;
CREATE TABLE new_db.tbl (i INTEGER);
-- Error: Binder Error: Ambiguous reference to catalog or schema "new_db" -
-- use a fully qualified path like "memory.new_db"


Changing the Catalog Search Path The catalog search path can be adjusted by setting the search_path configuration option, which
uses a comma‑separated list of values that will be on the search path. The following example demonstrates searching in two databases:

ATTACH ':memory:' AS db1;
ATTACH ':memory:' AS db2;
CREATE TABLE db1.tbl1 (i INTEGER);
CREATE TABLE db2.tbl2 (j INTEGER);
-- reference the tables using their fully qualified name
SELECT * FROM db1.tbl1;
SELECT * FROM db2.tbl2;
-- or set the search path and reference the tables using their name
SET search_path = 'db1,db2';
SELECT * FROM tbl1;
SELECT * FROM tbl2;

Transactional Semantics

When running queries on multiple databases, the system opens separate transactions per database. The transactions are started lazily by
default: when a given database is referenced for the first time in a query, a transaction for that database will be started. The setting
SET immediate_transaction_mode = true can be used to change this behavior and eagerly start transactions in all attached databases instead.

While multiple transactions can be active at a time, the system only supports writing to a single attached database within a single transaction.
If you try to write to multiple attached databases in a single transaction, the following error will be thrown:

Attempting to write to database "db2" in a transaction that has already modified database "db1" -
a single transaction can only write to a single attached database.

The reason for this restriction is that the system does not maintain atomicity for transactions across attached databases. Transactions are
only atomic within each database file. By restricting the global transaction to write to only a single database file the atomicity guarantees
are maintained.
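
As an illustrative sketch of this restriction (database and table names are hypothetical), the second write below fails:

ATTACH ':memory:' AS db1;
ATTACH ':memory:' AS db2;
CREATE TABLE db1.t1 (i INTEGER);
CREATE TABLE db2.t2 (i INTEGER);
BEGIN TRANSACTION;
-- the first write binds this transaction to db1
INSERT INTO db1.t1 VALUES (1);
-- writing to a second attached database in the same transaction raises the error above
INSERT INTO db2.t2 VALUES (1);
ROLLBACK;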

CALL Statement

The CALL statement invokes the given table function and returns the results.

Examples

-- Invoke the 'duckdb_functions' table function.
CALL duckdb_functions();
-- Invoke the 'pragma_table_info' table function.
CALL pragma_table_info('pg_am');

Syntax

CHECKPOINT Statement

The CHECKPOINT statement synchronizes data in the write‑ahead log (WAL) to the database data file. For in‑memory databases this
statement will succeed with no effect.

Examples

-- Synchronize data in the default database
CHECKPOINT;
-- Synchronize data in the specified database
CHECKPOINT file_db;
-- Abort any in-progress transactions to synchronize the data
FORCE CHECKPOINT;

Syntax

Checkpoint operations happen automatically based on the WAL size (see Configuration). This statement is for manual checkpoint actions.

Behavior

The default CHECKPOINT command will fail if there are any running transactions. Including FORCE will abort any transactions and execute
the checkpoint operation.

Also see the related PRAGMA option for further behavior modification.

Reclaiming Space When performing a checkpoint (automatic or otherwise), the space occupied by deleted rows is partially reclaimed.
Note that this does not remove all deleted rows, but rather merges row groups that have a significant amount of deletes together. In the
current implementation this requires ~25% of rows to be deleted in adjacent row groups.

When running in in‑memory mode, checkpointing has no effect, hence it does not reclaim space after deletes in in‑memory databases.
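
A minimal sketch of observing this on a persistent database, assuming a table tbl from which many rows have already been deleted (the PRAGMA is described further below in the DROP statement section):

-- inspect block usage, run a manual checkpoint, and inspect again
PRAGMA database_size;
CHECKPOINT;
PRAGMA database_size; -- free_blocks may increase as row groups with many deletes are merged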

Note. Warning The VACUUM statement does not trigger vacuuming deletes and hence does not reclaim space.

COMMENT ON Statement

The COMMENT ON statement allows adding metadata to catalog entries (tables, columns, etc.). It follows the PostgreSQL syntax.

Examples

COMMENT ON TABLE test_table IS 'very nice table';
COMMENT ON COLUMN test_table.test_table_column IS 'very nice column';
COMMENT ON VIEW test_view IS 'very nice view';
COMMENT ON INDEX test_index IS 'very nice index';
COMMENT ON SEQUENCE test_sequence IS 'very nice sequence';
COMMENT ON TYPE test_type IS 'very nice type';
COMMENT ON MACRO test_macro IS 'very nice macro';
COMMENT ON MACRO TABLE test_table_macro IS 'very nice table macro';
-- to unset a comment, set it to NULL, e.g.:
COMMENT ON TABLE test_table IS NULL;

Reading Comments

Comments can be read by querying the comment column of the respective metadata functions:

SELECT comment FROM duckdb_tables(); -- TABLE
SELECT comment FROM duckdb_columns(); -- COLUMN
SELECT comment FROM duckdb_views(); -- VIEW
SELECT comment FROM duckdb_indexes(); -- INDEX
SELECT comment FROM duckdb_sequences(); -- SEQUENCE
SELECT comment FROM duckdb_types(); -- TYPE
SELECT comment FROM duckdb_functions(); -- MACRO
SELECT comment FROM duckdb_functions(); -- MACRO TABLE


Limitations

The COMMENT ON statement currently has the following limitations:

• It is not possible to comment on schemas or databases.
• It is not possible to comment on things that have a dependency (e.g., a table with an index).

Syntax

COPY Statement

Examples

-- read a CSV file into the lineitem table, using auto-detected CSV options
COPY lineitem FROM 'lineitem.csv';
-- read a CSV file into the lineitem table, using manually specified CSV options
COPY lineitem FROM 'lineitem.csv' (DELIMITER '|');
-- read a Parquet file into the lineitem table
COPY lineitem FROM 'lineitem.pq' (FORMAT PARQUET);
-- read a JSON file into the lineitem table, using auto-detected options
COPY lineitem FROM 'lineitem.json' (FORMAT JSON, AUTO_DETECT true);
-- read a CSV file into the lineitem table, using double quotes
COPY lineitem FROM "lineitem.csv";
-- read a CSV file into the lineitem table, omitting quotes
COPY lineitem FROM lineitem.csv;

-- write a table to a CSV file
COPY lineitem TO 'lineitem.csv' (FORMAT CSV, DELIMITER '|', HEADER);
-- write a table to a CSV file, using double quotes
COPY lineitem TO "lineitem.csv";
-- write a table to a CSV file, omitting quotes
COPY lineitem TO lineitem.csv;
-- write the result of a query to a Parquet file
COPY (SELECT l_orderkey, l_partkey FROM lineitem) TO 'lineitem.parquet' (COMPRESSION ZSTD);

-- copy the entire content of database 'db1' to database 'db2'
COPY FROM DATABASE db1 TO db2;
-- copy only the schema (catalog elements) but not any data
COPY FROM DATABASE db1 TO db2 (SCHEMA);

Overview

COPY moves data between DuckDB and external files. COPY ... FROM imports data into DuckDB from an external file. COPY ... TO
writes data from DuckDB to an external file. The COPY command can be used for CSV, PARQUET and JSON files.

COPY ... FROM

COPY ... FROM imports data from an external file into an existing table. The data is appended to whatever data is in the table already.
The number of columns in the file must match the number of columns in the table table_name, and the contents of the columns
must be convertible to the column types of the table. In case this is not possible, an error will be thrown.

If a list of columns is specified, COPY will only copy the data in the specified columns from the file. If there are any columns in the table that
are not in the column list, COPY ... FROM will insert the default values for those columns.


-- Copy the contents of a comma-separated file 'test.csv' without a header into the table 'test'
COPY test FROM 'test.csv';
-- Copy the contents of a comma-separated file with a header into the 'category' table
COPY category FROM 'categories.csv' (HEADER);
-- Copy the contents of 'lineitem.tbl' into the 'lineitem' table, where the contents are delimited by a pipe character ('|')
COPY lineitem FROM 'lineitem.tbl' (DELIMITER '|');
-- Copy the contents of 'lineitem.tbl' into the 'lineitem' table, where the delimiter, quote character, and presence of a header are automatically detected
COPY lineitem FROM 'lineitem.tbl' (AUTO_DETECT true);
-- Read the contents of a comma-separated file 'names.csv' into the 'name' column of the 'category' table.
-- Any other columns of this table are filled with their default value.
COPY category(name) FROM 'names.csv';
-- Read the contents of a Parquet file 'lineitem.parquet' into the lineitem table
COPY lineitem FROM 'lineitem.parquet' (FORMAT PARQUET);
-- Read the contents of a newline-delimited JSON file 'lineitem.ndjson' into the lineitem table
COPY lineitem FROM 'lineitem.ndjson' (FORMAT JSON);
-- Read the contents of a JSON file 'lineitem.json' into the lineitem table
COPY lineitem FROM 'lineitem.json' (FORMAT JSON, ARRAY true);

Syntax

COPY ... TO

COPY ... TO exports data from DuckDB to an external CSV, Parquet, or JSON file. It has mostly the same set of options as COPY ... FROM;
however, in the case of COPY ... TO the options specify how the file should be written to disk. Any file created by COPY ... TO can
be copied back into the database by using COPY ... FROM with a similar set of options.

The COPY ... TO function can be called specifying either a table name, or a query. When a table name is specified, the contents of the
entire table will be written into the resulting file. When a query is specified, the query is executed and the result of the query is written to
the resulting file.

-- Copy the contents of the 'lineitem' table to a CSV file with a header
COPY lineitem TO 'lineitem.csv';
-- Copy the contents of the 'lineitem' table to the file 'lineitem.tbl',
-- where the columns are delimited by a pipe character ('|'), including a header line.
COPY lineitem TO 'lineitem.tbl' (DELIMITER '|');
-- Use tab separators to create a TSV file without a header
COPY lineitem TO 'lineitem.tsv' (DELIMITER '\t', HEADER false);
-- Copy the l_orderkey column of the 'lineitem' table to the file 'orderkey.tbl'
COPY lineitem(l_orderkey) TO 'orderkey.tbl' (DELIMITER '|');
-- Copy the result of a query to the file 'query.csv', including a header with column names
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.csv' (DELIMITER ',');
-- Copy the result of a query to the Parquet file 'query.parquet'
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.parquet' (FORMAT PARQUET);
-- Copy the result of a query to the newline-delimited JSON file 'query.ndjson'
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.ndjson' (FORMAT JSON);
-- Copy the result of a query to the JSON file 'query.json'
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.json' (FORMAT JSON, ARRAY true);

COPY ... TO Options Zero or more copy options may be provided as a part of the copy operation. The WITH specifier is optional, but
if any options are specified, the parentheses are required. Parameter values can be passed in with or without wrapping in single quotes.

Any option that is a Boolean can be enabled or disabled in multiple ways. You can write true, ON, or 1 to enable the option, and false,
OFF, or 0 to disable it. The BOOLEAN value can also be omitted, e.g., by only passing (HEADER), in which case true is assumed.

The below options are applicable to all formats written with COPY.


• overwrite_or_ignore (BOOL, default false): Whether or not to allow overwriting a directory if one already exists. Only has an effect when used with partition_by.
• file_size_bytes (VARCHAR or BIGINT, default empty): If this parameter is set, the COPY process creates a directory which will contain the exported files. If a file exceeds the set limit (specified as bytes such as 1000 or in human‑readable format such as 1k), the process creates a new file in the directory. This parameter works in combination with per_thread_output. Note that the size is used as an approximation, and files can occasionally be slightly over the limit.
• format (VARCHAR, default auto): Specifies the copy function to use. The default is selected from the file extension (e.g., .parquet results in a Parquet file being written/read). If the file extension is unknown, CSV is selected. Available options are CSV, PARQUET and JSON.
• partition_by (VARCHAR[], default empty): The columns to partition by using a Hive partitioning scheme, see the partitioned writes section.
• per_thread_output (BOOL, default false): Generate one file per thread, rather than one file in total. This allows for faster parallel writing.
• use_tmp_file (BOOL, default auto): Whether or not to write to a temporary file first if the original file exists (target.csv.tmp). This prevents overwriting an existing file with a broken file in case the writing is cancelled.
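
For illustration, a sketch combining several of these options; the table name and target directories are hypothetical:

-- write one Parquet file per thread, starting a new file at roughly 100 MB
COPY orders TO 'orders_export' (FORMAT PARQUET, PER_THREAD_OUTPUT true, FILE_SIZE_BYTES '100MB');
-- write a Hive-partitioned directory tree, overwriting an existing directory
COPY orders TO 'orders_by_month' (FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE true);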

Syntax

COPY FROM DATABASE ... TO

The COPY FROM DATABASE ... TO statement copies the entire content from one attached database to another attached database.
This includes the schema with its constraints, indexes, sequences, and macros, as well as the data itself.

ATTACH 'db1.db' AS db1;
CREATE TABLE db1.tbl AS SELECT 42 AS x, 3 AS y;
CREATE MACRO db1.two_x_plus_y(x, y) AS 2 * x + y;

ATTACH 'db2.db' AS db2;
COPY FROM DATABASE db1 TO db2;
SELECT db2.two_x_plus_y(x, y) AS z FROM db2.tbl;

┌───────┐
│ z │
│ int32 │
├───────┤
│ 87 │
└───────┘

To only copy the schema of db1 to db2 but omit copying the data, add SCHEMA to the statement:

COPY FROM DATABASE db1 TO db2 (SCHEMA);


Syntax

Format‑Specific Options

CSV Options The below options are applicable when writing CSV files.

• compression (VARCHAR, default auto): The compression type for the file. By default this will be detected automatically from the file extension (e.g., file.csv.gz will use gzip, file.csv will use none). Options are none, gzip, zstd.
• force_quote (VARCHAR[], default []): The list of columns to always add quotes to, even if not required.
• dateformat (VARCHAR, default empty): Specifies the date format to use when writing dates. See Date Format.
• delim or sep (VARCHAR, default ,): The character that is written to separate columns within each row.
• escape (VARCHAR, default "): The character that should appear before a character that matches the quote value.
• header (BOOL, default true): Whether or not to write a header for the CSV file.
• nullstr (VARCHAR, default empty): The string that is written to represent a NULL value.
• quote (VARCHAR, default "): The quoting character to be used when a data value is quoted.
• timestampformat (VARCHAR, default empty): Specifies the date format to use when writing timestamps. See Date Format.

Parquet Options The below options are applicable when writing Parquet files.

• compression (VARCHAR, default snappy): The compression format to use (uncompressed, snappy, gzip or zstd).
• row_group_size (BIGINT, default 122880): The target size, i.e., number of rows, of each row group.
• row_group_size_bytes (BIGINT, default row_group_size * 1024): The target size of each row group. You can pass either a human‑readable string, e.g., '2MB', or an integer, i.e., the number of bytes. This option is only used when you have issued SET preserve_insertion_order = false;, otherwise it is ignored.
• field_ids (STRUCT, default empty): The field_id for each column. Pass auto to attempt to infer automatically.

Some examples of FIELD_IDS are:

-- Assign field_ids automatically
COPY
(SELECT 128 AS i)
TO 'my.parquet'
(FIELD_IDS 'auto');


-- Sets the field_id of column 'i' to 42
COPY
(SELECT 128 AS i)
TO 'my.parquet'
(FIELD_IDS {i: 42});

-- Sets the field_id of column 'i' to 42, and column 'j' to 43
COPY
(SELECT 128 AS i, 256 AS j)
TO 'my.parquet'
(FIELD_IDS {i: 42, j: 43});

-- Sets the field_id of column 'my_struct' to 42,
-- and column 'i' (nested inside 'my_struct') to 43
COPY
(SELECT {i: 128} AS my_struct)
TO 'my.parquet'
(FIELD_IDS {my_struct: {__duckdb_field_id: 42, i: 43}});

-- Sets the field_id of column 'my_list' to 42,
-- and column 'element' (default name of list child) to 43
COPY
(SELECT [128, 256] AS my_list)
TO 'my.parquet'
(FIELD_IDS {my_list: {__duckdb_field_id: 42, element: 43}});

-- Sets the field_id of column 'my_map' to 42,
-- and columns 'key' and 'value' (default names of map children) to 43 and 44
COPY
(SELECT MAP {'key1' : 128, 'key2': 256} my_map)
TO 'my.parquet'
(FIELD_IDS {my_map: {__duckdb_field_id: 42, key: 43, value: 44}});

JSON Options The below options are applicable when writing JSON files.

• compression (VARCHAR, default auto): The compression type for the file. By default this will be detected automatically from the file extension (e.g., file.csv.gz will use gzip, file.csv will use none). Options are none, gzip, zstd.
• dateformat (VARCHAR, default empty): Specifies the date format to use when writing dates. See Date Format.
• timestampformat (VARCHAR, default empty): Specifies the date format to use when writing timestamps. See Date Format.
• array (BOOL, default false): Whether to write a JSON array. If true, a JSON array of records is written; if false, newline‑delimited JSON is written.

CREATE MACRO Statement

The CREATE MACRO statement can create a scalar or table macro (function) in the catalog. A macro may only be a single SELECT statement
(similar to a VIEW), but it has the benefit of accepting parameters. For a scalar macro, CREATE MACRO is followed by the name of the
macro, and optionally parameters within a set of parentheses. The keyword AS is next, followed by the text of the macro. By design, a scalar
macro may only return a single value. For a table macro, the syntax is similar to a scalar macro except AS is replaced with AS TABLE. A
table macro may return a table of arbitrary size and shape.


Note. If a MACRO is temporary, it is only usable within the same database connection and is deleted when the connection is closed.

Examples

Scalar Macros

-- create a macro that adds two expressions (a and b)
CREATE MACRO add(a, b) AS a + b;
-- create a macro for a case expression
CREATE MACRO ifelse(a, b, c) AS CASE WHEN a THEN b ELSE c END;
-- create a macro that does a subquery
CREATE MACRO one() AS (SELECT 1);
-- create a macro with a common table expression
-- (parameter names get priority over column names: disambiguate using the table name)
CREATE MACRO plus_one(a) AS (WITH cte AS (SELECT 1 AS a) SELECT cte.a + a FROM cte);
-- macros are schema-dependent, and have an alias: FUNCTION
CREATE FUNCTION main.myavg(x) AS sum(x) / count(x);
-- create a macro with default constant parameters
CREATE MACRO add_default(a, b := 5) AS a + b;
-- create a macro arr_append (with a functionality equivalent to array_append)
CREATE MACRO arr_append(l, e) AS list_concat(l, list_value(e));

Table Macros

-- create a table macro without parameters
CREATE MACRO static_table() AS TABLE SELECT 'Hello' AS column1, 'World' AS column2;
-- create a table macro with parameters (that can be of any type)
CREATE MACRO dynamic_table(col1_value, col2_value) AS TABLE SELECT col1_value AS column1, col2_value AS column2;
-- create a table macro that returns multiple rows.
-- It will be replaced if it already exists, and it is temporary (will be automatically deleted when the connection ends)
CREATE OR REPLACE TEMP MACRO dynamic_table(col1_value, col2_value) AS TABLE
    SELECT col1_value AS column1, col2_value AS column2
    UNION ALL
    SELECT 'Hello' AS col1_value, 456 AS col2_value;
-- pass an argument as a list: SELECT * FROM get_users([1, 5])
CREATE MACRO get_users(i) AS TABLE SELECT * FROM users WHERE uid IN (SELECT unnest(i));

Syntax

Macros allow you to create shortcuts for combinations of expressions.

-- failure! cannot find column "b"
CREATE MACRO add(a) AS a + b;
-- this works
CREATE MACRO add(a, b) AS a + b;
-- error! cannot bind +(VARCHAR, INTEGER)
SELECT add('hello', 3);
-- success!
SELECT add(1, 2);
-- 3

Macros can have default parameters. Unlike some languages, default parameters must be named when the macro is invoked.

-- b is a default parameter
CREATE MACRO add_default(a, b := 5) AS a + b;


-- the following will result in 42
SELECT add_default(37);
-- error! add_default only has one positional parameter
SELECT add_default(40, 2);
-- success! default parameters are used by assigning them like so
SELECT add_default(40, b := 2);
-- error! default parameters must come after positional parameters
SELECT add_default(b = 2, 40);
-- the order of default parameters does not matter
CREATE MACRO triple_add(a, b := 5, c := 10) AS a + b + c;
-- success!
SELECT triple_add(40, c := 1, b := 1);
-- 42

When macros are used, they are expanded (i.e., replaced with the original expression), and the parameters within the expanded expression
are replaced with the supplied arguments. Step by step:

-- the 'add' macro we defined above is used in a query
SELECT add(40, 2);
-- internally, add is replaced with its definition of a + b
SELECT a + b;
-- then, the parameters are replaced by the supplied arguments
SELECT 40 + 2;
-- 42

Limitations

Using Named Parameters Currently, positional macro parameters can only be used positionally, and named parameters can only be
used by supplying their name. Therefore, the following will not work:

CREATE MACRO test(a, b := 42) AS (a + b);
SELECT test(32, 52);

Error: Binder Error: Macro function 'test(a)' requires a single positional argument, but 2 positional
arguments were provided.
LINE 1: SELECT test(32, 52);
^

Using Subquery Macros If a MACRO is defined as a subquery, it cannot be invoked in a table function. DuckDB will return the following
error:

Binder Error: Table function cannot contain subqueries
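
For illustration, a sketch of the failing case; the macro here is only an example:

-- a macro defined as a subquery
CREATE MACRO one() AS (SELECT 1);
-- using it as an argument of a table function fails with the error above
SELECT * FROM range(one());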

CREATE SCHEMA Statement

The CREATE SCHEMA statement creates a schema in the catalog. The default schema is main.

Examples

-- create a schema
CREATE SCHEMA s1;
-- create a schema if it does not exist yet
CREATE SCHEMA IF NOT EXISTS s2;
-- create table in the schemas
CREATE TABLE s1.t (id INTEGER PRIMARY KEY, other_id INTEGER);
CREATE TABLE s2.t (id INTEGER PRIMARY KEY, j VARCHAR);


-- compute a join between tables from two schemas
SELECT *
FROM s1.t s1t, s2.t s2t
WHERE s1t.other_id = s2t.id;

Syntax

CREATE SECRET Statement

The CREATE SECRET statement creates a new secret in the Secrets Manager.

Syntax for CREATE SECRET

Syntax for DROP SECRET
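
No examples are listed here; the following is a hedged sketch of an S3-style secret as handled by the Secrets Manager, with placeholder values:

-- create a named secret holding S3 credentials (values are placeholders)
CREATE SECRET my_s3_secret (
    TYPE S3,
    KEY_ID 'my_key_id',
    SECRET 'my_secret_value',
    REGION 'us-east-1'
);
-- remove the secret again
DROP SECRET my_s3_secret;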

CREATE SEQUENCE Statement

The CREATE SEQUENCE statement creates a new sequence number generator.

Examples

-- generate an ascending sequence starting from 1
CREATE SEQUENCE serial;
-- generate sequence from a given start number
CREATE SEQUENCE serial START 101;
-- generate odd numbers using INCREMENT BY
CREATE SEQUENCE serial START WITH 1 INCREMENT BY 2;
-- generate a descending sequence starting from 99
CREATE SEQUENCE serial START WITH 99 INCREMENT BY -1 MAXVALUE 99;
-- by default, cycles are not allowed and will result in a Serialization Error, e.g.:
-- reached maximum value of sequence "serial" (10)
CREATE SEQUENCE serial START WITH 1 MAXVALUE 10;
-- CYCLE allows cycling through the same sequence repeatedly
CREATE SEQUENCE serial START WITH 1 MAXVALUE 10 CYCLE;

Creating and Dropping Sequences Sequences can be created and dropped similarly to other catalog items:

-- overwrite an existing sequence
CREATE OR REPLACE SEQUENCE serial;
-- only create sequence if no such sequence exists yet
CREATE SEQUENCE IF NOT EXISTS serial;
-- remove sequence
DROP SEQUENCE serial;
-- remove sequence if exists
DROP SEQUENCE IF EXISTS serial;

Using Sequences for Primary Keys Sequences can provide an integer primary key for a table. For example:

CREATE SEQUENCE id_sequence START 1;
CREATE TABLE tbl (id INT DEFAULT nextval('id_sequence'), s VARCHAR);
INSERT INTO tbl (s) VALUES ('hello'), ('world');
SELECT * FROM tbl;

The script results in the following table:


┌───────┬─────────┐
│ id │ s │
│ int32 │ varchar │
├───────┼─────────┤
│ 1 │ hello │
│ 2 │ world │
└───────┴─────────┘

Sequences can also be added using the ALTER TABLE statement. The following example adds an id column and fills it with values
generated by the sequence:

CREATE TABLE tbl (s VARCHAR);
INSERT INTO tbl VALUES ('hello'), ('world');
CREATE SEQUENCE id_sequence START 1;
ALTER TABLE tbl ADD COLUMN id INT DEFAULT nextval('id_sequence');
SELECT * FROM tbl;

This script results in the same table as the previous example.

Selecting the Next Value Select the next number from a sequence:

SELECT nextval('serial') AS nextval;

┌─────────┐
│ nextval │
│ int64 │
├─────────┤
│ 1 │
└─────────┘

Using this sequence in an INSERT command:

INSERT INTO distributors VALUES (nextval('serial'), 'nothing');

Selecting the Current Value You may also view the current number from the sequence. Note that the nextval function must have
already been called before calling currval, otherwise a Serialization Error ("sequence is not yet defined in this session") will be thrown.

SELECT currval('serial') AS currval;

┌─────────┐
│ currval │
│ int64 │
├─────────┤
│ 1 │
└─────────┘

Syntax CREATE SEQUENCE creates a new sequence number generator.

If a schema name is given then the sequence is created in the specified schema. Otherwise it is created in the current schema. Temporary
sequences exist in a special schema, so a schema name may not be given when creating a temporary sequence. The sequence name must
be distinct from the name of any other sequence in the same schema.

After a sequence is created, you use the function nextval to operate on the sequence.

Parameters


• CYCLE or NO CYCLE: The CYCLE option allows the sequence to wrap around when the maxvalue or minvalue has been reached by an ascending or descending sequence respectively. If the limit is reached, the next number generated will be the minvalue or maxvalue, respectively. If NO CYCLE is specified, any calls to nextval after the sequence has reached its maximum value will return an error. If neither CYCLE nor NO CYCLE is specified, NO CYCLE is the default.
• increment: The optional clause INCREMENT BY increment specifies which value is added to the current sequence value to create a new value. A positive value will make an ascending sequence, a negative one a descending sequence. The default value is 1.
• maxvalue: The optional clause MAXVALUE maxvalue determines the maximum value for the sequence. If this clause is not supplied or NO MAXVALUE is specified, then default values will be used. The defaults are 2^63 ‑ 1 and ‑1 for ascending and descending sequences, respectively.
• minvalue: The optional clause MINVALUE minvalue determines the minimum value a sequence can generate. If this clause is not supplied or NO MINVALUE is specified, then defaults will be used. The defaults are 1 and ‑(2^63 ‑ 1) for ascending and descending sequences, respectively.
• name: The name (optionally schema‑qualified) of the sequence to be created.
• start: The optional clause START WITH start allows the sequence to begin anywhere. The default starting value is minvalue for ascending sequences and maxvalue for descending ones.
• TEMPORARY or TEMP: If specified, the sequence object is created only for this session, and is automatically dropped on session exit. Existing permanent sequences with the same name are not visible (in this session) while the temporary sequence exists, unless they are referenced with schema‑qualified names.

Note. Sequences are based on BIGINT arithmetic, so the range cannot exceed the range of an eight‑byte integer (‑9223372036854775808
to 9223372036854775807).

CREATE TABLE Statement

The CREATE TABLE statement creates a table in the catalog.

Examples

-- create a table with two integer columns (i and j)
CREATE TABLE t1 (i INTEGER, j INTEGER);
-- create a table with a primary key
CREATE TABLE t1 (id INTEGER PRIMARY KEY, j VARCHAR);
-- create a table with a composite primary key
CREATE TABLE t1 (id INTEGER, j VARCHAR, PRIMARY KEY (id, j));
-- create a table with various different types and constraints
CREATE TABLE t1 (
i INTEGER NOT NULL,
decimalnr DOUBLE CHECK (decimalnr < 10),
date DATE UNIQUE,
time TIMESTAMP
);


-- create table as select (CTAS)
CREATE TABLE t1 AS SELECT 42 AS i, 84 AS j;
-- create a table from a CSV file (automatically detecting column names and types)
CREATE TABLE t1 AS SELECT * FROM read_csv('path/file.csv');
-- we can use the FROM-first syntax to omit 'SELECT *'
CREATE TABLE t1 AS FROM read_csv('path/file.csv');
-- copy the schema of t2 to t1
CREATE TABLE t1 AS FROM t2 LIMIT 0;

Temporary Tables

Temporary tables can be created using the CREATE TEMP TABLE or the CREATE TEMPORARY TABLE statement (see diagram below).
Temporary tables are session scoped (similar to PostgreSQL for example), meaning that only the specific connection that created them can
access them, and once the connection to DuckDB is closed they will be automatically dropped. Temporary tables reside in memory rather
than on disk (even when connecting to a persistent DuckDB), but if the temp_directory configuration is set when connecting or with a
SET command, data will be spilled to disk if memory becomes constrained.

-- create a temporary table from a CSV file (automatically detecting column names and types)
CREATE TEMP TABLE t1 AS SELECT * FROM read_csv('path/file.csv');

-- allow temporary tables to off-load excess memory to disk
SET temp_directory = '/path/to/directory/';

Temporary tables are part of the temp.main schema. While discouraged, their names can overlap with the names of the regular database
tables. In these cases, use their fully qualified name, e.g., temp.main.t1, for disambiguation.
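
A small sketch of such a disambiguation; it assumes the default in-memory database, whose catalog name is memory:

-- a temporary table whose name overlaps with a regular table
CREATE TABLE t1 (i INTEGER);
CREATE TEMP TABLE t1 (j INTEGER);
-- refer to each table by its fully qualified name
SELECT * FROM temp.main.t1;
SELECT * FROM memory.main.t1;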

CREATE OR REPLACE

The CREATE OR REPLACE syntax allows a new table to be created or for an existing table to be overwritten by the new table. This is
shorthand for dropping the existing table and then creating the new one.

-- create a table with two integer columns (i and j) even if t1 already exists
CREATE OR REPLACE TABLE t1 (i INTEGER, j INTEGER);

IF NOT EXISTS

The IF NOT EXISTS syntax will only proceed with the creation of the table if it does not already exist. If the table already exists, no action
will be taken and the existing table will remain in the database.

-- create a table with two integer columns (i and j) only if t1 does not exist yet
CREATE TABLE IF NOT EXISTS t1 (i INTEGER, j INTEGER);

Check Constraints

A CHECK constraint is an expression that must be satisfied by the values of every row in the table.

CREATE TABLE t1 (
id INTEGER PRIMARY KEY,
percentage INTEGER CHECK (0 <= percentage AND percentage <= 100)
);
INSERT INTO t1 VALUES (1, 5);
INSERT INTO t1 VALUES (2, -1);
-- Error: Constraint Error: CHECK constraint failed: t1
INSERT INTO t1 VALUES (3, 101);
-- Error: Constraint Error: CHECK constraint failed: t1


CREATE TABLE t2 (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER CHECK (x < y));
INSERT INTO t2 VALUES (1, 5, 10);
INSERT INTO t2 VALUES (2, 5, 3);
-- Error: Constraint Error: CHECK constraint failed: t2

CHECK constraints can also be added as part of the CONSTRAINTS clause:

CREATE TABLE t3 (
id INTEGER PRIMARY KEY,
x INTEGER,
y INTEGER,
CONSTRAINT x_smaller_than_y CHECK (x < y)
);
INSERT INTO t3 VALUES (1, 5, 10);
INSERT INTO t3 VALUES (2, 5, 3);
-- Error: Constraint Error: CHECK constraint failed: t3

Foreign Key Constraints

A FOREIGN KEY is a column (or set of columns) that references another table's primary key. Foreign keys check referential integrity, i.e.,
the referred primary key must exist in the other table upon insertion.

CREATE TABLE t1 (id INTEGER PRIMARY KEY, j VARCHAR);
CREATE TABLE t2 (
id INTEGER PRIMARY KEY,
t1_id INTEGER,
FOREIGN KEY (t1_id) REFERENCES t1 (id)
);

-- example
INSERT INTO t1 VALUES (1, 'a');
INSERT INTO t2 VALUES (1, 1);
INSERT INTO t2 VALUES (2, 2);
-- Error: Constraint Error: Violates foreign key constraint because key "id: 2" does not exist in the referenced table

Foreign keys can be defined on composite primary keys:

CREATE TABLE t3 (id INTEGER, j VARCHAR, PRIMARY KEY (id, j));
CREATE TABLE t4 (
id INTEGER PRIMARY KEY, t3_id INTEGER, t3_j VARCHAR,
FOREIGN KEY (t3_id, t3_j) REFERENCES t3(id, j)
);

-- example
INSERT INTO t3 VALUES (1, 'a');
INSERT INTO t4 VALUES (1, 1, 'a');
INSERT INTO t4 VALUES (2, 1, 'b');
-- Error: Constraint Error: Violates foreign key constraint because key "id: 1, j: b" does not exist in the referenced table

Foreign keys can also be defined on unique columns:

CREATE TABLE t5 (id INTEGER UNIQUE, j VARCHAR);
CREATE TABLE t6 (
id INTEGER PRIMARY KEY,
t5_id INTEGER,
FOREIGN KEY (t5_id) REFERENCES t5(id)
);


Note. Foreign keys with cascading deletes (FOREIGN KEY ... REFERENCES ... ON DELETE CASCADE) are not supported.

Generated Columns

The [type] [GENERATED ALWAYS] AS (expr) [VIRTUAL|STORED] syntax will create a generated column. The data in this kind
of column is generated from its expression, which can reference other (regular or generated) columns of the table. Since they are produced
by calculations, these columns can not be inserted into directly.

DuckDB can infer the type of the generated column based on the expression's return type. This allows you to leave out the type when
declaring a generated column. It is possible to explicitly set a type, but insertions into the referenced columns might fail if the type can not
be cast to the type of the generated column.

Generated columns come in two varieties: VIRTUAL and STORED. The data of virtual generated columns is not stored on disk, instead it
is computed from the expression every time the column is referenced (through a select statement).

The data of stored generated columns is stored on disk and is computed every time the data of their dependencies change (through an
insert/update/drop statement).

Currently, only the VIRTUAL kind is supported, and it is also the default option if the last field is left blank.

-- The simplest syntax for a generated column
-- The type is derived from the expression, and the variant defaults to VIRTUAL
CREATE TABLE t1 (x FLOAT, two_x AS (2 * x));

-- Fully specifying the same generated column for completeness
CREATE TABLE t1 (x FLOAT, two_x FLOAT GENERATED ALWAYS AS (2 * x) VIRTUAL);

Syntax

CREATE VIEW Statement

The CREATE VIEW statement defines a new view in the catalog.

Examples

-- create a simple view
CREATE VIEW v1 AS SELECT * FROM tbl;
-- create a view or replace it if a view with that name already exists
CREATE OR REPLACE VIEW v1 AS SELECT 42;
-- create a view and replace the column names
CREATE VIEW v1(a) AS SELECT 42;

The SQL query behind an existing view can be read using the duckdb_views() function like this:

SELECT sql FROM duckdb_views() WHERE view_name = 'v1';

Syntax

CREATE VIEW defines a view of a query. The view is not physically materialized. Instead, the query is run every time the view is referenced
in a query.

CREATE OR REPLACE VIEW is similar, but if a view of the same name already exists, it is replaced.

If a schema name is given then the view is created in the specified schema. Otherwise it is created in the current schema. Temporary views
exist in a special schema, so a schema name cannot be given when creating a temporary view. The name of the view must be distinct from
the name of any other view or table in the same schema.


CREATE TYPE Statement

The CREATE TYPE statement defines a new type in the catalog.

Examples

-- create a simple enum type
CREATE TYPE mood AS ENUM ('happy', 'sad', 'curious');
-- create a simple struct type
CREATE TYPE many_things AS STRUCT(k INTEGER, l VARCHAR);
-- create a simple union type
CREATE TYPE one_thing AS UNION(number INTEGER, string VARCHAR);
-- create a type alias
CREATE TYPE x_index AS INTEGER;

Syntax

CREATE TYPE defines a new data type available to this DuckDB instance. These new types can then be inspected in the duckdb_types
table.

Extending these custom types to support custom operators (such as the PostgreSQL && operator) would require C++ development. To do
this, create an extension.

DELETE Statement

The DELETE statement removes rows from the table identified by the table‑name.

Examples

-- remove the rows matching the condition "i = 2" from the database
DELETE FROM tbl WHERE i = 2;
-- delete all rows in the table "tbl"
DELETE FROM tbl;
-- the TRUNCATE statement removes all rows from a table,
-- acting as an alias for DELETE FROM without a WHERE clause
TRUNCATE tbl;

Syntax

The DELETE statement removes rows from the table identified by the table‑name.

If the WHERE clause is not present, all records in the table are deleted. If a WHERE clause is supplied, then only those rows for which the
WHERE clause results in true are deleted. Rows for which the expression is false or NULL are retained.

The USING clause allows deleting based on the content of other tables or subqueries.
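
The USING form is not shown in the examples above; a minimal sketch with hypothetical tables could look like this:

-- delete rows from "orders" whose customer appears in "banned_customers"
DELETE FROM orders
USING banned_customers
WHERE orders.customer_id = banned_customers.customer_id;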

Limitations on Reclaiming Memory and Disk Space

Running DELETE does not mean space is reclaimed. In general, rows are only marked as deleted. DuckDB's support for VACUUM is limited
to vacuuming entire row groups.


DROP Statement

The DROP statement removes a catalog entry added previously with the CREATE command.

Examples

-- delete the table with the name "tbl"
DROP TABLE tbl;
-- drop the view with the name "v1"; do not throw an error if the view does not exist
DROP VIEW IF EXISTS v1;
-- drop type
DROP TYPE type_name;

Syntax

The optional IF EXISTS clause suppresses the error that would normally result if the table does not exist.

By default (or if the RESTRICT clause is provided), the entry will not be dropped if there are any other objects that depend on it. If the
CASCADE clause is provided then all the objects that are dependent on the object will be dropped as well.

CREATE SCHEMA myschema;
CREATE TABLE myschema.t1 (i INTEGER);
-- ERROR: Cannot drop myschema because the table myschema.t1 depends on it.
DROP SCHEMA myschema;
-- Cascade drops both myschema and myschema.t1
DROP SCHEMA myschema CASCADE;

Limitations on Reclaiming Disk Space

Running DROP TABLE should free the memory used by the table, but not always disk space. Even if disk space does not decrease, the free
blocks will be marked as "free". For example, if we have a 2 GB file and we drop a 1 GB table, the file might still be 2 GB, but it should have
1 GB of free blocks in it. To check this, use the following PRAGMA and check the number of free_blocks in the output:

PRAGMA database_size;

EXPORT/IMPORT DATABASE Statements

The EXPORT DATABASE command allows you to export the contents of the database to a specific directory. The IMPORT DATABASE
command allows you to then read the contents again.

Examples

-- export the database to the target directory 'target_directory' as CSV files
EXPORT DATABASE 'target_directory';
-- export to directory 'target_directory',
-- using the given options for the CSV serialization
EXPORT DATABASE 'target_directory' (FORMAT CSV, DELIMITER '|');
-- export to directory 'target_directory', tables serialized as Parquet
EXPORT DATABASE 'target_directory' (FORMAT PARQUET);
-- export to directory 'target_directory', tables serialized as Parquet,
-- compressed with ZSTD, with a row_group_size of 100,000
EXPORT DATABASE 'target_directory' (
FORMAT PARQUET,
COMPRESSION ZSTD,
ROW_GROUP_SIZE 100_000
);
-- reload the database again
IMPORT DATABASE 'source_directory';
-- alternatively, use a PRAGMA
PRAGMA import_database('source_directory');

For details regarding the writing of Parquet files, see the Parquet Files page in the Data Import section, and the COPY Statement page.

EXPORT DATABASE

The EXPORT DATABASE command exports the full contents of the database ‑ including schema information, tables, views and sequences
‑ to a specific directory that can then be loaded again. The created directory will be structured as follows:

target_directory/schema.sql
target_directory/load.sql
target_directory/t_1.csv
...
target_directory/t_n.csv

The schema.sql file contains the schema statements that are found in the database. It contains any CREATE SCHEMA, CREATE TABLE,
CREATE VIEW and CREATE SEQUENCE commands that are necessary to re‑construct the database.

The load.sql file contains a set of COPY statements that can be used to read the data from the CSV files again. The file contains a single
COPY statement for every table found in the schema.

Syntax

IMPORT DATABASE

The database can be reloaded by using the IMPORT DATABASE command again, or manually by running schema.sql followed by
load.sql to re‑load the data.
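For example, a manual reload could look roughly like this in the DuckDB command line client, assuming the export was written to 'target_directory' (the .read command executes the statements in the given file):

.read target_directory/schema.sql
.read target_directory/load.sql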

Syntax

INSERT Statement

The INSERT statement inserts new data into a table.

Examples

-- insert the values (1), (2), (3) into "tbl"
INSERT INTO tbl VALUES (1), (2), (3);
-- insert the result of a query into a table
INSERT INTO tbl SELECT * FROM other_tbl;
-- insert values into the "i" column, inserting the default value into other columns
INSERT INTO tbl (i) VALUES (1), (2), (3);
-- explicitly insert the default value into a column
INSERT INTO tbl (i) VALUES (1), (DEFAULT), (3);
-- assuming tbl has a primary key/unique constraint, do nothing on conflict
INSERT OR IGNORE INTO tbl (i) VALUES (1);
-- or update the table with the new values instead
INSERT OR REPLACE INTO tbl (i) VALUES (1);

Syntax INSERT INTO inserts new rows into a table. One can insert one or more rows specified by value expressions, or zero or more
rows resulting from a query.

Insert Column Order

It's possible to provide an optional insert column order, which can either be BY POSITION (the default) or BY NAME. Each column not
present in the explicit or implicit column list will be filled with a default value, either its declared default value or NULL if there is none.

If the expression for any column is not of the correct data type, automatic type conversion will be attempted.

INSERT INTO ... [BY POSITION] The order that values are inserted into the columns of the table is determined by the order that
the columns were declared in. That is, the values supplied by the VALUES clause or query are associated with the column list left‑to‑right.
This is the default option, which can be explicitly specified using the BY POSITION option. For example:

CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl VALUES (5, 42);
-- specifying "BY POSITION" is optional and is equivalent to the default behavior
INSERT INTO tbl BY POSITION VALUES (5, 42);

To use a different order, column names can be provided as part of the target, for example:

CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl (b, a) VALUES (5, 42);
-- adding "BY POSITION" results in the same behavior
INSERT INTO tbl BY POSITION (b, a) VALUES (5, 42);

This will insert 5 into b and 42 into a.

INSERT INTO ... BY NAME The names of the column list of the SELECT statement are matched against the column names of the
table to determine the order that values should be inserted into the table, even if the order of the columns in the table differs from the order
of the values in the SELECT statement. For example:

CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl BY NAME (SELECT 42 AS b);

This will insert 42 into b and insert NULL (or its default value) into a.

It's important to note that when using INSERT INTO ... BY NAME, the column names specified in the SELECT statement must match
the column names in the table. If a column name is misspelled or does not exist in the table, an error will occur. Columns that are missing
from the SELECT statement will be filled with the default value.

ON CONFLICT Clause

An ON CONFLICT clause can be used to perform a certain action on conflicts that arise from UNIQUE or PRIMARY KEY constraints. An
example for such a conflict is shown in the following example:

CREATE TABLE tbl (i INT PRIMARY KEY, j INT);
INSERT INTO tbl VALUES (1, 42);
INSERT INTO tbl VALUES (1, 84);

This raises an error and leaves the table with a single row <i: 1, j: 42>.

Error: Constraint Error: Duplicate key "i: 1" violates primary key constraint.

There are two supported actions: DO NOTHING and DO UPDATE SET ...

DO NOTHING Clause The DO NOTHING clause causes the error(s) to be ignored, and the values are not inserted or updated. For
example:

CREATE TABLE tbl (i INT PRIMARY KEY, j INT);
INSERT INTO tbl VALUES (1, 42);
INSERT INTO tbl VALUES (1, 84) ON CONFLICT DO NOTHING;

These statements finish successfully and leave the table with the row <i: 1, j: 42>.

Shorthand The INSERT OR IGNORE INTO ... statement is a shorter syntax alternative to INSERT INTO ... ON CONFLICT
DO NOTHING. For example, the following statements are equivalent:

INSERT OR IGNORE INTO tbl VALUES (1, 84);
INSERT INTO tbl VALUES (1, 84) ON CONFLICT DO NOTHING;

DO UPDATE Clause The DO UPDATE clause causes the INSERT to turn into an UPDATE on the conflicting row(s) instead. The SET ex‑
pressions that follow determine how these rows are updated. The expressions can use the special virtual table EXCLUDED, which contains
the conflicting values for the row. Optionally you can provide an additional WHERE clause that can exclude certain rows from the update.
The conflicts that don't meet this condition are ignored instead.

Because we need a way to refer to both the to‑be‑inserted tuple and the existing tuple, we introduce the special EXCLUDED qualifier.
When the EXCLUDED qualifier is provided, the reference refers to the to‑be‑inserted tuple, otherwise it refers to the existing tuple. This
special qualifier can be used within the WHERE clauses and SET expressions of the ON CONFLICT clause.

An example using DO UPDATE is the following:

CREATE TABLE tbl (i INT PRIMARY KEY, j INT);
INSERT INTO tbl VALUES (1, 42);
INSERT INTO tbl VALUES (1, 84) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
-- rearranging columns and BY NAME also work:
INSERT INTO tbl (j, i) VALUES (168, 1) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl BY NAME (SELECT 1 AS i, 336 AS j) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;

These statements finish successfully and leave the table with a single row <i: 1, j: 84>.

Shorthand The INSERT OR REPLACE INTO ... statement is a shorter syntax alternative to INSERT INTO ... ON CONFLICT
DO UPDATE SET c1 = EXCLUDED.c1, c2 = EXCLUDED.c2, .... That is, it updates every column of the existing row to the new
values of the to‑be‑inserted row. For example, given the following input table:

CREATE TABLE tbl (i INT PRIMARY KEY, j INT);
INSERT INTO tbl VALUES (1, 42);

These statements are equivalent:

INSERT OR REPLACE INTO tbl VALUES (1, 84);
INSERT INTO tbl VALUES (1, 84) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl (j, i) VALUES (84, 1) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl BY NAME (SELECT 84 AS j, 1 AS i) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;

Defining a Conflict Target A conflict target may be provided as ON CONFLICT (conflict_target). This is a group of columns
that an index or uniqueness/key constraint is defined on. If the conflict target is omitted, the PRIMARY KEY constraint(s) on the table
are targeted.

Specifying a conflict target is optional unless using a DO UPDATE and there are multiple unique/primary key constraints on the table.

CREATE TABLE tbl (i INT PRIMARY KEY, j INT UNIQUE, k INT);
INSERT INTO tbl VALUES (1, 20, 300);
INSERT INTO tbl VALUES (1, 40, 700) ON CONFLICT (i) DO UPDATE SET k = 2 * EXCLUDED.k;
-- tbl will contain <1, 20, 1400>
INSERT INTO tbl VALUES (1, 20, 900) ON CONFLICT (j) DO UPDATE SET k = 5 * EXCLUDED.k;
-- tbl will contain <1, 20, 4500>

When a conflict target is provided, you can further filter this with a WHERE clause that should be met by all conflicts.

INSERT INTO tbl VALUES (1, 40, 700) ON CONFLICT (i) DO UPDATE SET k = 2 * EXCLUDED.k WHERE k < 100;

Multiple Tuples Conflicting on the Same Key Having multiple tuples conflicting on the same key is not supported. For example:

CREATE TABLE tbl (i INT PRIMARY KEY, j INT);
INSERT INTO tbl VALUES (1, 42);
INSERT INTO tbl VALUES (1, 84), (1, 168) ON CONFLICT DO NOTHING;

Running this returns the following message.

Error: Invalid Input Error: ON CONFLICT DO UPDATE can not update the same row twice in the same command.
Ensure that no rows proposed for insertion within the same command have duplicate constrained values

RETURNING Clause

The RETURNING clause may be used to return the contents of the rows that were inserted. This can be useful if some columns are calculated
upon insert. For example, if the table contains an automatically incrementing primary key, then the RETURNING clause will include the
automatically created primary key. This is also useful in the case of generated columns.

Some or all columns can be explicitly chosen to be returned and they may optionally be renamed using aliases. Arbitrary non‑aggregating
expressions may also be returned instead of simply returning a column. All columns can be returned using the * expression, and columns
or expressions can be returned in addition to all columns returned by the *.

For example:

CREATE TABLE t1 (i INT);
INSERT INTO t1
SELECT 42
RETURNING *;

i

42

A more complex example that includes an expression in the RETURNING clause:

CREATE TABLE t2 (i INT, j INT);
INSERT INTO t2
SELECT 2 AS i, 3 AS j
RETURNING *, i * j AS i_times_j;

i j i_times_j

2 3 6

The next example shows a situation where the RETURNING clause is more helpful. First, a table is created with a primary key column.
Then a sequence is created to allow for that primary key to be incremented as new rows are inserted. When we insert into the table, we
do not already know the values generated by the sequence, so it is valuable to return them. For additional information, see the CREATE
SEQUENCE page.

CREATE TABLE t3 (i INT PRIMARY KEY, j INT);
CREATE SEQUENCE 't3_key';
INSERT INTO t3
SELECT nextval('t3_key') AS i, 42 AS j
UNION ALL
SELECT nextval('t3_key') AS i, 43 AS j
RETURNING *;

i j

1 42
2 43

PIVOT Statement

The PIVOT statement allows distinct values within a column to be separated into their own columns. The values within those new columns
are calculated using an aggregate function on the subset of rows that match each distinct value.

DuckDB implements both the SQL Standard PIVOT syntax and a simplified PIVOT syntax that automatically detects the columns to create
while pivoting. PIVOT_WIDER may also be used in place of the PIVOT keyword.

Note. The UNPIVOT statement is the inverse of the PIVOT statement.

Simplified PIVOT Syntax

The full syntax diagram is below, but the simplified PIVOT syntax can be summarized using spreadsheet pivot table naming conventions
as:

PIVOT dataset
ON columns
USING values
GROUP BY rows
ORDER BY columns_with_order_directions
LIMIT number_of_rows ;

The ON, USING, and GROUP BY clauses are each optional, but they may not all be omitted.

Example Data All examples use the dataset produced by the queries below:

CREATE TABLE Cities (Country VARCHAR, Name VARCHAR, Year INT, Population INT);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2000, 1005);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2010, 1065);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2020, 1158);
INSERT INTO Cities VALUES ('US', 'Seattle', 2000, 564);
INSERT INTO Cities VALUES ('US', 'Seattle', 2010, 608);
INSERT INTO Cities VALUES ('US', 'Seattle', 2020, 738);
INSERT INTO Cities VALUES ('US', 'New York City', 2000, 8015);
INSERT INTO Cities VALUES ('US', 'New York City', 2010, 8175);
INSERT INTO Cities VALUES ('US', 'New York City', 2020, 8772);

FROM Cities;

Country Name Year Population

NL Amsterdam 2000 1005
NL Amsterdam 2010 1065
NL Amsterdam 2020 1158
US Seattle 2000 564
US Seattle 2010 608
US Seattle 2020 738
US New York City 2000 8015
US New York City 2010 8175
US New York City 2020 8772

PIVOT ON and USING Use the PIVOT statement below to create a separate column for each year and calculate the total population in
each. The ON clause specifies which column(s) to split into separate columns. It is equivalent to the columns parameter in a spreadsheet
pivot table.

The USING clause determines how to aggregate the values that are split into separate columns. This is equivalent to the values parameter
in a spreadsheet pivot table. If the USING clause is not included, it defaults to count(*).

PIVOT Cities
ON Year
USING sum(Population);

Country Name 2000 2010 2020

NL Amsterdam 1005 1065 1158
US Seattle 564 608 738
US New York City 8015 8175 8772

In the above example, the sum aggregate is always operating on a single value. If we only want to change the orientation of how the data
is displayed without aggregating, use the first aggregate function. In this example, we are pivoting numeric values, but the first
function works very well for pivoting out a text column. (This is something that is difficult to do in a spreadsheet pivot table, but easy in
DuckDB!)

This query produces a result that is identical to the one above:

PIVOT Cities ON Year USING first(Population);

PIVOT ON, USING, and GROUP BY By default, the PIVOT statement retains all columns not specified in the ON or USING clauses. To
include only certain columns and further aggregate, specify columns in the GROUP BY clause. This is equivalent to the rows parameter of
a spreadsheet pivot table.

In the below example, the Name column is no longer included in the output, and the data is aggregated up to the Country level.

PIVOT Cities
ON Year
USING sum(Population)
GROUP BY Country;

Country 2000 2010 2020

NL 1005 1065 1158
US 8579 8783 9510

IN Filter for ON Clause To only create a separate column for specific values within a column in the ON clause, use an optional IN expres‑
sion. Let's say for example that we wanted to forget about the year 2020 for no particular reason...

PIVOT Cities
ON Year IN (2000, 2010)
USING sum(Population)
GROUP BY Country;

Country 2000 2010

NL 1005 1065
US 8579 8783

Multiple Expressions per Clause Multiple columns can be specified in the ON and GROUP BY clauses, and multiple aggregate expres‑
sions can be included in the USING clause.

Multiple ON Columns and ON Expressions Multiple columns can be pivoted out into their own columns. DuckDB will find the distinct
values in each ON clause column and create one new column for all combinations of those values (a cartesian product).

In the below example, all combinations of unique countries and unique cities receive their own column. Some combinations may not be
present in the underlying data, so those columns are populated with NULL values.

PIVOT Cities
ON Country, Name
USING sum(Population);

Year NL_Amsterdam NL_New York City NL_Seattle US_Amsterdam US_New York City US_Seattle

2000 1005 NULL NULL NULL 8015 564
2010 1065 NULL NULL NULL 8175 608
2020 1158 NULL NULL NULL 8772 738

To pivot only the combinations of values that are present in the underlying data, use an expression in the ON clause. Multiple expressions
and/or columns may be provided.

Here, Country and Name are concatenated together and the resulting concatenations each receive their own column. Any arbitrary non‑
aggregating expression may be used. In this case, concatenating with an underscore is used to imitate the naming convention the PIVOT
clause uses when multiple ON columns are provided (like in the prior example).

PIVOT Cities ON Country || '_' || Name USING sum(Population);

Year NL_Amsterdam US_New York City US_Seattle

2000 1005 8015 564
2010 1065 8175 608
2020 1158 8772 738

Multiple USING Expressions An alias may also be included for each expression in the USING clause. It will be appended to the generated
column names after an underscore (_). This makes the column naming convention much cleaner when multiple expressions are included
in the USING clause.

In this example, both the sum and max of the Population column are calculated for each year and are split into separate columns.

PIVOT Cities
ON Year
USING sum(Population) AS total, max(Population) AS max
GROUP BY Country;

Country 2000_total 2000_max 2010_total 2010_max 2020_total 2020_max

US 8579 8015 8783 8175 9510 8772
NL 1005 1005 1065 1065 1158 1158

Multiple GROUP BY Columns Multiple GROUP BY columns may also be provided. Note that column names must be used rather than
column positions (1, 2, etc.), and that expressions are not supported in the GROUP BY clause.

PIVOT Cities
ON Year
USING sum(Population)
GROUP BY Country, Name;

Country Name 2000 2010 2020

NL Amsterdam 1005 1065 1158


US Seattle 564 608 738
US New York City 8015 8175 8772

Using PIVOT within a SELECT Statement The PIVOT statement may be included within a SELECT statement as a CTE (a Common
Table Expression, or WITH clause), or a subquery. This allows for a PIVOT to be used alongside other SQL logic, as well as for multiple
PIVOTs to be used in one query.

No SELECT is needed within the CTE; the PIVOT keyword can be thought of as taking its place.

WITH pivot_alias AS (
PIVOT Cities
ON Year
USING sum(Population)
GROUP BY Country
)
SELECT * FROM pivot_alias;

A PIVOT may be used in a subquery and must be wrapped in parentheses. Note that this behavior is different than the SQL Standard Pivot,
as illustrated in subsequent examples.

SELECT *
FROM (
PIVOT Cities
ON Year
USING sum(Population)
GROUP BY Country
) pivot_alias;

Multiple PIVOT Statements Each PIVOT can be treated as if it were a SELECT node, so they can be joined together or manipulated in
other ways.

For example, if two PIVOT statements share the same GROUP BY expression, they can be joined together using the columns in the GROUP
BY clause into a wider pivot.

FROM (PIVOT Cities ON Year USING sum(Population) GROUP BY Country) year_pivot
JOIN (PIVOT Cities ON Name USING sum(Population) GROUP BY Country) name_pivot
USING (Country);

Country 2000 2010 2020 Amsterdam New York City Seattle

NL 1005 1065 1158 3228 NULL NULL
US 8579 8783 9510 NULL 24962 1910

Internals

Pivoting is implemented as a combination of SQL query re‑writing and a dedicated PhysicalPivot operator for higher performance.
Each PIVOT is implemented as a set of aggregations into lists and then the dedicated PhysicalPivot operator converts those lists into
column names and values. Additional pre‑processing steps are required if the columns to be created when pivoting are detected dynami‑
cally (which occurs when the IN clause is not in use).

DuckDB, like most SQL engines, requires that all column names and types be known at the start of a query. In order to automatically detect
the columns that should be created as a result of a PIVOT statement, it must be translated into multiple queries. ENUM types are used to
find the distinct values that should become columns. Each ENUM is then injected into one of the PIVOT statement's IN clauses.

After the IN clauses have been populated with ENUMs, the query is re‑written again into a set of aggregations into lists.

For example:

PIVOT Cities
ON Year
USING sum(Population);

is initially translated into:

CREATE TEMPORARY TYPE __pivot_enum_0_0 AS ENUM (
SELECT DISTINCT
Year::VARCHAR
FROM Cities
ORDER BY
Year
);
PIVOT Cities
ON Year IN __pivot_enum_0_0
USING sum(Population);

and finally translated into:

SELECT Country, Name, list(Year), list(population_sum)
FROM (
SELECT Country, Name, Year, sum(population) AS population_sum
FROM Cities
GROUP BY ALL
)
GROUP BY ALL;

This produces the result:

Country Name list("YEAR") list(population_sum)

NL Amsterdam [2000, 2010, 2020] [1005, 1065, 1158]
US Seattle [2000, 2010, 2020] [564, 608, 738]
US New York City [2000, 2010, 2020] [8015, 8175, 8772]

The PhysicalPivot operator converts those lists into column names and values to return this result:

Country Name 2000 2010 2020

NL Amsterdam 1005 1065 1158
US Seattle 564 608 738
US New York City 8015 8175 8772

Simplified PIVOT Full Syntax Diagram

Below is the full syntax diagram of the PIVOT statement.

SQL Standard PIVOT Syntax

The full syntax diagram is below, but the SQL Standard PIVOT syntax can be summarized as:

FROM dataset
PIVOT (
values
FOR
column_1 IN ( in_list )
column_2 IN ( in_list )
...
GROUP BY rows
);

Unlike the simplified syntax, the IN clause must be specified for each column to be pivoted. If you are interested in dynamic pivoting, the
simplified syntax is recommended.

Note that no commas separate the expressions in the FOR clause, but that value and GROUP BY expressions must be comma‑
separated!

Examples

This example uses a single value expression, a single column expression, and a single row expression:

FROM Cities
PIVOT (
sum(Population)
FOR
Year IN (2000, 2010, 2020)
GROUP BY Country
);

Country 2000 2010 2020

NL 1005 1065 1158
US 8579 8783 9510

This example is somewhat contrived, but serves as an example of using multiple value expressions and multiple columns in the FOR
clause.

FROM Cities
PIVOT (
sum(Population) AS total,
count(Population) AS count
FOR
Year IN (2000, 2010)
Country in ('NL', 'US')
);

Name 2000_NL_total 2000_NL_count 2000_US_total 2000_US_count 2010_NL_total 2010_NL_count 2010_US_total 2010_US_count

Amsterdam 1005 1 NULL 0 1065 1 NULL 0
Seattle NULL 0 564 1 NULL 0 608 1
New York City NULL 0 8015 1 NULL 0 8175 1

SQL Standard PIVOT Full Syntax Diagram Below is the full syntax diagram of the SQL Standard version of the PIVOT statement.

Profiling Queries

DuckDB supports profiling queries via the EXPLAIN and EXPLAIN ANALYZE statements.

EXPLAIN

To see the query plan of a query without executing it, run:

EXPLAIN query ;

The output of EXPLAIN contains the estimated cardinalities for each operator.
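For example, using the Cities table from the PIVOT examples above, the estimated plan of an aggregation can be inspected as follows (the plan output itself is omitted here, as it depends on the DuckDB version):

EXPLAIN SELECT Country, sum(Population) FROM Cities GROUP BY Country;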

EXPLAIN ANALYZE

To profile a query, run:

EXPLAIN ANALYZE query ;

The EXPLAIN ANALYZE statement runs the query, and shows the actual cardinalities for each operator, as well as the cumulative wall‑
clock time spent in each operator.

SELECT Statement

The SELECT statement retrieves rows from the database.

Examples
-- select all columns from the table "tbl"
SELECT * FROM tbl;
-- select column "j" from the rows of "tbl" where "i" equals 3
SELECT j FROM tbl WHERE i = 3;
-- perform an aggregate grouped by the column "i"
SELECT i, sum(j) FROM tbl GROUP BY i;
-- select only the top 3 rows from the tbl
SELECT * FROM tbl ORDER BY i DESC LIMIT 3;
-- join two tables together using the USING clause
SELECT * FROM t1 JOIN t2 USING (a, b);
-- use column indexes to select the first and third column from the table "tbl"
SELECT #1, #3 FROM tbl;
-- select all unique cities from the addresses table
SELECT DISTINCT city FROM addresses;

Syntax The SELECT statement retrieves rows from the database. The canonical order of a select statement is as follows, with less com‑
mon clauses being indented:

SELECT select_list
FROM tables
USING SAMPLE sample_expr
WHERE condition
GROUP BY groups
HAVING group_filter
WINDOW window_expr
QUALIFY qualify_filter
ORDER BY order_expr
LIMIT n;

Optionally, the SELECT statement can be prefixed with a WITH clause.

As the SELECT statement is so complex, we have split up the syntax diagrams into several parts. The full syntax diagram can be found at
the bottom of the page.

SELECT Clause

The SELECT clause specifies the list of columns that will be returned by the query. While it appears first in the clause, logically the ex‑
pressions here are executed only at the end. The SELECT clause can contain arbitrary expressions that transform the output, as well as
aggregates and window functions. The DISTINCT keyword ensures that only unique tuples are returned.

Note. Column names are case‑insensitive. See the Rules for Case Sensitivity for more details.

FROM Clause

The FROM clause specifies the source of the data on which the remainder of the query should operate. Logically, the FROM clause is where
the query starts execution. The FROM clause can contain a single table, a combination of multiple tables that are joined together, or another
SELECT query inside a subquery node.

SAMPLE Clause

The SAMPLE clause allows you to run the query on a sample from the base table. This can significantly speed up processing of queries,
at the expense of accuracy in the result. Samples can also be used to quickly see a snapshot of the data when exploring a data set. The
sample clause is applied right after anything in the from clause (i.e., after any joins, but before the where clause or any aggregates). See
the sample page for more information.
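For example, a rough sketch of looking at approximately 10% of a (hypothetical) large table tbl:

SELECT *
FROM tbl
USING SAMPLE 10%;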

WHERE Clause

The WHERE clause specifies any filters to apply to the data. This allows you to select only a subset of the data in which you are interested.
Logically the WHERE clause is applied immediately after the FROM clause.

GROUP BY and HAVING Clauses

The GROUP BY clause specifies which grouping columns should be used to perform any aggregations in the SELECT clause. If the GROUP
BY clause is specified, the query is always an aggregate query, even if no aggregations are present in the SELECT clause.

WINDOW Clause

The WINDOW clause allows you to specify named windows that can be used within window functions. These are useful when you have
multiple window functions, as they allow you to avoid repeating the same window clause.
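A minimal sketch, assuming a hypothetical sales table with region, time, and amount columns: the named window w is defined once and reused by both window functions.

SELECT
    row_number() OVER w AS row_in_region,
    amount - lag(amount) OVER w AS amount_change
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY time);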

QUALIFY Clause

The QUALIFY clause is used to filter the result of WINDOW functions.
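For example, assuming the same hypothetical sales table, the following keeps only the most recent row per region:

SELECT region, time, amount
FROM sales
QUALIFY row_number() OVER (PARTITION BY region ORDER BY time DESC) = 1;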

ORDER BY and LIMIT Clauses

ORDER BY and LIMIT are output modifiers. Logically they are applied at the very end of the query. The LIMIT clause restricts the
amount of rows fetched, and the ORDER BY clause sorts the rows on the sorting criteria in either ascending or descending order.

VALUES List

A VALUES list is a set of values that is supplied instead of a SELECT statement.
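For example, a VALUES list can be used directly in the FROM clause, with the derived table and its columns given names (the names used here are only illustrative):

SELECT *
FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) nums(i, name);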

Row IDs

For each table, the rowid pseudocolumn returns the row identifiers based on the physical storage.

CREATE TABLE t (id INT, content STRING);


INSERT INTO t VALUES (42, 'hello'), (43, 'world');
SELECT rowid, id, content FROM t;

┌───────┬────┬─────────┐
│ rowid │ id │ content │
├───────┼────┼─────────┤
│ 0 │ 42 │ hello │
│ 1 │ 43 │ world │
└───────┴────┴─────────┘

In the current storage, these identifiers are contiguous unsigned integers (0, 1, ...) if no rows were deleted. Deletions introduce gaps in the
rowids which may be reclaimed later. Therefore, it is strongly recommended not to use rowids as identifiers.

Note. The rowid values are stable within a transaction.

Note. If there is a user‑defined column named rowid, it shadows the rowid pseudocolumn.

Common Table Expressions

Full Syntax Diagram

Below is the full syntax diagram of the SELECT statement:

SET/RESET Statements

The SET statement modifies the provided DuckDB configuration option at the specified scope.

Examples

-- Update the memory_limit configuration value
SET memory_limit = '10GB';
-- Configure the system to use 1 thread
SET threads = 1;
-- Or use the 'TO' keyword
SET threads TO 1;
-- Change configuration option to default value
RESET threads;
-- Retrieve configuration value
SELECT current_setting('threads');
-- Set the default catalog search path globally
SET GLOBAL search_path = 'db1,db2';
-- Set the default collation for the session
SET SESSION default_collation = 'nocase';

Syntax

SET updates a DuckDB configuration option to the provided value.

RESET

The RESET statement changes the given DuckDB configuration option to the default value.

Scopes

Configuration options can have different scopes:

• GLOBAL: Configuration value is used (or reset) across the entire DuckDB instance.
• SESSION: Configuration value is used (or reset) only for the current session attached to a DuckDB instance.
• LOCAL: Not yet implemented.

When not specified, the default scope for the configuration option is used. For most options this is GLOBAL.

Configuration

See the Configuration page for the full list of configuration options.

Transaction Management

DuckDB supports ACID database transactions. Transactions provide isolation, i.e., changes made by a transaction are not visible from
concurrent transactions until it is committed. A transaction can also be aborted, which discards any changes it made so far.

Statements

DuckDB provides the following statements for transaction management.

Starting a Transaction To start a transaction, run:

BEGIN TRANSACTION;

Committing a Transaction You can commit a transaction to make it visible to other transactions and to write it to persistent storage (if
using DuckDB in persistent mode). To commit a transaction, run:

COMMIT;

If you are not in an active transaction, the COMMIT statement will fail.

Rolling Back a Transaction You can abort a transaction. This operation, also known as rolling back, will discard any changes the trans‑
action made to the database. To abort a transaction, run:

ROLLBACK;

You can also use the abort command, which has an identical behavior:

ABORT;

If you are not in an active transaction, the ROLLBACK and ABORT statements will fail.

Example We illustrate the use of transactions through a simple example.

CREATE TABLE person (name VARCHAR, age BIGINT);

BEGIN TRANSACTION;
INSERT INTO person VALUES ('Ada', 52);
COMMIT;

BEGIN TRANSACTION;
DELETE FROM person WHERE name = 'Ada';
INSERT INTO person VALUES ('Bruce', 39);
ROLLBACK;

SELECT * FROM person;

The first transaction (inserting "Ada") was committed, but the second (deleting "Ada" and inserting "Bruce") was aborted. Therefore,
the resulting table will only contain <'Ada', 52>.

UNPIVOT Statement

The UNPIVOT statement allows multiple columns to be stacked into fewer columns. In the basic case, multiple columns are stacked into
two columns: a NAME column (which contains the name of the source column) and a VALUE column (which contains the value from the
source column).

DuckDB implements both the SQL Standard UNPIVOT syntax and a simplified UNPIVOT syntax. Both can utilize a COLUMNS expression
to automatically detect the columns to unpivot. PIVOT_LONGER may also be used in place of the UNPIVOT keyword.

Note. The PIVOT statement is the inverse of the UNPIVOT statement.

Simplified UNPIVOT Syntax

The full syntax diagram is below, but the simplified UNPIVOT syntax can be summarized using spreadsheet pivot table naming conventions
as:

UNPIVOT dataset
ON column(s)
INTO
NAME name-column-name
VALUE value-column-name(s)
ORDER BY column(s)-with-order-direction(s)
LIMIT number-of-rows ;

Example Data All examples use the dataset produced by the queries below:

CREATE OR REPLACE TABLE monthly_sales
(empid INT, dept TEXT, Jan INT, Feb INT, Mar INT, Apr INT, May INT, Jun INT);
INSERT INTO monthly_sales VALUES
(1, 'electronics', 1, 2, 3, 4, 5, 6),
(2, 'clothes', 10, 20, 30, 40, 50, 60),
(3, 'cars', 100, 200, 300, 400, 500, 600);

FROM monthly_sales;

empid dept Jan Feb Mar Apr May Jun

1 electronics 1 2 3 4 5 6
2 clothes 10 20 30 40 50 60
3 cars 100 200 300 400 500 600

UNPIVOT Manually The most typical UNPIVOT transformation is to take already pivoted data and re‑stack it into a column each for the
name and value. In this case, all months will be stacked into a month column and a sales column.

UNPIVOT monthly_sales
ON jan, feb, mar, apr, may, jun
INTO
NAME month
VALUE sales;

empid dept month sales

1 electronics Jan 1
1 electronics Feb 2
1 electronics Mar 3
1 electronics Apr 4
1 electronics May 5
1 electronics Jun 6
2 clothes Jan 10
2 clothes Feb 20
2 clothes Mar 30
2 clothes Apr 40
2 clothes May 50
2 clothes Jun 60
3 cars Jan 100
3 cars Feb 200
3 cars Mar 300
3 cars Apr 400
3 cars May 500
3 cars Jun 600

UNPIVOT Dynamically Using Columns Expression In many cases, the number of columns to unpivot is not easy to predetermine ahead
of time. In the case of this dataset, the query above would have to change each time a new month is added. The COLUMNS expression can
be used to select all columns that are not empid or dept. This enables dynamic unpivoting that will work regardless of how many months
are added. The query below returns identical results to the one above.

UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept))
INTO
NAME month
VALUE sales;

empid dept month sales

1 electronics Jan 1
1 electronics Feb 2
1 electronics Mar 3
1 electronics Apr 4
1 electronics May 5
1 electronics Jun 6
2 clothes Jan 10
2 clothes Feb 20
2 clothes Mar 30
2 clothes Apr 40
2 clothes May 50
2 clothes Jun 60
3 cars Jan 100
3 cars Feb 200
3 cars Mar 300
3 cars Apr 400
3 cars May 500
3 cars Jun 600

UNPIVOT into Multiple Value Columns The UNPIVOT statement has additional flexibility: more than 2 destination columns are sup‑
ported. This can be useful when the goal is to reduce the extent to which a dataset is pivoted, but not completely stack all pivoted columns.
To demonstrate this, the query below will generate a dataset with a separate column for the number of each month within the quarter
(month 1, 2, or 3), and a separate row for each quarter. Since there are fewer quarters than months, this does make the dataset longer, but
not as long as the above.

To accomplish this, multiple sets of columns are included in the ON clause. The q1 and q2 aliases are optional. The number of columns in
each set of columns in the ON clause must match the number of columns in the VALUE clause.

UNPIVOT monthly_sales
ON (jan, feb, mar) AS q1, (apr, may, jun) AS q2
INTO
NAME quarter
VALUE month_1_sales, month_2_sales, month_3_sales;

empid dept quarter month_1_sales month_2_sales month_3_sales

1 electronics q1 1 2 3
1 electronics q2 4 5 6
2 clothes q1 10 20 30
2 clothes q2 40 50 60
3 cars q1 100 200 300
3 cars q2 400 500 600

Using UNPIVOT within a SELECT Statement The UNPIVOT statement may be included within a SELECT statement as a CTE (a Com‑
mon Table Expression, or WITH clause), or a subquery. This allows for an UNPIVOT to be used alongside other SQL logic, as well as for
multiple UNPIVOTs to be used in one query.

No SELECT is needed within the CTE; the UNPIVOT keyword can be thought of as taking its place.

WITH unpivot_alias AS (
UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept))
INTO
NAME month
VALUE sales
)
SELECT * FROM unpivot_alias;

An UNPIVOT may be used in a subquery and must be wrapped in parentheses. Note that this behavior is different than the SQL Standard
Unpivot, as illustrated in subsequent examples.

SELECT *
FROM (
UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept))
INTO
NAME month
VALUE sales
) unpivot_alias;

Expressions within UNPIVOT Statements DuckDB allows expressions within the UNPIVOT statements, provided that they only involve
a single column. These can be used to perform computations as well as explicit casts. For example:

UNPIVOT
(SELECT 42 as col1, 'woot' as col2)
ON
(col1 * 2)::VARCHAR,
col2;

┌─────────┬─────────┐
│ name │ value │
│ varchar │ varchar │
├─────────┼─────────┤
│ col1 │ 84 │
│ col2 │ woot │
└─────────┴─────────┘

Internals Unpivoting is implemented entirely as rewrites into SQL queries. Each UNPIVOT is implemented as a set of unnest functions,
operating on a list of the column names and a list of the column values. If dynamically unpivoting, the COLUMNS expression is evaluated
first to calculate the column list.

For example:

UNPIVOT monthly_sales
ON jan, feb, mar, apr, may, jun
INTO
NAME month
VALUE sales;

is translated into:

SELECT
empid,
dept,
unnest(['jan', 'feb', 'mar', 'apr', 'may', 'jun']) AS month,
unnest(["jan", "feb", "mar", "apr", "may", "jun"]) AS sales
FROM monthly_sales;

Note the single quotes to build a list of text strings to populate month, and the double quotes to pull the column values for use in sales.
This produces the same result as the initial example:

empid dept month sales

1 electronics jan 1
1 electronics feb 2
1 electronics mar 3
1 electronics apr 4
1 electronics may 5
1 electronics jun 6
2 clothes jan 10
2 clothes feb 20
2 clothes mar 30
2 clothes apr 40
2 clothes may 50
2 clothes jun 60
3 cars jan 100
3 cars feb 200
3 cars mar 300
3 cars apr 400
3 cars may 500
3 cars jun 600

Simplified UNPIVOT Full Syntax Diagram Below is the full syntax diagram of the UNPIVOT statement.

SQL Standard UNPIVOT Syntax

The full syntax diagram is below, but the SQL Standard UNPIVOT syntax can be summarized as:

FROM [dataset]
UNPIVOT [INCLUDE NULLS] (
[value-column-name(s)]
FOR [name-column-name] IN [column(s)]
);

Note that only one column can be included in the name-column-name expression.

SQL Standard UNPIVOT Manually To complete the basic UNPIVOT operation using the SQL standard syntax, only a few additions are
needed.

FROM monthly_sales UNPIVOT (
sales
FOR month IN (jan, feb, mar, apr, may, jun)
);

empid dept month sales

1 electronics Jan 1
1 electronics Feb 2
1 electronics Mar 3
1 electronics Apr 4
1 electronics May 5
1 electronics Jun 6
2 clothes Jan 10
2 clothes Feb 20
2 clothes Mar 30
2 clothes Apr 40
2 clothes May 50
2 clothes Jun 60
3 cars Jan 100
3 cars Feb 200
3 cars Mar 300
3 cars Apr 400
3 cars May 500
3 cars Jun 600

SQL Standard UNPIVOT Dynamically Using the COLUMNS Expression The COLUMNS expression can be used to determine the IN list
of columns dynamically. This will continue to work even if additional month columns are added to the dataset. It produces the same result
as the query above.

FROM monthly_sales UNPIVOT (
sales
FOR month IN (columns(* EXCLUDE (empid, dept)))
);

SQL Standard UNPIVOT into Multiple Value Columns The UNPIVOT statement has additional flexibility: more than 2 destination
columns are supported. This can be useful when the goal is to reduce the extent to which a dataset is pivoted, but not completely stack
all pivoted columns. To demonstrate this, the query below will generate a dataset with a separate column for the number of each month
within the quarter (month 1, 2, or 3), and a separate row for each quarter. Since there are fewer quarters than months, this does make the
dataset longer, but not as long as the above.

To accomplish this, multiple columns are included in the value-column-name portion of the UNPIVOT statement. Multiple sets of
columns are included in the IN clause. The q1 and q2 aliases are optional. The number of columns in each set of columns in the IN clause
must match the number of columns in the value-column-name portion.

FROM monthly_sales
UNPIVOT (
(month_1_sales, month_2_sales, month_3_sales)
FOR quarter IN (
(jan, feb, mar) AS q1,
(apr, may, jun) AS q2
)
);

empid dept quarter month_1_sales month_2_sales month_3_sales

1 electronics q1 1 2 3
1 electronics q2 4 5 6
2 clothes q1 10 20 30
2 clothes q2 40 50 60
3 cars q1 100 200 300
3 cars q2 400 500 600

SQL Standard UNPIVOT Full Syntax Diagram Below is the full syntax diagram of the SQL Standard version of the UNPIVOT state‑
ment.

UPDATE Statement

The UPDATE statement modifies the values of rows in a table.

Examples

-- for every row where "i" is NULL, set the value to 0 instead
UPDATE tbl
SET i = 0
WHERE i IS NULL;

-- set all values of "i" to 1 and all values of "j" to 2
UPDATE tbl
SET i = 1, j = 2;

Syntax

UPDATE changes the values of the specified columns in all rows that satisfy the condition. Only the columns to be modified need be
mentioned in the SET clause; columns not explicitly modified retain their previous values.

Update from Other Table

A table can be updated based upon values from another table. This can be done by specifying a table in a FROM clause, or using a sub‑select
statement. Both approaches have the benefit of completing the UPDATE operation in bulk for increased performance.

CREATE OR REPLACE TABLE original AS
SELECT 1 AS key, 'original value' AS value
UNION ALL
SELECT 2 AS key, 'original value 2' AS value;

CREATE OR REPLACE TABLE new AS
SELECT 1 AS key, 'new value' AS value
UNION ALL
SELECT 2 AS key, 'new value 2' AS value;

SELECT *
FROM original;

key value

1 original value
2 original value 2

UPDATE original
SET value = new.value
FROM new
WHERE original.key = new.key;
-- OR
UPDATE original
SET value = (
SELECT
new.value
FROM new
WHERE original.key = new.key
);

SELECT *
FROM original;

key value

1 new value
2 new value 2

Update from Same Table

The only difference between this case and the above is that a different table alias must be specified on both the target table and the source
table. In this example, as true_original and as new are both required.

UPDATE original as true_original
SET value = (
SELECT
new.value || ' a change!' as value
FROM original as new
WHERE true_original.key = new.key
);

Update Using Joins

To select the rows to update, UPDATE statements can use the FROM clause and express joins via the WHERE clause. For example:

CREATE TABLE city (name VARCHAR, revenue BIGINT, country_code VARCHAR);
CREATE TABLE country (code VARCHAR, name VARCHAR);
INSERT INTO city VALUES ('Paris', 700, 'FR'), ('Lyon', 200, 'FR'), ('Brussels', 400, 'BE');
INSERT INTO country VALUES ('FR', 'France'), ('BE', 'Belgium');

To increase the revenue of all cities in France, join the city and the country tables, and filter on the latter:

UPDATE city
SET revenue = revenue + 100
FROM country
WHERE city.country_code = country.code
AND country.name = 'France';

SELECT *
FROM city;

┌──────────┬─────────┬──────────────┐
│ name │ revenue │ country_code │
│ varchar │ int64 │ varchar │
├──────────┼─────────┼──────────────┤
│ Paris │ 800 │ FR │
│ Lyon │ 300 │ FR │
│ Brussels │ 400 │ BE │
└──────────┴─────────┴──────────────┘

Upsert (Insert or Update)

See the Insert documentation for details.

USE Statement

The USE statement selects a database and optional schema to use as the default.

Examples

-- Sets the 'memory' database as the default
USE memory;
-- Sets the 'duck.main' database and schema as the default
USE duck.main;

Syntax

The USE statement sets a default database or database/schema combination to use for future operations. For instance, tables created
without providing a fully qualified table name will be created in the default database.

VACUUM Statement

The VACUUM statement alone does nothing and is at present provided for PostgreSQL‑compatibility. The VACUUM ANALYZE statement
recomputes table statistics if they have become stale due to table updates or deletions.

Examples

-- No-op
VACUUM;
-- Rebuild database statistics
VACUUM ANALYZE;
-- Rebuild statistics for the table and column
VACUUM ANALYZE memory.main.my_table(my_column);
-- Not supported
VACUUM FULL; -- error

Reclaiming Space

To reclaim space after deleting rows, use the CHECKPOINT statement.
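A minimal sketch (the table name tbl is only illustrative): delete rows, then checkpoint so that the freed blocks can be reused.

DELETE FROM tbl WHERE i = 2;
CHECKPOINT;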

Syntax

Query Syntax

SELECT Clause

The SELECT clause specifies the list of columns that will be returned by the query. While it appears first in the clause, logically the ex‑
pressions here are executed only at the end. The SELECT clause can contain arbitrary expressions that transform the output, as well as
aggregates and window functions.

Examples

-- select all columns from the table called "table_name"
SELECT * FROM table_name;
-- perform arithmetic on columns in a table, and provide an alias
SELECT col1 + col2 AS res, sqrt(col1) AS root FROM table_name;
-- select all unique cities from the addresses table
SELECT DISTINCT city FROM addresses;
-- return the total number of rows in the addresses table
SELECT count(*) FROM addresses;
-- select all columns except the city column from the addresses table
SELECT * EXCLUDE (city) FROM addresses;
-- select all columns from the addresses table, but replace city with lower(city)
SELECT * REPLACE (lower(city) AS city) FROM addresses;
-- select all columns matching the given regex from the table
SELECT COLUMNS('number\d+') FROM addresses;
-- compute a function on all given columns of a table
SELECT min(COLUMNS(*)) FROM addresses;
-- to select columns with spaces or special characters, use double quotes
SELECT "Some Column Name" FROM tbl;

Syntax

SELECT List

The SELECT clause contains a list of expressions that specify the result of a query. The select list can refer to any columns in the FROM
clause, and combine them using expressions. As the output of a SQL query is a table ‑ every expression in the SELECT clause also has a
name. The expressions can be explicitly named using the AS clause (e.g., expr AS name). If a name is not provided by the user the
expressions are named automatically by the system.

Note. Column names are case‑insensitive. See the Rules for Case Sensitivity for more details.

Star Expressions
-- select all columns from the table called "table_name"
SELECT *
FROM table_name;
-- select all columns matching the given regex from the table
SELECT COLUMNS('number\d+')
FROM addresses;

The star expression is a special expression that expands to multiple expressions based on the contents of the FROM clause. In the simplest
case, * expands to all expressions in the FROM clause. Columns can also be selected using regular expressions or lambda functions. See
the star expression page for more details.

DISTINCT Clause
-- select all unique cities from the addresses table
SELECT DISTINCT city
FROM addresses;

The DISTINCT clause can be used to return only the unique rows in the result ‑ so that any duplicate rows are filtered out.

Note. Queries starting with SELECT DISTINCT run deduplication, which is an expensive operation. Therefore, only use DISTINCT
if necessary.

DISTINCT ON Clause
-- select only the highest population city for each country
SELECT DISTINCT ON(country) city, population
FROM cities
ORDER BY population DESC;

The DISTINCT ON clause returns only one row per unique value in the set of expressions as defined in the ON clause. If an ORDER BY
clause is present, the row that is returned is the first row that is encountered as per the ORDER BY criteria. If an ORDER BY clause is not
present, the first row that is encountered is not defined and can be any row in the table.

Note. When querying large data sets, using DISTINCT on all columns can be expensive. Therefore, consider using DISTINCT ON
on a column (or a set of columns) which guarantees a sufficient degree of uniqueness for your results. For example, using DISTINCT
ON on the key column(s) of a table guarantees full uniqueness.

Aggregates

-- return the total number of rows in the addresses table
SELECT count(*)
FROM addresses;
-- return the total number of rows in the addresses table grouped by city
SELECT city, count(*)
FROM addresses
GROUP BY city;

Aggregate functions are special functions that combine multiple rows into a single value. When aggregate functions are present in the
SELECT clause, the query is turned into an aggregate query. In an aggregate query, all expressions must either be part of an aggregate
function, or part of a group (as specified by the GROUP BY clause).

Window Functions

-- generate a "row_number" column containing incremental identifiers for each row
SELECT row_number() OVER ()
FROM sales;
-- compute the difference between the current amount, and the previous amount, by order of time
SELECT amount - lag(amount) OVER (ORDER BY time)
FROM sales;

Window functions are special functions that allow the computation of values relative to other rows in a result. Window functions are marked
by the OVER clause which contains the window specification. The window specification defines the frame or context in which the window
function is computed. See the window functions page for more information.

unnest Function

-- unnest an array by one level
SELECT unnest([1, 2, 3]);
-- unnest a struct by one level
SELECT unnest({'a': 42, 'b': 84});

The unnest function is a special function that can be used together with arrays, lists, or structs. The unnest function strips one level of
nesting from the type. For example, INT[] is transformed into INT. STRUCT(a INT, b INT) is transformed into a INT, b INT.
The unnest function can be used to transform nested types into regular scalar types, which makes them easier to operate on.

FROM & JOIN Clauses

The FROM clause specifies the source of the data on which the remainder of the query should operate. Logically, the FROM clause is where
the query starts execution. The FROM clause can contain a single table, a combination of multiple tables that are joined together using
JOIN clauses, or another SELECT query inside a subquery node. DuckDB also has an optional FROM‑first syntax which enables you to
also query without a SELECT statement.

Examples

-- select all columns from the table called "table_name"
SELECT * FROM table_name;
-- select all columns from the table using the FROM-first syntax
FROM table_name SELECT *;
-- select all columns using the FROM-first syntax and omitting the SELECT clause
FROM table_name;
-- select all columns from the table called "table_name" through an alias "tn"
SELECT tn.* FROM table_name tn;
-- select all columns from the table "table_name" in the schema "schema_name"
SELECT * FROM schema_name.table_name;
-- select the column "i" from the table function "range",
-- where the first column of the range function is renamed to "i"
SELECT t.i FROM range(100) AS t(i);
-- select all columns from the CSV file called "test.csv"
SELECT * FROM 'test.csv';
-- select all columns from a subquery
SELECT * FROM (SELECT * FROM table_name);
-- select the entire row of the table as a struct
SELECT t FROM t;
-- select the entire row of the subquery as a struct (i.e., a single column)
SELECT t FROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t;
-- join two tables together
SELECT * FROM table_name JOIN other_table ON (table_name.key = other_table.key);
-- select a 10% sample from a table
SELECT * FROM table_name TABLESAMPLE 10%;
-- select a sample of 10 rows from a table
SELECT * FROM table_name TABLESAMPLE 10 ROWS;
-- use the FROM-first syntax with WHERE clause and aggregation
FROM range(100) AS t(i) SELECT sum(t.i) WHERE i % 2 = 0;

Joins

Joins are a fundamental relational operation used to connect two tables or relations horizontally. The relations are referred to as the left
and right sides of the join based on how they are written in the join clause. Each result row has the columns from both relations.

A join uses a rule to match pairs of rows from each relation. Often this is a predicate, but there are other implied rules that may be speci‑
fied.

Outer Joins Rows that do not have any matches can still be returned if an OUTER join is specified. Outer joins can be one of:

• LEFT (All rows from the left relation appear at least once)
• RIGHT (All rows from the right relation appear at least once)
• FULL (All rows from both relations appear at least once)

A join that is not OUTER is INNER (only rows that get paired are returned).

When an unpaired row is returned, the attributes from the other table are set to NULL.
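For example, a minimal sketch with two hypothetical tables, students and exams: every student appears in the result at least once, and students without a matching exam get NULL in the exam columns.

SELECT students.name, exams.grade
FROM students
LEFT OUTER JOIN exams ON students.id = exams.student_id;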

Cross Product Joins The simplest type of join is a CROSS JOIN. There are no conditions for this type of join, and it just returns all the
possible pairs.

-- return all pairs of rows
SELECT a.*, b.* FROM a CROSS JOIN b;

Conditional Joins Most joins are specified by a predicate that connects attributes from one side to attributes from the other side. The
conditions can be explicitly specified using an ON clause with the join (clearer) or implied by the WHERE clause (old‑fashioned).

We use the l_regions and the l_nations tables from the TPC‑H schema:

CREATE TABLE l_regions (
r_regionkey INTEGER NOT NULL PRIMARY KEY,
r_name CHAR(25) NOT NULL,
r_comment VARCHAR(152)
);

CREATE TABLE l_nations (
n_nationkey INTEGER NOT NULL PRIMARY KEY,
n_name CHAR(25) NOT NULL,
n_regionkey INTEGER NOT NULL,
n_comment VARCHAR(152),
FOREIGN KEY (n_regionkey) REFERENCES l_regions(r_regionkey)
);

-- return the regions for the nations
SELECT n.*, r.*
FROM l_nations n JOIN l_regions r ON (n_regionkey = r_regionkey);

If the column names are the same and are required to be equal, then the simpler USING syntax can be used:

CREATE TABLE l_regions (regionkey INTEGER NOT NULL PRIMARY KEY,
name CHAR(25) NOT NULL,
comment VARCHAR(152));

CREATE TABLE l_nations (nationkey INTEGER NOT NULL PRIMARY KEY,
name CHAR(25) NOT NULL,
regionkey INTEGER NOT NULL,
comment VARCHAR(152),
FOREIGN KEY (regionkey) REFERENCES l_regions(regionkey));

-- return the regions for the nations
SELECT n.*, r.*
FROM l_nations n JOIN l_regions r USING (regionkey);

The expressions do not have to be equalities ‑ any predicate can be used:

-- return the pairs of jobs where one ran longer but cost less
SELECT s1.t_id, s2.t_id
FROM west s1, west s2
WHERE s1.time > s2.time
AND s1.cost < s2.cost;

Semi and Anti Joins Semi joins return rows from the left table that have at least one match in the right table. Anti joins return rows from
the left table that have no matches in the right table. When using a semi or anti join the result will never have more rows than the left hand
side table. Semi and anti joins provide the same logic as (NOT) IN statements.

-- return a list of cars that have a valid region.
SELECT cars.name, cars.manufacturer
FROM cars
SEMI JOIN region
ON cars.region = region.id;

-- return a list of cars with no recorded safety data.
SELECT cars.name, cars.manufacturer
FROM cars
ANTI JOIN safety_data
ON cars.safety_report_id = safety_data.report_id;

Lateral Joins The LATERAL keyword allows subqueries in the FROM clause to refer to previous subqueries. This feature is also known
as a lateral join.

SELECT *
FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);

┌───────┬───────┐
│ i │ j │
│ int64 │ int64 │
├───────┼───────┤
│ 0 │ 1 │
│ 1 │ 2 │
│ 2 │ 3 │
└───────┴───────┘

Lateral joins are a generalization of correlated subqueries, as they can return multiple values per input value rather than only a single
value.

SELECT *
FROM
generate_series(0, 1) t(i),
LATERAL (SELECT i + 10 UNION ALL SELECT i + 100) t2(j);

┌───────┬───────┐
│ i │ j │
│ int64 │ int64 │
├───────┼───────┤
│ 0 │ 10 │
│ 1 │ 11 │
│ 0 │ 100 │
│ 1 │ 101 │
└───────┴───────┘

It may be helpful to think about LATERAL as a loop where we iterate through the rows of the first subquery and use it as input to the second
(LATERAL) subquery. In the examples above, we iterate through table t and refer to its column i from the definition of table t2. The rows
of t2 form column j in the result.

It is possible to refer to multiple attributes from the LATERAL subquery. Using the table from the first example:

CREATE TABLE t1 AS SELECT * FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);


SELECT * FROM t1, LATERAL (SELECT i + j) t2(k) ORDER BY ALL;

┌───────┬───────┬───────┐
│ i │ j │ k │
│ int64 │ int64 │ int64 │
├───────┼───────┼───────┤
│ 0 │ 1 │ 1 │
│ 1 │ 2 │ 3 │
│ 2 │ 3 │ 5 │
└───────┴───────┴───────┘

Note. DuckDB detects when LATERAL joins should be used, making the use of the LATERAL keyword optional.
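
For example, the first query above can also be written without the keyword; since the subquery in the FROM clause refers to the column i of the preceding table, DuckDB plans it as a lateral join automatically (a minimal sketch):

SELECT *
FROM range(3) t(i), (SELECT i + 1) t2(j);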

Positional Joins When working with data frames or other embedded tables of the same size, the rows may have a natural correspondence
based on their physical order. In scripting languages, this is easily expressed using a loop:

for (i = 0; i < n; i++) {


f(t1.a[i], t2.b[i])
}

It is difficult to express this in standard SQL because relational tables are not ordered, but imported tables (like data frames) or disk files
(like CSVs or Parquet files) do have a natural ordering.

Connecting them using this ordering is called a positional join:

-- treat two data frames as a single table


SELECT df1.*, df2.*
FROM df1
POSITIONAL JOIN df2;

Positional joins are always FULL OUTER joins.
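
Because the join is a FULL OUTER join, the inputs do not need to have the same number of rows; the shorter side is padded with NULL values. A minimal sketch using two inline VALUES lists (assuming positional joins accept arbitrary table expressions):

-- the third output row is (3, NULL)
SELECT t1.x, t2.y
FROM (VALUES (1), (2), (3)) t1(x)
POSITIONAL JOIN (VALUES ('a'), ('b')) t2(y);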

As‑Of Joins A common operation when working with temporal or similarly‑ordered data is to find the nearest (first) event in a reference
table (such as prices). This is called an as‑of join:

-- attach prices to stock trades


SELECT t.*, p.price
FROM trades t
ASOF JOIN prices p
ON t.symbol = p.symbol AND t.when >= p.when;

The ASOF join requires at least one inequality condition on the ordering field. The inequality can be any inequality condition (>=, >, <=,
<) on any data type, but the most common form is >= on a temporal type. Any other conditions must be equalities (or NOT DISTINCT).
This means that the left/right order of the tables is significant.

ASOF joins each left side row with at most one right side row. It can be specified as an OUTER join to find unpaired rows (e.g., trades without
prices or prices which have no trades.)

-- attach prices or NULLs to stock trades


SELECT *
FROM trades t
ASOF LEFT JOIN prices p
ON t.symbol = p.symbol AND t.when >= p.when;

ASOF joins can also specify join conditions on matching column names with the USING syntax, but the last attribute in the list must be the
inequality, which will be greater than or equal to (>=):

SELECT *
FROM trades t
ASOF JOIN prices p USING (symbol, "when");
-- Returns symbol, trades.when, price (but NOT prices.when)

If you combine USING with a SELECT * like this, the query will return the left side (probe) column values for the matches, not the right
side (build) column values. To get the prices' times in the example, you will need to list the columns explicitly:

SELECT t.symbol, t.when AS trade_when, p.when AS price_when, price


FROM trades t
ASOF LEFT JOIN prices p USING (symbol, "when");

Syntax

WHERE Clause

The WHERE clause specifies any filters to apply to the data. This allows you to select only a subset of the data in which you are interested.
Logically the WHERE clause is applied immediately after the FROM clause.

Examples

-- select all rows that have id equal to 3


SELECT *
FROM table_name
WHERE id = 3;

-- select all rows that match the given case-insensitive LIKE expression
SELECT *
FROM table_name
WHERE name ILIKE '%mark%';

-- select all rows that match the given composite expression


SELECT *
FROM table_name
WHERE id = 3 OR id = 7;

Syntax

GROUP BY Clause

The GROUP BY clause specifies which grouping columns should be used to perform any aggregations in the SELECT clause. If the GROUP
BY clause is specified, the query is always an aggregate query, even if no aggregations are present in the SELECT clause.

When a GROUP BY clause is specified, all tuples that have matching data in the grouping columns (i.e., all tuples that belong to the same
group) will be combined. The values of the grouping columns themselves are unchanged, and any other columns can be combined using
an aggregate function (such as count, sum, avg, etc).

GROUP BY ALL

Use GROUP BY ALL to GROUP BY all columns in the SELECT statement that are not wrapped in aggregate functions. This simplifies the syntax by allowing the columns list to be maintained in a single location, and prevents bugs by keeping the SELECT granularity aligned to the GROUP BY granularity (e.g., it prevents any duplication). See the examples below and additional examples in the Friendlier SQL with DuckDB blog post.

Multiple Dimensions

Normally, the GROUP BY clause groups along a single dimension. Using the GROUPING SETS, CUBE or ROLLUP clauses it is possible to
group along multiple dimensions. See the GROUPING SETS page for more information.

Examples

-- count the number of entries in the "addresses" table that belong to each different city
SELECT city, count(*)
FROM addresses
GROUP BY city;

-- compute the average income per city per street_name


SELECT city, street_name, avg(income)
FROM addresses
GROUP BY city, street_name;

GROUP BY ALL Examples


-- Group by city and street_name to remove any duplicate values
SELECT city, street_name
FROM addresses
GROUP BY ALL;
-- GROUP BY city, street_name

-- compute the average income per city per street_name


-- Since income is wrapped in an aggregate function, do not include it in the GROUP BY
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY ALL;
-- GROUP BY city, street_name

Syntax

GROUPING SETS

GROUPING SETS, ROLLUP and CUBE can be used in the GROUP BY clause to perform a grouping over multiple dimensions within the
same query. Note that this syntax is not compatible with GROUP BY ALL.

Examples

-- compute the average income along the provided four different dimensions
-- () signifies the empty set (i.e., computing an ungrouped aggregate)
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY GROUPING SETS ((city, street_name), (city), (street_name), ());

-- compute the average income along the same dimensions


SELECT city, street_name, avg(income)
FROM addresses
GROUP BY CUBE (city, street_name);

-- compute the average income along the dimensions (city, street_name), (city) and ()
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY ROLLUP (city, street_name);

Description

GROUPING SETS perform the same aggregate across different GROUP BY clauses in a single query.

CREATE TABLE students (course VARCHAR, type VARCHAR);


INSERT INTO students (course, type)
VALUES
('CS', 'Bachelor'), ('CS', 'Bachelor'), ('CS', 'PhD'), ('Math', 'Masters'),
('CS', NULL), ('CS', NULL), ('Math', NULL);

SELECT course, type, count(*)


FROM students
GROUP BY GROUPING SETS ((course, type), course, type, ());

┌────────┬──────────┬──────────────┐
│ course │ type │ count_star() │
├────────┼──────────┼──────────────┤
│ CS │ Bachelor │ 2 │
│ CS │ PhD │ 1 │
│ Math │ Masters │ 1 │
│ CS │ NULL │ 2 │
│ Math │ NULL │ 1 │
│ CS │ NULL │ 5 │
│ Math │ NULL │ 2 │
│ NULL │ Bachelor │ 2 │
│ NULL │ PhD │ 1 │
│ NULL │ Masters │ 1 │
│ NULL │ NULL │ 3 │
│ NULL │ NULL │ 7 │
└────────┴──────────┴──────────────┘

In the above query, we group across four different sets: (course, type), (course), (type), and () (the empty group). The result contains
NULL for a group which is not part of the grouping set of that row, i.e., the above query is equivalent to the following UNION statement:

-- group by course, type


SELECT course, type, count(*)
FROM students
GROUP BY course, type
UNION ALL
-- group by type
SELECT NULL AS course, type, count(*)
FROM students
GROUP BY type
UNION ALL
-- group by course
SELECT course, NULL AS type, count(*)
FROM students
GROUP BY course
UNION ALL
-- group by nothing
SELECT NULL AS course, NULL AS type, count(*)
FROM students;

CUBE and ROLLUP are syntactic sugar to easily produce commonly used grouping sets.

The ROLLUP clause will produce all "sub-groups" of a grouping set, e.g., ROLLUP (country, city, zip) produces the grouping
sets (country, city, zip), (country, city), (country), (). This can be useful for producing different levels of detail
of a group by clause. This produces n+1 grouping sets where n is the number of terms in the ROLLUP clause.

CUBE produces grouping sets for all combinations of the inputs, e.g., CUBE (country, city, zip) will produce (country, city,
zip), (country, city), (country, zip), (city, zip), (country), (city), (zip), (). This produces 2^n
grouping sets.

Identifying Grouping Sets with GROUPING_ID()

The super‑aggregate rows generated by GROUPING SETS, ROLLUP and CUBE can often be identified by NULL‑values returned for the
respective column in the grouping. But if the columns used in the grouping can themselves contain actual NULL‑values, then it can be
challenging to distinguish whether the value in the resultset is a "real" NULL‑value coming out of the data itself, or a NULL‑value generated
by the grouping construct. The GROUPING_ID() or GROUPING() function is designed to identify which groups generated the super‑
aggregate rows in the result.

GROUPING_ID() is an aggregate function that takes the column expressions that make up the grouping(s). It returns a BIGINT value.
The return value is 0 for the rows that are not super‑aggregate rows. But for the super‑aggregate rows, it returns an integer value that
identifies the combination of expressions that make up the group for which the super‑aggregate is generated. At this point, an example
might help. Consider the following query:

WITH days AS (
SELECT
year("generate_series") AS y,
quarter("generate_series") AS q,
month("generate_series") AS m
FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id()"
FROM days
GROUP BY GROUPING SETS (
(y, q, m),
(y, q),
(y),
()
)
ORDER BY y, q, m;

These are the results:

┌───────┬───────┬───────┬─────────────┐
│ y │ q │ m │ grouping_id │
│ int64 │ int64 │ int64 │ int64 │
├───────┼───────┼───────┼─────────────┤
│ 2023 │ 1 │ 1 │ 0 │
│ 2023 │ 1 │ 2 │ 0 │
│ 2023 │ 1 │ 3 │ 0 │
│ 2023 │ 1 │ NULL │ 1 │
│ 2023 │ 2 │ 4 │ 0 │
│ 2023 │ 2 │ 5 │ 0 │
│ 2023 │ 2 │ 6 │ 0 │
│ 2023 │ 2 │ NULL │ 1 │
│ 2023 │ 3 │ 7 │ 0 │
│ 2023 │ 3 │ 8 │ 0 │
│ 2023 │ 3 │ 9 │ 0 │
│ 2023 │ 3 │ NULL │ 1 │
│ 2023 │ 4 │ 10 │ 0 │
│ 2023 │ 4 │ 11 │ 0 │
│ 2023 │ 4 │ 12 │ 0 │
│ 2023 │ 4 │ NULL │ 1 │
│ 2023 │ NULL │ NULL │ 3 │
│ NULL │ NULL │ NULL │ 7 │
├───────┴───────┴───────┴─────────────┤
│ 18 rows 4 columns │
└─────────────────────────────────────┘

In this example, the lowest level of grouping is at the month level, defined by the grouping set (y, q, m). Result rows corresponding to
that level are simply aggregate rows and the GROUPING_ID(y, q, m) function returns 0 for those. The grouping set (y, q) results
in super‑aggregate rows over the month level, leaving a NULL‑value for the m column, and for which GROUPING_ID(y, q, m) returns
1. The grouping set (y) results in super‑aggregate rows over the quarter level, leaving NULL‑values for the m and q column, for which
GROUPING_ID(y, q, m) returns 3. Finally, the () grouping set results in one super‑aggregate row for the entire resultset, leaving
NULL‑values for y, q and m and for which GROUPING_ID(y, q, m) returns 7.

To understand the relationship between the return value and the grouping set, you can think of GROUPING_ID(y, q, m) writing to a
bitfield, where the first bit corresponds to the last expression passed to GROUPING_ID(), the second bit to the one‑but‑last expression
passed to GROUPING_ID(), and so on. This may become clearer by casting GROUPING_ID() to BIT:

WITH days AS (
SELECT
year("generate_series") AS y,
quarter("generate_series") AS q,
month("generate_series") AS m
FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT
y, q, m,
GROUPING_ID(y, q, m) AS "grouping_id(y, q, m)",
right(GROUPING_ID(y, q, m)::BIT, 3) AS "y_q_m_bits"
FROM days
GROUP BY GROUPING SETS (
(y, q, m),
(y, q),
(y),
()
)
ORDER BY y, q, m;

Which returns these results:

┌───────┬───────┬───────┬──────────────────────┬────────────┐
│ y │ q │ m │ grouping_id(y, q, m) │ y_q_m_bits │
│ int64 │ int64 │ int64 │ int64 │ varchar │
├───────┼───────┼───────┼──────────────────────┼────────────┤
│ 2023 │ 1 │ 1 │ 0 │ 000 │
│ 2023 │ 1 │ 2 │ 0 │ 000 │
│ 2023 │ 1 │ 3 │ 0 │ 000 │
│ 2023 │ 1 │ NULL │ 1 │ 001 │
│ 2023 │ 2 │ 4 │ 0 │ 000 │
│ 2023 │ 2 │ 5 │ 0 │ 000 │
│ 2023 │ 2 │ 6 │ 0 │ 000 │
│ 2023 │ 2 │ NULL │ 1 │ 001 │
│ 2023 │ 3 │ 7 │ 0 │ 000 │
│ 2023 │ 3 │ 8 │ 0 │ 000 │
│ 2023 │ 3 │ 9 │ 0 │ 000 │
│ 2023 │ 3 │ NULL │ 1 │ 001 │
│ 2023 │ 4 │ 10 │ 0 │ 000 │
│ 2023 │ 4 │ 11 │ 0 │ 000 │
│ 2023 │ 4 │ 12 │ 0 │ 000 │
│ 2023 │ 4 │ NULL │ 1 │ 001 │
│ 2023 │ NULL │ NULL │ 3 │ 011 │
│ NULL │ NULL │ NULL │ 7 │ 111 │
├───────┴───────┴───────┴──────────────────────┴────────────┤
│ 18 rows 5 columns │
└───────────────────────────────────────────────────────────┘

Note that the number of expressions passed to GROUPING_ID(), and the order in which they are passed, are independent of the actual
group definitions appearing in the GROUPING SETS clause (or the groups implied by ROLLUP and CUBE). As long as the expressions
passed to GROUPING_ID() appear somewhere in the GROUPING SETS clause, GROUPING_ID() will set a bit
corresponding to the position of the expression whenever that expression is rolled up to a super‑aggregate.

Syntax

HAVING Clause

The HAVING clause can be used after the GROUP BY clause to provide filter criteria after the grouping has been completed. In terms of
syntax the HAVING clause is identical to the WHERE clause, but while the WHERE clause occurs before the grouping, the HAVING clause
occurs after the grouping.

Examples

-- count the number of entries in the "addresses" table that belong to each different city
-- filtering out cities with a count below 50
SELECT city, count(*)
FROM addresses
GROUP BY city
HAVING count(*) >= 50;

-- compute the average income per city per street_name


-- filtering out cities with an average income bigger than twice the median income
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY city, street_name
HAVING avg(income) > 2 * median(income);

Syntax

ORDER BY Clause

ORDER BY is an output modifier. Logically it is applied near the very end of the query (just prior to LIMIT or OFFSET, if present). The
ORDER BY clause sorts the rows on the sorting criteria in either ascending or descending order. In addition, every order clause can specify
whether NULL values should be moved to the beginning or to the end.

The ORDER BY clause may contain one or more expressions, separated by commas. An error will be thrown if no expressions are included,
since the ORDER BY clause should be removed in that situation. The expressions may begin with either an arbitrary scalar expression
(which could be a column name), a column position number (Ex: 1. Note that it is 1‑indexed), or the keyword ALL. Each expression can
optionally be followed by an order modifier (ASC or DESC, default is ASC), and/or a NULL order modifier (NULLS FIRST or NULLS LAST,
default is NULLS LAST).

ORDER BY ALL

The ALL keyword indicates that the output should be sorted by every column in order from left to right. The direction of this sort may be
modified using either ORDER BY ALL ASC or ORDER BY ALL DESC and/or NULLS FIRST or NULLS LAST. Note that ALL may not
be used in combination with other expressions in the ORDER BY clause ‑ it must be by itself. See examples below.

NULL Order Modifier

By default if no modifiers are provided, DuckDB sorts ASC NULLS LAST, i.e., the values are sorted in ascending order and null values are
placed last. This is identical to the default sort order of PostgreSQL. The default sort order can be changed with the following configuration
options.

Note. Making ASC NULLS LAST the default sorting order was a breaking change in version 0.8.0. Prior to 0.8.0, DuckDB
sorted using ASC NULLS FIRST.

-- change the default null sorting order to either NULLS FIRST or NULLS LAST
SET default_null_order = 'NULLS FIRST';
-- change the default sorting order to either DESC or ASC
SET default_order = 'DESC';

Collations

Text is sorted using the binary comparison collation by default, which means values are sorted on their binary UTF‑8 values. While this
works well for ASCII text (e.g., for English language data), the sorting order can be incorrect for other languages. For this purpose, DuckDB
provides collations. For more information on collations, see the Collation page.

Examples

All examples use this example table:

CREATE OR REPLACE TABLE addresses AS


SELECT '123 Quack Blvd' AS address, 'DuckTown' AS city, '11111' AS zip
UNION ALL
SELECT '111 Duck Duck Goose Ln', 'DuckTown', '11111'
UNION ALL
SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111'
UNION ALL
SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111-0001';

-- select the addresses, ordered by city name using the default null order and default order
SELECT *
FROM addresses
ORDER BY city;

-- select the addresses, ordered by city name in descending order with nulls at the end
SELECT *
FROM addresses
ORDER BY city DESC NULLS LAST;

-- order by city and then by zip code, both using the default orderings
SELECT *
FROM addresses
ORDER BY city, zip;

-- order by city using german collation rules


SELECT *
FROM addresses
ORDER BY city COLLATE DE;

ORDER BY ALL Examples


-- Order from left to right (by address, then by city, then by zip) in ascending order
SELECT *
FROM addresses
ORDER BY ALL;

address                  city       zip
111 Duck Duck Goose Ln   Duck Town  11111
111 Duck Duck Goose Ln   Duck Town  11111-0001
111 Duck Duck Goose Ln   DuckTown   11111
123 Quack Blvd           DuckTown   11111

-- Order from left to right (by address, then by city, then by zip) in descending order
SELECT *
FROM addresses
ORDER BY ALL DESC;

address                  city       zip
123 Quack Blvd           DuckTown   11111
111 Duck Duck Goose Ln   DuckTown   11111
111 Duck Duck Goose Ln   Duck Town  11111-0001
111 Duck Duck Goose Ln   Duck Town  11111

Syntax

LIMIT Clause

LIMIT is an output modifier. Logically it is applied at the very end of the query. The LIMIT clause restricts the number of rows fetched.
The OFFSET clause indicates at which position to start reading the values, i.e., the first OFFSET values are ignored.

Note that while LIMIT can be used without an ORDER BY clause, the results might not be deterministic without the ORDER BY clause.
This can still be useful, however, for example when you want to inspect a quick snapshot of the data.

Examples

-- select the first 5 rows from the addresses table


SELECT *
FROM addresses
LIMIT 5;

-- select 5 rows from the addresses table, starting at position 5 (i.e., ignoring the first 5 rows)
SELECT *
FROM addresses
LIMIT 5
OFFSET 5;

-- select the top 5 cities with the highest population


SELECT city, count(*) AS population
FROM addresses
GROUP BY city
ORDER BY population DESC
LIMIT 5;

Syntax

SAMPLE Clause

The SAMPLE clause allows you to run the query on a sample from the base table. This can significantly speed up processing of queries,
at the expense of accuracy in the result. Samples can also be used to quickly see a snapshot of the data when exploring a data set. The
sample clause is applied right after anything in the FROM clause (i.e., after any joins, but before the WHERE clause or any aggregates). See
the SAMPLE page for more information.

Examples

-- select a sample of 1% of the addresses table using default (system) sampling


SELECT *
FROM addresses
USING SAMPLE 1%;

-- select a sample of 1% of the addresses table using bernoulli sampling


SELECT *
FROM addresses
USING SAMPLE 1% (bernoulli);

-- select a sample of 10 rows from the subquery


SELECT *
FROM (SELECT * FROM addresses)
USING SAMPLE 10 ROWS;

Syntax

Unnesting

Examples

-- unnest a list, generating 3 rows (1, 2, 3)


SELECT unnest([1, 2, 3]);
-- unnesting a struct, generating two columns (a, b)
SELECT unnest({'a': 42, 'b': 84});
-- recursive unnest of a list of structs
SELECT unnest([{'a': 42, 'b': 84}, {'a': 100, 'b': NULL}], recursive := true);

The unnest function is used to unnest lists or structs by one level. The function can be used as a regular scalar function, but only in the
SELECT clause. Invoking unnest with the recursive parameter will unnest lists and structs of multiple levels.

Unnesting Lists

-- unnest a list, generating 3 rows (1, 2, 3)


SELECT unnest([1, 2, 3]);
-- unnest a scalar list, generating 3 rows ((1, 10), (2, 11), (3, NULL))
SELECT unnest([1, 2, 3]), unnest([10, 11]);
-- unnest a scalar list, generating 3 rows ((1, 10), (2, 10), (3, 10))
SELECT unnest([1, 2, 3]), 10;
-- unnest a list column generated from a subquery
SELECT unnest(l) + 10 FROM (VALUES ([1, 2, 3]), ([4, 5])) tbl(l);
-- empty result
SELECT unnest([]);
-- empty result
SELECT unnest(NULL);

Using unnest on a list will emit one tuple per entry in the list. When unnest is combined with regular scalar expressions, those expressions are repeated for every entry in the list. When multiple lists are unnested in the same SELECT clause, the lists are unnested side‑by‑side. If one list is longer than the other, the shorter list will be padded with NULL values.

An empty list and a NULL list will both unnest to zero elements.

Unnesting Structs

-- unnesting a struct, generating two columns (a, b)


SELECT unnest({'a': 42, 'b': 84});
-- unnesting a struct, generating two columns (a, b)
SELECT unnest({'a': 42, 'b': {'x': 84}});

unnest on a struct will emit one column per entry in the struct.

Recursive Unnest

-- unnesting a list of lists recursively, generating 5 rows (1, 2, 3, 4, 5)


SELECT unnest([[1, 2, 3], [4, 5]], recursive := true);
-- unnesting a list of structs recursively, generating two rows of two columns (a, b)
SELECT unnest([{'a': 42, 'b': 84}, {'a': 100, 'b': NULL}], recursive := true);

-- unnesting a struct, generating two columns (a, b)


SELECT unnest({'a': [1, 2, 3], 'b': 88}, recursive := true);

Calling unnest with the recursive setting will fully unnest lists, followed by fully unnesting structs. This can be useful to fully flatten
columns that contain lists within lists, or lists of structs. Note that lists within structs are not unnested.

WITH Clause

The WITH clause allows you to specify common table expressions (CTEs). Regular (non‑recursive) common table expressions are essentially views that are limited in scope to a particular query. CTEs can reference each other and can be nested.

Basic CTE Examples

-- create a CTE called "cte" and use it in the main query


WITH cte AS (SELECT 42 AS x)
SELECT * FROM cte;

┌────┐
│ x │
├────┤
│ 42 │
└────┘

-- create two CTEs, where the second CTE references the first CTE
WITH cte AS (SELECT 42 AS i),
cte2 AS (SELECT i*100 AS x FROM cte)
SELECT * FROM cte2;

┌──────┐
│ x │
├──────┤
│ 4200 │
└──────┘

Materialized CTEs

By default, CTEs are inlined into the main query. Inlining can result in duplicate work, because the definition is copied for each reference.
Take this query for example:

WITH t(x) AS ( Q_t )


SELECT * FROM t AS t1,
t AS t2,
t AS t3;

Inlining duplicates the definition of t for each reference which results in the following query:

SELECT * FROM ( Q_t ) AS t1(x),


( Q_t ) AS t2(x),
( Q_t ) AS t3(x);

If Q_t is expensive, materializing it with the MATERIALIZED keyword can improve performance. In this case, Q_t is evaluated only
once.

WITH t(x) AS MATERIALIZED ( Q_t )


SELECT * FROM t AS t1,
t AS t2,
t AS t3;

Recursive CTEs

WITH RECURSIVE allows the definition of CTEs which can refer to themselves. Note that the query must be formulated in a way that
ensures termination, otherwise, it may run into an infinite loop.

Tree Traversal WITH RECURSIVE can be used to traverse trees. For example, take a hierarchy of tags:

CREATE TABLE tag (id INT, name VARCHAR, subclassof INT);


INSERT INTO tag VALUES
(1, 'U2', 5),
(2, 'Blur', 5),
(3, 'Oasis', 5),
(4, '2Pac', 6),
(5, 'Rock', 7),
(6, 'Rap', 7),
(7, 'Music', 9),
(8, 'Movies', 9),
(9, 'Art', NULL);

The following query returns the path from the node Oasis to the root of the tree (Art).

WITH RECURSIVE tag_hierarchy(id, source, path) AS (


SELECT id, name, [name] AS path
FROM tag
WHERE subclassof IS NULL
UNION ALL
SELECT tag.id, tag.name, list_prepend(tag.name, tag_hierarchy.path)
FROM tag, tag_hierarchy
WHERE tag.subclassof = tag_hierarchy.id
)
SELECT path
FROM tag_hierarchy
WHERE source = 'Oasis';

┌───────────────────────────┐
│ path │
├───────────────────────────┤
│ [Oasis, Rock, Music, Art] │
└───────────────────────────┘

Graph Traversal The WITH RECURSIVE clause can be used to express graph traversal on arbitrary graphs. However, if the graph has
cycles, the query must perform cycle detection to prevent infinite loops. One way to achieve this is to store the path of a traversal in a list
and, before extending the path with a new edge, check whether its endpoint has been visited before (see the example later).

Take the following directed graph from the LDBC Graphalytics benchmark:

CREATE TABLE edge (node1id INT, node2id INT);


INSERT INTO edge
VALUES
(1, 3), (1, 5), (2, 4), (2, 5), (2, 10), (3, 1), (3, 5), (3, 8), (3, 10),
(5, 3), (5, 4), (5, 8), (6, 3), (6, 4), (7, 4), (8, 1), (9, 4);

Note that the graph contains directed cycles, e.g., between nodes 1, 3, and 5.

Enumerate All Paths from a Node The following query returns all paths starting in node 1:

WITH RECURSIVE paths(startNode, endNode, path) AS (


SELECT -- define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path
FROM paths
JOIN edge ON paths.endNode = node1id
-- Prevent adding a repeated node to the path.
-- This ensures that no cycles occur.
WHERE node2id != ALL(paths.path)
)
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;

┌───────────┬─────────┬───────────────┐
│ startNode │ endNode │ path │
├───────────┼─────────┼───────────────┤
│ 1 │ 3 │ [1, 3] │
│ 1 │ 5 │ [1, 5] │
│ 1 │ 5 │ [1, 3, 5] │
│ 1 │ 8 │ [1, 3, 8] │
│ 1 │ 10 │ [1, 3, 10] │
│ 1 │ 3 │ [1, 5, 3] │
│ 1 │ 4 │ [1, 5, 4] │
│ 1 │ 8 │ [1, 5, 8] │
│ 1 │ 4 │ [1, 3, 5, 4] │
│ 1 │ 8 │ [1, 3, 5, 8] │
│ 1 │ 8 │ [1, 5, 3, 8] │
│ 1 │ 10 │ [1, 5, 3, 10] │
└───────────┴─────────┴───────────────┘

Note that the result of this query is not restricted to shortest paths, e.g., for node 5, the results include paths [1, 5] and [1, 3, 5].

Enumerate Unweighted Shortest Paths from a Node In most cases, enumerating all paths is not practical or feasible. Instead, only the
(unweighted) shortest paths are of interest. To find these, the second half of the WITH RECURSIVE query should be adjusted such
that it only includes a node if it has not yet been visited. This is implemented by using a subquery that checks if any of the previous paths
includes the node:

WITH RECURSIVE paths(startNode, endNode, path) AS (


SELECT -- define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path


FROM paths
JOIN edge ON paths.endNode = node1id
-- Prevent adding a node that was visited previously by any path.
-- This ensures that (1) no cycles occur and (2) only nodes that
-- were not visited by previous (shorter) paths are added to a path.
WHERE NOT EXISTS (SELECT 1
FROM paths previous_paths
WHERE list_contains(previous_paths.path, node2id))
)
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;

┌───────────┬─────────┬────────────┐
│ startNode │ endNode │ path │
├───────────┼─────────┼────────────┤
│ 1 │ 3 │ [1, 3] │
│ 1 │ 5 │ [1, 5] │
│ 1 │ 8 │ [1, 3, 8] │
│ 1 │ 10 │ [1, 3, 10] │
│ 1 │ 4 │ [1, 5, 4] │
│ 1 │ 8 │ [1, 5, 8] │
└───────────┴─────────┴────────────┘

Enumerate Unweighted Shortest Paths between Two Nodes WITH RECURSIVE can also be used to find all (unweighted) shortest
paths between two nodes. To ensure that the recursive query is stopped as soon as we reach the end node, we use a window function
which checks whether the end node is among the newly added nodes.

The following query returns all unweighted shortest paths between nodes 1 (start node) and 8 (end node):

WITH RECURSIVE paths(startNode, endNode, path, endReached) AS (


SELECT -- define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path,
(node2id = 8) AS endReached
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path,
max(CASE WHEN node2id = 8 THEN 1 ELSE 0 END)
OVER (ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS endReached
FROM paths
JOIN edge ON paths.endNode = node1id
WHERE NOT EXISTS (SELECT 1
FROM paths previous_paths
WHERE list_contains(previous_paths.path, node2id))
AND paths.endReached = 0
)
SELECT startNode, endNode, path
FROM paths
WHERE endNode = 8
ORDER BY length(path), path;

┌───────────┬─────────┬───────────┐
│ startNode │ endNode │ path │
├───────────┼─────────┼───────────┤
│ 1 │ 8 │ [1, 3, 8] │
│ 1 │ 8 │ [1, 5, 8] │
└───────────┴─────────┴───────────┘

Common Table Expressions

WINDOW Clause

The WINDOW clause allows you to specify named windows that can be used within window functions. These are useful when you have
multiple window functions, as they allow you to avoid repeating the same window clause.
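
As a minimal sketch, the two window functions below share a single named window instead of repeating the same PARTITION BY / ORDER BY specification (the sales table and its columns are hypothetical):

SELECT
    region,
    amount,
    row_number() OVER w AS row_in_region,
    sum(amount) OVER w AS running_total
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY amount);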

Syntax

QUALIFY Clause

The QUALIFY clause is used to filter the results of WINDOW functions. This filtering of results is similar to how a HAVING clause filters the
results of aggregate functions applied based on the GROUP BY clause.

The QUALIFY clause avoids the need for a subquery or WITH clause to perform this filtering (much like HAVING avoids a subquery). An
example using a WITH clause instead of QUALIFY is included below the QUALIFY examples.

Note that this is filtering based on WINDOW functions, not necessarily based on the WINDOW clause. The WINDOW clause is optional and
can be used to simplify the creation of multiple WINDOW function expressions.

The position of where to specify a QUALIFY clause is following the WINDOW clause in a SELECT statement (WINDOW does not need to be
specified), and before the ORDER BY.

Examples

Each of the following examples produces the same output, shown below.

-- Filter based on a WINDOW function defined in the QUALIFY clause


SELECT
schema_name,
function_name,
-- In this example the function_rank column in the select clause is for reference
row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
FROM duckdb_functions()
QUALIFY
row_number() OVER (PARTITION BY schema_name ORDER BY function_name) < 3;

-- Filter based on a WINDOW function defined in the SELECT clause


SELECT
schema_name,
function_name,
row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
FROM duckdb_functions()
QUALIFY
function_rank < 3;

-- Filter based on a WINDOW function defined in the QUALIFY clause, but using the WINDOW clause
SELECT
schema_name,
function_name,
-- In this example the function_rank column in the select clause is for reference
row_number() OVER my_window AS function_rank
FROM duckdb_functions()
WINDOW
my_window AS (PARTITION BY schema_name ORDER BY function_name)
QUALIFY
row_number() OVER my_window < 3;

-- Filter based on a WINDOW function defined in the SELECT clause, but using the WINDOW clause
SELECT
schema_name,
function_name,
row_number() OVER my_window AS function_rank
FROM duckdb_functions()
WINDOW
my_window AS (PARTITION BY schema_name ORDER BY function_name)
QUALIFY
function_rank < 3;

-- Equivalent query based on a WITH clause (without QUALIFY clause)


WITH ranked_functions AS (
SELECT
schema_name,
function_name,
row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
FROM duckdb_functions()
)
SELECT
*
FROM ranked_functions
WHERE
function_rank < 3;

schema_name   function_name     function_rank
main          !__postfix        1
main          !~~               2
pg_catalog    col_description   1
pg_catalog    format_pg_type    2

Syntax

VALUES Clause

The VALUES clause is used to specify a fixed number of rows. The VALUES clause can be used as a stand‑alone statement, as part of the
FROM clause, or as input to an INSERT INTO statement.

Examples

-- generate two rows and directly return them


VALUES ('Amsterdam', 1), ('London', 2);
-- generate two rows as part of a FROM clause, and rename the columns
SELECT * FROM (VALUES ('Amsterdam', 1), ('London', 2)) Cities(Name, Id);
-- generate two rows and insert them into a table
INSERT INTO Cities VALUES ('Amsterdam', 1), ('London', 2);


-- create a table directly from a VALUES clause
CREATE TABLE Cities AS SELECT * FROM (VALUES ('Amsterdam', 1), ('London', 2)) Cities(Name, Id);

Syntax

FILTER Clause

The FILTER clause may optionally follow an aggregate function in a SELECT statement. This will filter the rows of data that are fed into
the aggregate function in the same way that a WHERE clause filters rows, but localized to the specific aggregate function. FILTER clauses
cannot currently be used when the aggregate function appears in a windowing context.

There are multiple types of situations where this is useful, including when evaluating multiple aggregates with different filters, and when
creating a pivoted view of a dataset. FILTER provides a cleaner syntax for pivoting data when compared with the more traditional CASE
WHEN approach discussed below.

Some aggregate functions do not filter out NULL values, so using a FILTER clause will return valid results in situations where the CASE
WHEN approach will not. This occurs with the functions first and last, which are desirable in a non‑aggregating pivot operation where
the goal is to simply re‑orient the data into columns rather than re‑aggregate it. FILTER also improves null handling when using the LIST
and ARRAY_AGG functions, as the CASE WHEN approach will include null values in the list result, while the FILTER clause will remove
them.

Examples

-- Compare total row count to:


-- The number of rows where i <= 5
-- The number of rows where i is odd
SELECT
count(*) AS total_rows,
count(*) FILTER (i <= 5) AS lte_five,
count(*) FILTER (i % 2 = 1) AS odds
FROM generate_series(1, 10) tbl(i);

total_rows   lte_five   odds
10           5          5

-- Different aggregate functions may be used, and multiple WHERE expressions are also permitted
-- The sum of i for rows where i <= 5
-- The median of i where i is odd
SELECT
sum(i) FILTER (i <= 5) AS lte_five_sum,
median(i) FILTER (i % 2 = 1) AS odds_median,
median(i) FILTER (i % 2 = 1 AND i <= 5) AS odds_lte_five_median
FROM generate_series(1, 10) tbl(i);

lte_five_sum   odds_median   odds_lte_five_median
15             5.0           3.0

The FILTER clause can also be used to pivot data from rows into columns. This is a static pivot, as columns must be defined prior to
runtime in SQL. However, this kind of statement can be dynamically generated in a host programming language to leverage DuckDB's SQL
engine for rapid, larger than memory pivoting.

-- First generate an example dataset


CREATE TEMP TABLE stacked_data AS
SELECT
i,
CASE WHEN i <= rows * 0.25 THEN 2022
WHEN i <= rows * 0.5 THEN 2023
WHEN i <= rows * 0.75 THEN 2024
WHEN i <= rows * 0.875 THEN 2025
ELSE NULL
END AS year
FROM (
SELECT
i,
count(*) OVER () AS rows
FROM generate_series(1, 100_000_000) tbl(i)
) tbl;

-- "Pivot" the data out by year (move each year out to a separate column)
SELECT
count(i) FILTER (year = 2022) AS "2022",
count(i) FILTER (year = 2023) AS "2023",
count(i) FILTER (year = 2024) AS "2024",
count(i) FILTER (year = 2025) AS "2025",
count(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;

-- This syntax produces the same results as the FILTER clauses above
SELECT
count(CASE WHEN year = 2022 THEN i END) AS "2022",
count(CASE WHEN year = 2023 THEN i END) AS "2023",
count(CASE WHEN year = 2024 THEN i END) AS "2024",
count(CASE WHEN year = 2025 THEN i END) AS "2025",
count(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;

2022       2023       2024       2025       NULLs
25000000   25000000   25000000   12500000   12500000

However, the CASE WHEN approach will not work as expected when using an aggregate function that does not ignore NULL values. The
first function falls into this category, so FILTER is preferred in this case.

-- "Pivot" the data out by year (move each year out to a separate column)
SELECT
first(i) FILTER (year = 2022) AS "2022",
first(i) FILTER (year = 2023) AS "2023",
first(i) FILTER (year = 2024) AS "2024",
first(i) FILTER (year = 2025) AS "2025",
first(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;

2022      2023       2024       2025       NULLs
1474561   25804801   50749441   76431361   87500001

-- This will produce NULL values whenever the first evaluation of the CASE WHEN clause returns a NULL
SELECT
first(CASE WHEN year = 2022 THEN i END) AS "2022",


first(CASE WHEN year = 2023 THEN i END) AS "2023",
first(CASE WHEN year = 2024 THEN i END) AS "2024",
first(CASE WHEN year = 2025 THEN i END) AS "2025",
first(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;

2022      2023   2024   2025   NULLs
1228801   NULL   NULL   NULL   NULL

Aggregate Function Syntax (Including FILTER Clause)

Set Operations

Set operations allow queries to be combined according to set operation semantics. Set operations refer to the UNION [ALL], INTERSECT
[ALL] and EXCEPT [ALL] clauses. The vanilla variants use set semantics, i.e., they eliminate duplicates, while the variants with ALL
use bag semantics.

Traditional set operations unify queries by column position, and require the to‑be‑combined queries to have the same number of input
columns. If the columns are not of the same type, casts may be added. The result will use the column names from the first query.
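
As a minimal sketch, combining an INTEGER column with a DECIMAL column casts both to a common type, and the result column takes its name from the first query:

-- returns one column named "a"; the INTEGER value is cast so it can be combined with the DECIMAL value
SELECT 1 AS a
UNION
SELECT 2.5 AS b;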

DuckDB also supports UNION [ALL] BY NAME, which joins columns by name instead of by position. UNION BY NAME does not require
the inputs to have the same number of columns. NULL values will be added in case of missing columns.

UNION

The UNION clause can be used to combine rows from multiple queries. The queries are required to have the same number of columns and
the same column types.

Vanilla UNION (Set Semantics) The vanilla UNION clause follows set semantics, therefore it performs duplicate elimination, i.e., only
unique rows will be included in the result.

-- the values [0..3)


SELECT * FROM range(2) t1(x)
UNION
SELECT * FROM range(3) t2(x);

┌───────┐
│ x │
│ int64 │
├───────┤
│ 0 │
│ 2 │
│ 1 │
└───────┘

UNION ALL (Bag Semantics) UNION ALL returns all rows of both queries following bag semantics, i.e., without duplicate elimination.

-- the values [0..2) and [0..3)


SELECT * FROM range(2) t1(x)
UNION ALL
SELECT * FROM range(3) t2(x);

┌───────┐
│ x │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 0 │
│ 1 │
│ 2 │
└───────┘

UNION [ALL] BY NAME The UNION [ALL] BY NAME clause can be used to combine rows from different tables by name, instead
of by position. UNION BY NAME does not require both queries to have the same number of columns. Any columns that are only found in
one of the queries are filled with NULL values for the other query.

Take the following tables for example:

CREATE TABLE capitals (city VARCHAR, country VARCHAR);


INSERT INTO capitals VALUES
('Amsterdam', 'NL'),
('Berlin', 'Germany');
CREATE TABLE weather (city VARCHAR, degrees INTEGER, date DATE);
INSERT INTO weather VALUES
('Amsterdam', 10, '2022-10-14'),
('Seattle', 8, '2022-10-12');

SELECT * FROM capitals


UNION BY NAME
SELECT * FROM weather;

┌───────────┬─────────┬─────────┬────────────┐
│ city │ country │ degrees │ date │
│ varchar │ varchar │ int32 │ date │
├───────────┼─────────┼─────────┼────────────┤
│ Amsterdam │ NULL │ 10 │ 2022-10-14 │
│ Seattle │ NULL │ 8 │ 2022-10-12 │
│ Amsterdam │ NL │ NULL │ NULL │
│ Berlin │ Germany │ NULL │ NULL │
└───────────┴─────────┴─────────┴────────────┘

UNION BY NAME follows set semantics (therefore it performs duplicate elimination), whereas UNION ALL BY NAME follows bag
semantics.
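
A minimal sketch of the difference: with set semantics the duplicate row is eliminated, with bag semantics it is kept.

-- UNION BY NAME returns a single row ('Amsterdam')
SELECT 'Amsterdam' AS city
UNION BY NAME
SELECT 'Amsterdam' AS city;
-- UNION ALL BY NAME returns two rows ('Amsterdam', 'Amsterdam')
SELECT 'Amsterdam' AS city
UNION ALL BY NAME
SELECT 'Amsterdam' AS city;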

INTERSECT

The INTERSECT clause can be used to select all rows that occur in the result of both queries.

Vanilla INTERSECT (Set Semantics) Vanilla INTERSECT performs duplicate elimination, so only unique rows are returned.

-- the values [0..2) (all values that are both in t1 and t2)
SELECT * FROM range(2) t1(x)
INTERSECT
SELECT * FROM range(6) t2(x);

┌───────┐
│ x │
│ int64 │
├───────┤
│ 0 │
│ 1 │
└───────┘

INTERSECT ALL (Bag Semantics) INTERSECT ALL follows bag semantics, so duplicates are returned.

-- INTERSECT ALL uses bag semantics


SELECT unnest([5, 5, 6, 6, 6, 6, 7, 8]) AS x
INTERSECT ALL
SELECT unnest([5, 6, 6, 7, 7, 9]);

┌───────┐
│ x │
│ int32 │
├───────┤
│ 7 │
│ 6 │
│ 6 │
│ 5 │
└───────┘

EXCEPT

The EXCEPT clause can be used to select all rows that only occur in the left query.

Vanilla EXCEPT (Set Semantics) Vanilla EXCEPT follows set semantics, therefore, it performs duplicate elimination, so only unique
rows are returned.

-- the values [2..5)


SELECT * FROM range(5) t1(x)
EXCEPT
SELECT * FROM range(2) t2(x);

┌───────┐
│ x │
│ int64 │
├───────┤
│ 4 │
│ 3 │
│ 2 │
└───────┘

EXCEPT ALL (Bag Semantics)


-- EXCEPT ALL uses bag semantics
SELECT unnest([5, 5, 6, 6, 6, 6, 7, 8]) AS x
EXCEPT ALL
SELECT unnest([5, 6, 6, 7, 7, 9]);

┌───────┐
│ x │
│ int32 │
├───────┤
│ 6 │
│ 6 │
│ 8 │
│ 5 │
└───────┘

Syntax

Prepared Statements

DuckDB supports prepared statements where parameters are substituted when the query is executed. This can improve readability and is
useful for preventing SQL injections.

Syntax

There are three syntaxes for denoting parameters in prepared statements: auto‑incremented (?), positional ($1), and named ($param).
Note that not all clients support all of these syntaxes, e.g., the JDBC client only supports auto‑incremented parameters in prepared statements.

Example Data Set In the following, we introduce the three different syntaxes and illustrate them with examples using the following
table.

CREATE TABLE person (name VARCHAR, age BIGINT);


INSERT INTO person VALUES ('Alice', 37), ('Ana', 35), ('Bob', 41), ('Bea', 25);

In our example query, we'll look for people whose name starts with a "B" and are at least 40 years old. This will return a single row <'Bob', 41>.

Auto‑Incremented Parameters: ? DuckDB supports prepared statements with auto‑incremented indexing, i.e., the position of the
parameters in the query corresponds to their position in the execution statement. For example:

PREPARE query_person AS
SELECT *
FROM person
WHERE starts_with(name, ?)
AND age >= ?;

Using the CLI client, the statement is executed as follows.

EXECUTE query_person('B', 40);

Positional Parameters: $1 Prepared statements can use positional parameters, where parameters are denoted with an integer ($1, $2).
For example:

PREPARE query_person AS
SELECT *
FROM person
WHERE starts_with(name, $2)
AND age >= $1;

Using the CLI client, the statement is executed as follows. Note that the first parameter corresponds to $1, the second to $2, and so on.

EXECUTE query_person(40, 'B');

Named Parameters: $parameter DuckDB also supports named parameters, where parameters are denoted with $parameter_name.
For example:

PREPARE query_person AS
SELECT *
FROM person
WHERE starts_with(name, $name_start_letter)
AND age >= $minimum_age;

Using the CLI client, the statement is executed as follows.

EXECUTE query_person(name_start_letter := 'B', minimum_age := 40);

Data Types

Data Types

General‑Purpose Data Types

The table below shows all the built‑in general‑purpose data types. The alternatives listed in the aliases column can be used to refer to
these types as well, however, note that the aliases are not part of the SQL standard and hence might not be accepted by other database
engines.

Name                       Aliases                       Description
BIGINT                     INT8, LONG                    signed eight-byte integer
BIT                        BITSTRING                     string of 1s and 0s
BLOB                       BYTEA, BINARY, VARBINARY      variable-length binary data
BOOLEAN                    BOOL, LOGICAL                 logical boolean (true/false)
DATE                                                     calendar date (year, month, day)
DECIMAL(prec, scale)       NUMERIC(prec, scale)          fixed-precision number with the given width (precision) and scale, defaults to prec = 18 and scale = 3
DOUBLE                     FLOAT8                        double precision floating-point number (8 bytes)
HUGEINT                                                  signed sixteen-byte integer
INTEGER                    INT4, INT, SIGNED             signed four-byte integer
INTERVAL                                                 date / time delta
REAL                       FLOAT4, FLOAT                 single precision floating-point number (4 bytes)
SMALLINT                   INT2, SHORT                   signed two-byte integer
TIME                                                     time of day (no time zone)
TIMESTAMP WITH TIME ZONE   TIMESTAMPTZ                   combination of time and date that uses the current time zone
TIMESTAMP                  DATETIME                      combination of time and date
TINYINT                    INT1                          signed one-byte integer
UBIGINT                                                  unsigned eight-byte integer
UHUGEINT                                                 unsigned sixteen-byte integer
UINTEGER                                                 unsigned four-byte integer
USMALLINT                                                unsigned two-byte integer
UTINYINT                                                 unsigned one-byte integer
UUID                                                     UUID data type
VARCHAR                    CHAR, BPCHAR, TEXT, STRING    variable-length character string

Implicit and explicit typecasting is possible between numerous types, see the Typecasting page for details.

Nested / Composite Types

DuckDB supports five nested data types: ARRAY, LIST, STRUCT, MAP, and UNION. Each supports different use cases and has a different
structure.

• ARRAY: An ordered, fixed-length sequence of data values of the same type. When used in a column, each row must have the same data type within each instance of the ARRAY and the same number of elements. Build from values: [1, 2, 3]. Define in DDL/CREATE: INT[3].
• LIST: An ordered sequence of data values of the same type. When used in a column, each row must have the same data type within each instance of the LIST, but can have any number of elements. Build from values: [1, 2, 3]. Define in DDL/CREATE: INT[].
• MAP: A dictionary of multiple named values, each key having the same type and each value having the same type. Keys and values can be any type and can be different types from one another. Rows may have different keys. Build from values: map([1, 2], ['a', 'b']). Define in DDL/CREATE: MAP(INT, VARCHAR).
• STRUCT: A dictionary of multiple named values, where each key is a string, but the value can be a different type for each key. Each row must have the same keys. Build from values: {'i': 42, 'j': 'a'}. Define in DDL/CREATE: STRUCT(i INT, j VARCHAR).
• UNION: A union of multiple alternative data types, storing one of them in each value at a time. A union also contains a discriminator "tag" value to inspect and access the currently set member type. Rows may be set to different member types of the union. Build from values: union_value(num := 2). Define in DDL/CREATE: UNION(num INT, text VARCHAR).

Nesting

ARRAY, LIST, MAP, STRUCT, and UNION types can be arbitrarily nested to any depth, so long as the type rules are observed.

-- Struct with lists


SELECT {'birds': ['duck', 'goose', 'heron'], 'aliens': NULL, 'amphibians': ['frog', 'toad']};
-- Struct with list of maps
SELECT {'test': [map([1, 5], [42.1, 45]), map([1, 5], [42.1, 45])]};
-- A list of unions
SELECT [union_value(num := 2), union_value(str := 'ABC')::UNION(str VARCHAR, num INTEGER)];

Performance Implications

The choice of data types can have a strong effect on performance. Please consult the Performance Guide for details.

Array Type

An ARRAY column stores fixed‑sized arrays. All fields in the column must have the same length and the same underlying type. Arrays are
typically used to store arrays of numbers, but can contain any uniform data type, including ARRAY, LIST and STRUCT types.

Arrays can be used to store vectors such as word embeddings or image embeddings.

To store variable‑length lists, use the LIST type. See the data types overview for a comparison between nested data types.

Note. The ARRAY type in PostgreSQL allows variable‑length fields. DuckDB's ARRAY type is fixed‑length.

Creating Arrays

Arrays can be created using the array_value(expr, ...) function.

-- Construct with the 'array_value' function


SELECT array_value(1, 2, 3);
-- You can always implicitly cast an array to a list (and use list functions, like list_extract, '[i]')
SELECT array_value(1, 2, 3)[2];
-- You can cast from a list to an array, but the dimensions have to match up!
SELECT [3, 2, 1]::INT[3];
-- Arrays can be nested
SELECT array_value(array_value(1, 2), array_value(3, 4), array_value(5, 6));
-- Arrays can store structs
SELECT array_value({'a': 1, 'b': 2}, {'a': 3, 'b': 4});

Defining an Array Field

Arrays can be created using the TYPE_NAME [ LENGTH ] syntax. For example, to create an array field for 3 integers, run:

CREATE TABLE array_table (id INT, arr INT[3]);


INSERT INTO array_table VALUES (10, [1, 2, 3]), (20, [4, 5, 6]);

Retrieving Values from Arrays

Retrieving one or more values from an array can be accomplished using brackets and slicing notation, or through list functions like list_extract and array_extract. The following examples use the table from Defining an Array Field.

The following queries for extracting the first element of an array are equivalent:

SELECT id, arr[1] AS element FROM array_table;


SELECT id, list_extract(arr, 1) AS element FROM array_table;
SELECT id, array_extract(arr, 1) AS element FROM array_table;

┌───────┬─────────┐
│ id │ element │
│ int32 │ int32 │
├───────┼─────────┤
│ 10 │ 1 │
│ 20 │ 4 │
└───────┴─────────┘

Using the slicing notation returns a LIST:

SELECT id, arr[1:2] AS elements FROM array_table;

┌───────┬──────────┐
│ id │ elements │
│ int32 │ int32[] │
├───────┼──────────┤
│ 10 │ [1, 2] │
│ 20 │ [4, 5] │
└───────┴──────────┘

Functions

All LIST functions work with the ARRAY type. Additionally, several ARRAY-native functions are also supported. In the following, l1 stands for the 3-element array created by array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT) and l2 stands for array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT).

• array_value(index): Create an ARRAY containing the argument values. Example: array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT) → [1.0, 2.0, 3.0]
• array_cross_product(array1, array2): Compute the cross product of two arrays of size 3. The array elements can not be NULL. Example: array_cross_product(l1, l2) → [-1.0, 2.0, -1.0]
• array_cosine_similarity(array1, array2): Compute the cosine similarity between two arrays of the same size. The array elements can not be NULL. The arrays can have any size as long as the size is the same for both arguments. Example: array_cosine_similarity(l1, l2) → 0.9925833
• array_distance(array1, array2): Compute the distance between two arrays of the same size. The array elements can not be NULL. The arrays can have any size as long as the size is the same for both arguments. Example: array_distance(l1, l2) → 1.7320508
• array_inner_product(array1, array2): Compute the inner product between two arrays of the same size. The array elements can not be NULL. The arrays can have any size as long as the size is the same for both arguments. Example: array_inner_product(l1, l2) → 20.0
• array_dot_product(array1, array2): Alias for array_inner_product(array1, array2). Example: array_dot_product(l1, l2) → 20.0

Examples

-- create sample data


CREATE TABLE x (i INT, v FLOAT[3]);
CREATE TABLE y (i INT, v FLOAT[3]);
INSERT INTO x VALUES (1, array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT));
INSERT INTO y VALUES (1, array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT));
-- compute cross product
SELECT array_cross_product(x.v, y.v)
FROM x, y
WHERE x.i = y.i;
-- compute cosine similarity
SELECT array_cosine_similarity(x.v, y.v)
FROM x, y
WHERE x.i = y.i;

Ordering

The ordering of ARRAY instances is defined using a lexicographical order. NULL values compare greater than all other values and are
considered equal to each other.
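
A minimal sketch of this ordering, using made-up inline values (assuming arrays can be used directly as sort keys):

SELECT a
FROM (VALUES (array_value(1, 3, 2)), (array_value(1, 2, 3)), (array_value(1, 2, NULL))) t(a)
ORDER BY a;
-- order: [1, 2, 3], [1, 2, NULL], [1, 3, 2] (NULL compares greater than 3 in the third position)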

Functions

See Nested Functions.

Bitstring Type

Name Aliases Description

BIT BITSTRING variable‑length strings of 1s and 0s

Bitstrings are strings of 1s and 0s. Data of the BIT type is variable-length. A bitstring value requires 1 byte for each group of 8 bits, plus a fixed amount of space to store metadata.

By default bitstrings will not be padded with zeroes. Bitstrings can be very large, having the same size restrictions as BLOBs.

-- Create a bitstring
SELECT '101010'::BIT;
-- Create a bitstring with predefined length.
-- The resulting bitstring will be left-padded with zeroes.
-- This returns 000000101011.
SELECT bitstring('0101011', 12);

Functions

See Bitstring Functions.

Blob Type

Name Aliases Description

BLOB BYTEA, BINARY, VARBINARY variable‑length binary data

The blob (Binary Large OBject) type represents an arbitrary binary object stored in the database system. The blob type can contain any
type of binary data with no restrictions. What the actual bytes represent is opaque to the database system.

-- create a blob value with a single byte (170)


SELECT '\xAA'::BLOB;
-- create a blob value with three bytes (170, 171, 172)
SELECT '\xAA\xAB\xAC'::BLOB;
-- create a blob value with two bytes (65, 66)
SELECT 'AB'::BLOB;

Blobs are typically used to store non‑textual objects that the database does not provide explicit support for, such as images. While blobs
can hold objects up to 4GB in size, typically it is not recommended to store very large objects within the database system. In many situations
it is better to store the large file on the file system, and store the path to the file in the database system in a VARCHAR field.

Functions

See Blob Functions.

Boolean Type

Name Aliases Description

BOOLEAN BOOL logical boolean (true/false)

The BOOLEAN type represents a statement of truth ("true" or "false"). In SQL, the boolean field can also have a third state "unknown" which
is represented by the SQL NULL value.

-- select the three possible values of a boolean column


SELECT true, false, NULL::BOOLEAN;

Boolean values can be explicitly created using the literals true and false. However, they are most often created as a result of comparisons
or conjunctions. For example, the comparison i > 10 results in a boolean value. Boolean values can be used in the WHERE and HAVING
clauses of a SQL statement to filter out tuples from the result. In this case, tuples for which the predicate evaluates to true will pass the
filter, and tuples for which the predicate evaluates to false or NULL will be filtered out. Consider the following example:

-- create a table with the value (5), (15) and (NULL)


CREATE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (5), (15), (NULL);

-- select all entries where i > 10


SELECT * FROM integers WHERE i > 10;
-- in this case (5) and (NULL) are filtered out:
-- 5 > 10 = false
-- NULL > 10 = NULL
-- The result is (15)

Conjunctions

The AND/OR conjunctions can be used to combine boolean values.

Below is the truth table for the AND conjunction (i.e., x AND y).

X       X AND true   X AND false   X AND NULL
true    true         false         NULL
false   false        false         false
NULL    NULL         false         NULL

Below is the truth table for the OR conjunction (i.e., x OR y).

X       X OR true   X OR false   X OR NULL
true    true        true         true
false   true        false        NULL
NULL    true        NULL         NULL

Expressions

See Logical Operators and Comparison Operators.

Date Types

Name Aliases Description

DATE calendar date (year, month, day)

A date specifies a combination of year, month and day. DuckDB follows the SQL standard's lead by counting dates exclusively in the Gregorian calendar, even for years before that calendar was in use. Dates can be created using the DATE keyword, where the data must be formatted according to the ISO 8601 format (YYYY-MM-DD).

-- 20 September, 1992
SELECT DATE '1992-09-20';

Special Values

There are also three special date values that can be used on input:

Input string   Description
epoch          1970-01-01 (Unix system day zero)
infinity       later than all other dates
-infinity      earlier than all other dates

The values infinity and -infinity are specially represented inside the system and will be displayed unchanged; but epoch is simply
a notational shorthand that will be converted to the date value when read.

SELECT
'-infinity'::DATE AS negative,
'epoch'::DATE AS epoch,
'infinity'::DATE AS positive;

┌───────────┬────────────┬──────────┐
│ negative │ epoch │ positive │
│ date │ date │ date │
├───────────┼────────────┼──────────┤
│ -infinity │ 1970-01-01 │ infinity │
└───────────┴────────────┴──────────┘

Functions

See Date Functions.

Enum Data Type


Name Description

ENUM Dictionary Encoding representing all possible string values of a column.

The ENUM type represents a dictionary data structure with all possible unique values of a column. For example, a column storing the days
of the week can be an Enum holding all possible days. Enums are particularly interesting for string columns with low cardinality (i.e., fewer
distinct values). This is because the column only stores a numerical reference to the string in the Enum dictionary, resulting in immense
savings in disk storage and faster query performance.

Enum Definition

Enum types are created from either a hardcoded set of values or from a select statement that returns a single column of varchars. The set
of values in the select statement will be deduplicated, but a hardcoded set of values must not contain any duplicates.

-- Create enum using hardcoded values
CREATE TYPE ${enum_name} AS ENUM ([${value_1}, ${value_2},...]);

-- Create enum using a select statement that returns a single column of varchars
CREATE TYPE ${enum_name} AS ENUM (${SELECT expression});

For example:

-- Creates new user defined type 'mood' as an Enum
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

-- This will fail since the mood type already exists
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy', 'anxious');

-- This will fail since Enums cannot hold null values
CREATE TYPE breed AS ENUM ('maltese', NULL);

-- This will fail since Enum values must be unique
CREATE TYPE breed AS ENUM ('maltese', 'maltese');

-- Create an enum from a select statement
-- First create an example table of values
CREATE TABLE my_inputs AS
SELECT 'duck' AS my_varchar UNION ALL
SELECT 'duck' AS my_varchar UNION ALL
SELECT 'goose' AS my_varchar;

-- Create an enum using the unique string values in the my_varchar column
CREATE TYPE birds AS ENUM (SELECT my_varchar FROM my_inputs);

-- Show the available values in the birds enum using the enum_range function
SELECT enum_range(NULL::birds) AS my_enum_range;

my_enum_range

[duck, goose]

Enum Usage

After an enum has been created, it can be used anywhere a standard built‑in type is used. For example, we can create a table with a column
that references the enum.


Creates a table person, with attributes name (string type) and current_mood (mood type):

CREATE TABLE person (
    name TEXT,
    current_mood mood
);

Inserts tuples in the person table:

INSERT INTO person
VALUES ('Pedro', 'happy'), ('Mark', NULL), ('Pagliacci', 'sad'), ('Mr. Mackey', 'ok');

The following query will fail since the mood type does not have a 'quackity‑quack' value.

INSERT INTO person
VALUES ('Hannes', 'quackity-quack');

The string 'sad' is cast to the type mood, returning a numerical reference value. This makes the comparison a numerical comparison instead
of a string comparison.

SELECT *
FROM person
WHERE current_mood = 'sad';

┌───────────┬───────────────────────────────────────┐
│ name │ current_mood │
│ varchar │ enum('sad', 'ok', 'happy', 'anxious') │
├───────────┼───────────────────────────────────────┤
│ Pagliacci │ sad │
└───────────┴───────────────────────────────────────┘

If you are importing data from a file, you can create an Enum for a VARCHAR column before importing. The following subquery
automatically selects only distinct values:

CREATE TYPE mood AS ENUM (SELECT mood FROM 'path/to/file.csv');

Then you can create a table with the ENUM type and import using any data import statement:

CREATE TABLE person (name TEXT, current_mood mood);
COPY person FROM 'path/to/file.csv';

Enums vs. Strings

DuckDB Enums are automatically cast to VARCHAR types whenever necessary. This characteristic allows for ENUM columns to be used in
any VARCHAR function. In addition, it also allows for comparisons between different ENUM columns, or an ENUM and a VARCHAR column.

For example:

-- regexp_matches is a function that takes a VARCHAR, hence current_mood is cast to VARCHAR
SELECT regexp_matches(current_mood, '.*a.*') AS contains_a
FROM person;

┌────────────┐
│ contains_a │
│ boolean │
├────────────┤
│ true │
│ NULL │
│ true │
│ false │
└────────────┘

Create a new mood and table:


CREATE TYPE new_mood AS ENUM ('happy', 'anxious');

CREATE TABLE person_2 (
    name text,
    current_mood mood,
    future_mood new_mood,
    past_mood VARCHAR
);

Since the current_mood and future_mood columns are constructed on different ENUM types, DuckDB will cast both ENUMs to strings
and perform a string comparison:

SELECT *
FROM person_2
WHERE current_mood = future_mood;

When comparing the past_mood column (string), DuckDB will cast the current_mood ENUM to VARCHAR and perform a string compar‑
ison:

SELECT *
FROM person_2
WHERE current_mood = past_mood;

Enum Removal

Enum types are stored in the catalog, and a catalog dependency is added to each table that uses them. It is possible to drop an Enum from
the catalog using the following command:

DROP TYPE ${enum_name};

Currently, it is possible to drop Enums that are used in tables without affecting the tables.

Note. Warning This behavior of the Enum Removal feature is subject to change. In future releases, it is expected that any dependent
columns must be removed before dropping the Enum, or the Enum must be dropped with the additional CASCADE parameter.

Comparison of Enums

Enum values are compared according to their order in the Enum's definition. For example:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

SELECT 'sad'::mood < 'ok'::mood AS comp;

┌─────────┐
│ comp │
│ boolean │
├─────────┤
│ true │
└─────────┘

SELECT unnest(['ok'::mood, 'happy'::mood, 'sad'::mood]) AS m
ORDER BY m;

┌────────────────────────────┐
│ m │
│ enum('sad', 'ok', 'happy') │
├────────────────────────────┤
│ sad │
│ ok │
│ happy │
└────────────────────────────┘


Interval Type

Intervals represent a period of time. This period can be measured in a specific unit or combination of units, for example years, days, or
seconds. Intervals are generally used to modify timestamps or dates by either adding or subtracting them.

Name Description

INTERVAL Period of time

An INTERVAL can be constructed by providing an amount together with a unit. Intervals can be added or subtracted from DATE or TIMES-
TAMP values.

-- 1 year
SELECT INTERVAL 1 YEAR;
-- add 1 year to a specific date
SELECT DATE '2000-01-01' + INTERVAL 1 YEAR;
-- subtract 1 year from a specific date
SELECT DATE '2000-01-01' - INTERVAL 1 YEAR;
-- construct an interval from a column, instead of a constant
SELECT INTERVAL (i) YEAR FROM range(1, 5) t(i);
-- construct an interval with mixed units
SELECT INTERVAL '1 month 1 day';
-- intervals greater than 24 hours/12 months/etc. are supported
SELECT '540:58:47.210'::INTERVAL;
SELECT INTERVAL '16 MONTHS';

-- WARNING:
-- If a decimal value is specified, it will be automatically rounded to an integer
-- To use more precise values, simply use a more granular date part
-- (In this example use 18 MONTHS instead of 1.5 YEARS)
-- The statement below is equivalent to to_years(CAST(1.5 AS INTEGER))
SELECT INTERVAL '1.5' YEARS; -- WARNING! This returns 2 years!

Details

The interval class represents a period of time using three distinct components: the month, day and microsecond. These three components
are required because there is no direct translation between them. For example, a month does not correspond to a fixed amount of days.
That depends on which month is referenced. February has fewer days than March.

The division into components makes the interval class suitable for adding or subtracting specific time units to a date. For example, we can
generate a table with the first day of every month using the following SQL query:

SELECT DATE '2000-01-01' + INTERVAL (i) MONTH
FROM range(12) t(i);

Difference between Dates

If we subtract two timestamps from one another, we obtain an interval describing the difference between the timestamps with the days
and microseconds components. For example:

SELECT TIMESTAMP '2000-02-01 12:00:00' - TIMESTAMP '2000-01-01 11:00:00' AS diff;

┌──────────────────┐
│       diff       │
│     interval     │
├──────────────────┤
│ 31 days 01:00:00 │
└──────────────────┘

The datediff function can be used to obtain the difference between two dates for a specific unit.

SELECT datediff('month', TIMESTAMP '2000-01-01 11:00:00', TIMESTAMP '2000-02-01 12:00:00') AS diff;

┌───────┐
│ diff │
│ int64 │
├───────┤
│ 1 │
└───────┘

Functions

See the Date Part Functions page for a list of available date parts for use with an INTERVAL.

See the Interval Operators page for functions that operate on intervals.

List Type

A LIST column encodes lists of values. Fields in the column can have values with different lengths, but they must all have the same
underlying type. LISTs are typically used to store arrays of numbers, but can contain any uniform data type, including other LISTs and
STRUCTs.

LISTs are similar to PostgreSQL's ARRAY type. DuckDB uses the LIST terminology, but some array functions are provided for PostgreSQL
compatibility.

See the data types overview for a comparison between nested data types.

Note. For storing fixed‑length lists, DuckDB uses the ARRAY type.

Creating Lists

Lists can be created using the list_value(expr, ...) function or the equivalent bracket notation [expr, ...]. The expressions
can be constants or arbitrary expressions. To create a list from a table column, use the list aggregate function.

-- List of integers
SELECT [1, 2, 3];
-- List of strings with a NULL value
SELECT ['duck', 'goose', NULL, 'heron'];
-- List of lists with NULL values
SELECT [['duck', 'goose', 'heron'], NULL, ['frog', 'toad'], []];
-- Create a list with the list_value function
SELECT list_value(1, 2, 3);
-- Create a table with an integer list column and a varchar list column
CREATE TABLE list_table (int_list INT[], varchar_list VARCHAR[]);
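As a small, hypothetical sketch of the list aggregate mentioned above (the animals table and its rows are made up for illustration):

-- Aggregate a column into a list per group using the list() aggregate function
CREATE TABLE animals (family VARCHAR, name VARCHAR);
INSERT INTO animals VALUES ('bird', 'duck'), ('bird', 'goose'), ('amphibian', 'frog');
SELECT family, list(name) AS names
FROM animals
GROUP BY family;
-- 'bird'      -> [duck, goose]
-- 'amphibian' -> [frog]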

Retrieving from Lists

Retrieving one or more values from a list can be accomplished using brackets and slicing notation, or through list functions like list_
extract. Multiple equivalent functions are provided as aliases for compatibility with systems that refer to lists as arrays. For example,
the function array_slice.


Note. We wrap the list creation in parentheses so that it happens first. This is only needed in our basic examples here, not when
working with a list column. For example, this can't be parsed: SELECT ['a', 'b', 'c'][1].

Example                                     Result
SELECT (['a', 'b', 'c'])[3]                 'c'
SELECT (['a', 'b', 'c'])[-1]                'c'
SELECT (['a', 'b', 'c'])[2 + 1]             'c'
SELECT list_extract(['a', 'b', 'c'], 3)     'c'
SELECT (['a', 'b', 'c'])[1:2]               ['a', 'b']
SELECT (['a', 'b', 'c'])[:2]                ['a', 'b']
SELECT (['a', 'b', 'c'])[-2:]               ['b', 'c']
SELECT list_slice(['a', 'b', 'c'], 2, 3)    ['b', 'c']

Ordering

The ordering is defined positionally. NULL values compare greater than all other values and are considered equal to each other.

Null Comparisons

At the top level, NULL nested values obey standard SQL NULL comparison rules: comparing a NULL nested value to a non‑NULL nested
value produces a NULL result. Comparing nested value members, however, uses the internal nested value rules for NULLs, and a NULL
nested value member will compare above a non‑NULL nested value member.
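A brief sketch of these rules (the expected results follow from the positional and NULL ordering described above):

-- positional comparison and NULL handling in nested values
SELECT [1, 2] < [1, 3] AS positional;       -- true: compared element by element
SELECT [1, NULL] > [1, 2] AS null_member;   -- true: a NULL member compares above non-NULL members
SELECT NULL::INT[] = [1, 2] AS top_level;   -- NULL: a top-level NULL follows standard SQL semantics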

Functions

See Nested Functions.

Literal Types

DuckDB has special literal types for representing NULL, integer and string literals in queries. These have their own binding and conversion
rules.

Note. Prior to version 0.10.0, integer and string literals behaved identically to the INTEGER and VARCHAR types.

Null Literals

The NULL literal can be implicitly converted to any other type.

Integer Literals

INTEGER_LITERAL types can be implicitly converted to any integer type in which the value fits. For example, the integer literal 42 can
be implicitly converted to a TINYINT, but the integer literal 1000 cannot be.
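As a brief, hypothetical illustration (the tiny table is made up for this sketch), the same kind of literal is accepted or rejected depending on whether its value fits the target integer type:

CREATE TABLE tiny (t TINYINT);
INSERT INTO tiny VALUES (42);    -- succeeds: the literal 42 fits in a TINYINT
INSERT INTO tiny VALUES (1000);  -- fails: 1000 is out of range for TINYINT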


String Literals

STRING_LITERAL instances can be implicitly converted to any other type.

For example, we can compare string literals with dates.

SELECT d > '1992-01-01' AS result FROM (VALUES (DATE '1992-01-01')) t(d);

┌─────────┐
│ result │
│ boolean │
├─────────┤
│ false │
└─────────┘

However, we cannot compare VARCHAR values with dates.

SELECT d > '1992-01-01'::VARCHAR FROM (VALUES (DATE '1992-01-01')) t(d);

-- Binder Error: Cannot compare values of type DATE and type VARCHAR –
-- an explicit cast is required

Escape String Literals To include special characters such as newline, prefix the string with E to create an escape string literal. Both the
uppercase (E'...') and lowercase (e'...') variants work.

SELECT E'Hello\nworld' AS msg;
-- or
SELECT e'Hello\nworld' AS msg;

┌──────────────┐
│ msg │
│ varchar │
├──────────────┤
│ Hello\nworld │
└──────────────┘

Dollar‑Quoted String Literals DuckDB supports dollar‑quoted string literals, which are surrounded by double‑dollar symbols ($$):

SELECT $$Hello
world$$ AS msg;

┌──────────────┐
│ msg │
│ varchar │
├──────────────┤
│ Hello\nworld │
└──────────────┘

SELECT $$The price is $9.95$$ AS msg;

┌────────────────────┐
│ msg │
│ varchar │
├────────────────────┤
│ The price is $9.95 │
└────────────────────┘


Map Type

MAPs are similar to STRUCTs in that they are an ordered list of ”entries” where a key maps to a value. However, MAPs do not need to have
the same keys present for each row, and thus are suitable for other use cases. MAPs are useful when the schema is unknown beforehand
or when the schema varies per row; their flexibility is a key differentiator.

MAPs must have a single type for all keys, and a single type for all values. Keys and values can be any type, and the type of the keys does
not need to match the type of the values (Ex: a MAP of VARCHAR to INT is valid). MAPs may not have duplicate keys. MAPs return an empty
list if a key is not found rather than throwing an error as structs do.

In contrast, STRUCTs must have string keys, but each key may have a value of a different type. See the data types overview for a comparison
between nested data types.

To construct a MAP, use the bracket syntax preceded by the MAP keyword.

Creating Maps

-- A map with VARCHAR keys and INTEGER values. This returns {key1=10, key2=20, key3=30}
SELECT MAP {'key1': 10, 'key2': 20, 'key3': 30};
-- Alternatively use the map_from_entries function. This returns {key1=10, key2=20, key3=30}
SELECT map_from_entries([('key1', 10), ('key2', 20), ('key3', 30)]);
-- A map can be also created using two lists: keys and values. This returns {key1=10, key2=20, key3=30}
SELECT MAP(['key1', 'key2', 'key3'], [10, 20, 30]);
-- A map can also use INTEGER keys and NUMERIC values. This returns {1=42.001, 5=-32.100}
SELECT MAP {1: 42.001, 5: -32.1};
-- Keys and/or values can also be nested types.
-- This returns {[a, b]=[1.1, 2.2], [c, d]=[3.3, 4.4]}
SELECT MAP {['a', 'b']: [1.1, 2.2], ['c', 'd']: [3.3, 4.4]};
-- Create a table with a map column that has INTEGER keys and DOUBLE values
CREATE TABLE tbl (col MAP(INTEGER, DOUBLE));

Retrieving from Maps

MAPs use bracket notation for retrieving values. Selecting from a MAP returns a LIST rather than an individual value, with an empty LIST
meaning that the key was not found.

-- Use bracket notation to retrieve a list containing the value at a key's location. This returns [5]
-- Note that the expression in bracket notation must match the type of the map's key
SELECT MAP {'key1': 5, 'key2': 43}['key1'];
-- To retrieve the underlying value, use list selection syntax to grab the first element.
-- This returns 5
SELECT MAP {'key1': 5, 'key2': 43}['key1'][1];
-- If the element is not in the map, an empty list will be returned. Returns []
-- Note that the expression in bracket notation must match the type of the map's key, else an error is returned
SELECT MAP {'key1': 5, 'key2': 43}['key3'];
-- The element_at function can also be used to retrieve a map value. This returns [5]
SELECT element_at(MAP {'key1': 5, 'key2': 43}, 'key1');

Comparison Operators

Nested types can be compared using all the comparison operators. These comparisons can be used in logical expressions for both WHERE
and HAVING clauses, as well as for creating Boolean values.

The ordering is defined positionally in the same way that words can be ordered in a dictionary. NULL values compare greater than all other
values and are considered equal to each other.


At the top level, NULL nested values obey standard SQL NULL comparison rules: comparing a NULL nested value to a non‑NULL nested
value produces a NULL result. Comparing nested value members, however, uses the internal nested value rules for NULLs, and a NULL
nested value member will compare above a non‑NULL nested value member.

Functions

See Nested Functions.

NULL Values

NULL values are special values that are used to represent missing data in SQL. Columns of any type can contain NULL values. Logically, a
NULL value can be seen as ”the value of this field is unknown”.

-- insert a null value into a table
CREATE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (NULL);

NULL values have special semantics in many parts of the query as well as in many functions:

Note. Any comparison with a NULL value returns NULL, including NULL = NULL.

You can use IS NOT DISTINCT FROM to perform an equality comparison where NULL values compare equal to each other. Use IS
(NOT) NULL to check if a value is NULL.

SELECT NULL = NULL;
-- returns NULL
SELECT NULL IS NOT DISTINCT FROM NULL;
-- returns true
SELECT NULL IS NULL;
-- returns true

NULL and Functions

A function that receives a NULL input argument usually returns NULL.

SELECT cos(NULL);
-- NULL

The coalesce function is an exception to this: it takes any number of arguments, and returns for each row the first argument that is not
NULL. If all arguments are NULL, coalesce also returns NULL.

SELECT coalesce(NULL, NULL, 1);
-- 1
SELECT coalesce(10, 20);
-- 10
SELECT coalesce(NULL, NULL);
-- NULL

The ifnull function is a two‑argument version of coalesce.

SELECT ifnull(NULL, 'default_string');
-- default_string
SELECT ifnull(1, 'default_string');
-- 1

NULL and Conjunctions

NULL values have special semantics in AND/OR conjunctions. For the ternary logic truth tables, see the Boolean Type documentation.


NULL and Aggregate Functions

NULL values are ignored in most aggregate functions.

Aggregate functions that do not ignore NULL values include: first, last, list, and array_agg. To exclude NULL values from those
aggregate functions, the FILTER clause can be used.

CREATE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (1), (10), (NULL);

SELECT min(i) FROM integers;
-- 1

SELECT max(i) FROM integers;
-- 10
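A minimal sketch of the FILTER clause mentioned above, reusing the integers table:

-- exclude NULL values from an aggregate that would otherwise keep them
SELECT list(i) FILTER (WHERE i IS NOT NULL) FROM integers;
-- [1, 10] instead of [1, 10, NULL]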

Numeric Types

Integer Types

The types TINYINT, SMALLINT, INTEGER, BIGINT and HUGEINT store whole numbers, that is, numbers without fractional compo‑
nents, of various ranges. Attempts to store values outside of the allowed range will result in an error. The types UTINYINT, USMALLINT,
UINTEGER, UBIGINT and UHUGEINT store whole unsigned numbers. Attempts to store negative numbers or values outside of the al‑
lowed range will result in an error.

Name       Aliases              Min                                        Max
TINYINT    INT1                 -128                                       127
SMALLINT   INT2, SHORT          -32768                                     32767
INTEGER    INT4, INT, SIGNED    -2147483648                                2147483647
BIGINT     INT8, LONG           -9223372036854775808                       9223372036854775807
HUGEINT    -                    -170141183460469231731687303715884105728   170141183460469231731687303715884105727
UTINYINT   -                    0                                          255
USMALLINT  -                    0                                          65535
UINTEGER   -                    0                                          4294967295
UBIGINT    -                    0                                          18446744073709551615
UHUGEINT   -                    0                                          340282366920938463463374607431768211455

The INTEGER type is the most common choice, as it offers the best balance between range, storage size, and performance. The SMALLINT type is
generally only used if disk space is at a premium. The BIGINT and HUGEINT types are designed to be used when the range of the integer
type is insufficient.

Fixed‑Point Decimals

The data type DECIMAL(WIDTH, SCALE) (also available under the alias NUMERIC(WIDTH, SCALE)) represents an exact fixed‑point
decimal value. When creating a value of type DECIMAL, the WIDTH and SCALE can be specified to define which size of decimal values can
be held in the field. The WIDTH field determines how many digits can be held, and the scale determines the amount of digits after the
decimal point. For example, the type DECIMAL(3, 2) can fit the value 1.23, but cannot fit the value 12.3 or the value 1.234. The
default WIDTH and SCALE is DECIMAL(18, 3), if none are specified.
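A brief sketch of these width and scale rules (the error behavior is paraphrased, not an exact DuckDB message):

SELECT 1.23::DECIMAL(3, 2) AS ok;    -- fits: 3 digits in total, 2 of them after the decimal point
SELECT 12.3::DECIMAL(3, 2) AS oops;  -- expected to fail: the value needs a width of at least 4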


Internally, decimals are represented as integers depending on their specified width.

Width Internal Size (Bytes)

1‑4 INT16 2
5‑9 INT32 4
10‑18 INT64 8
19‑38 INT128 16

Performance can be impacted by using too large decimals when not required. In particular, decimal values with a width of 19 or above are slow,
as arithmetic involving the INT128 type is much more expensive than operations involving the INT32 or INT64 types. It is therefore
recommended to stick with a width of 18 or below, unless there is a good reason for why this is insufficient.

Floating‑Point Types

The data types REAL and DOUBLE precision are inexact, variable‑precision numeric types. In practice, these types are usually implementa‑
tions of IEEE Standard 754 for Binary Floating‑Point Arithmetic (single and double precision, respectively), to the extent that the underlying
processor, operating system, and compiler support it.

Name      Aliases          Description
REAL      FLOAT4, FLOAT    single precision floating-point number (4 bytes)
DOUBLE    FLOAT8           double precision floating-point number (8 bytes)

Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that storing and
retrieving a value might show slight discrepancies. Managing these errors and how they propagate through calculations is the subject of
an entire branch of mathematics and computer science and will not be discussed here, except for the following points:

• If you require exact storage and calculations (such as for monetary amounts), use the DECIMAL data type or its NUMERIC alias
instead.
• If you want to do complicated calculations with these types for anything important, especially if you rely on certain behavior in
boundary cases (infinity, underflow), you should evaluate the implementation carefully.
• Comparing two floating‑point values for equality might not always work as expected.
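A small illustration of the last point, using standard IEEE 754 double behavior:

SELECT (0.1::DOUBLE + 0.2::DOUBLE) = 0.3::DOUBLE AS exact_equal;  -- false
SELECT (0.1::DOUBLE + 0.2::DOUBLE) AS sum;                        -- 0.30000000000000004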

On most platforms, the REAL type has a range of at least 1E‑37 to 1E+37 with a precision of at least 6 decimal digits. The DOUBLE type
typically has a range of around 1E‑307 to 1E+308 with a precision of at least 15 digits. Values that are too large or too small will cause an
error. Rounding might take place if the precision of an input number is too high. Numbers too close to zero that are not representable as
distinct from zero will cause an underflow error.

In addition to ordinary numeric values, the floating‑point types have several special values:

• Infinity
• -Infinity
• NaN

These represent the IEEE 754 special values ”infinity”, ”negative infinity”, and ”not‑a‑number”, respectively. (On a machine whose floating‑
point arithmetic does not follow IEEE 754, these values will probably not work as expected.) When writing these values as constants in an
SQL command, you must put quotes around them, for example: UPDATE table SET x = '-Infinity'. On input, these strings are
recognized in a case‑insensitive manner.


Universally Unique Identifiers (UUIDs)

DuckDB supports universally unique identifiers (UUIDs) through the UUID type. These use 128 bits and are represented internally as
HUGEINT values. When printed, they are shown with hexadecimal characters, separated by dashes as follows: 8 characters - 4
characters - 4 characters - 4 characters - 12 characters (using 36 characters in total). For example, 4ac7a9e9-
607c-4c8a-84f3-843f0191e3fd is a valid UUID.

To generate a new UUID, use the uuid() utility function.
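For example (the generated UUID below is illustrative; uuid() returns a different random value on each call):

SELECT uuid() AS id;
-- e.g., 91927e68-52e5-4e6a-90d1-7e5e5c06569f
SELECT '4ac7a9e9-607c-4c8a-84f3-843f0191e3fd'::UUID AS parsed;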

Functions

See Numeric Functions and Operators.

Struct Data Type

Conceptually, a STRUCT column contains an ordered list of columns called ”entries”. The entries are referenced by name using strings. This
document refers to those entry names as keys. Each row in the STRUCT column must have the same keys. The names of the struct entries
are part of the schema. Each row in a STRUCT column must have the same layout. The names of the struct entries are case‑insensitive.

STRUCTs are typically used to nest multiple columns into a single column, and the nested column can be of any type, including other
STRUCTs and LISTs.

STRUCTs are similar to PostgreSQL's ROW type. The key difference is that DuckDB STRUCTs require the same keys in each row of a STRUCT
column. This allows DuckDB to provide significantly improved performance by fully utilizing its vectorized execution engine, and also
enforces type consistency for improved correctness. DuckDB includes a row function as a special way to produce a STRUCT, but does not
have a ROW data type. See an example below and the nested functions docs for details.

See the data types overview for a comparison between nested data types.

Creating Structs Structs can be created using the struct_pack(name := expr, ...) function or the equivalent {'name': expr,
...} notation. The expressions can be constants or arbitrary expressions.

-- Struct of integers
SELECT {'x': 1, 'y': 2, 'z': 3};
-- Struct of strings with a NULL value
SELECT {'yes': 'duck', 'maybe': 'goose', 'huh': NULL, 'no': 'heron'};
-- Struct with a different type for each key
SELECT {'key1': 'string', 'key2': 1, 'key3': 12.345};
-- Struct using the struct_pack function.
-- Note the lack of single quotes around the keys and the use of the := operator
SELECT struct_pack(key1 := 'value1', key2 := 42);
-- Struct of structs with NULL values
SELECT {'birds':
{'yes': 'duck', 'maybe': 'goose', 'huh': NULL, 'no': 'heron'},
'aliens':
NULL,
'amphibians':
{'yes':'frog', 'maybe': 'salamander', 'huh': 'dragon', 'no':'toad'}
};
-- Create a struct from columns and/or expressions using the row function.
-- This returns {'': 1, '': 2, '': a}
SELECT row(x, x + 1, y) FROM (SELECT 1 AS x, 'a' AS y);
-- If using multiple expressions when creating a struct, the row function is optional
-- This also returns {'': 1, '': 2, '': a}
SELECT (x, x + 1, y) FROM (SELECT 1 AS x, 'a' AS y);


Adding Field(s)/Value(s) to Structs

-- Add to a Struct of integers
SELECT struct_insert({'a': 1, 'b': 2, 'c': 3}, d := 4);

Retrieving from Structs Retrieving a value from a struct can be accomplished using dot notation, bracket notation, or through struct
functions like struct_extract.

-- Use dot notation to retrieve the value at a key's location. This returns 1
-- The subquery generates a struct column "a", which we then query with a.x
SELECT a.x FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);
-- If key contains a space, simply wrap it in double quotes. This returns 1
-- Note: Use double quotes not single quotes
-- This is because this action is most similar to selecting a column from within the struct
SELECT a."x space" FROM (SELECT {'x space': 1, 'y': 2, 'z': 3} AS a);
-- Bracket notation may also be used. This returns 1
-- Note: Use single quotes since the goal is to specify a certain string key.
-- Only constant expressions may be used inside the brackets (no columns)
SELECT a['x space'] FROM (SELECT {'x space': 1, 'y': 2, 'z': 3} AS a);
-- The struct_extract function is also equivalent. This returns 1
SELECT struct_extract({'x space': 1, 'y': 2, 'z': 3}, 'x space');

Struct.* Rather than retrieving a single key from a struct, star notation (*) can be used to retrieve all keys from a struct as separate
columns. This is particularly useful when a prior operation creates a struct of unknown shape, or if a query must handle any potential struct
keys.

-- All keys within a struct can be returned as separate columns using *
SELECT a.*
FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);

x y z

1 2 3

Dot Notation Order of Operations Referring to structs with dot notation can be ambiguous with referring to schemas and tables. In
general, DuckDB looks for columns first, then for struct keys within columns. DuckDB resolves references in the following order, using the first
match that occurs:

No Dots

SELECT part1
FROM tbl;

1. part1 is a column

One Dot

SELECT part1.part2
FROM tbl;

1. part1 is a table, part2 is a column
2. part1 is a column, part2 is a property of that column


Two (or More) Dots


SELECT part1.part2.part3
FROM tbl;

1. part1 is a schema, part2 is a table, part3 is a column
2. part1 is a table, part2 is a column, part3 is a property of that column
3. part1 is a column, part2 is a property of that column, part3 is a property of that column

Any extra parts (e.g., .part4.part5 etc.) are always treated as properties.

Creating Structs with the row Function The row function can be used to automatically convert multiple columns to a single struct
column. When using row the keys will be empty strings allowing for easy insertion into a table with a struct column. Columns, however,
cannot be initialized with the row function, and must be explicitly named. For example:

-- Inserting values into a struct column using the row function
CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER));
INSERT INTO t1 VALUES (ROW('a', 42));
-- The table will contain a single entry:
-- {'v': a, 'i': 42}

-- The following produces the same result as above
CREATE TABLE t1 AS (
    SELECT ROW('a', 42)::STRUCT(v VARCHAR, i INTEGER)
);

-- Initializing a struct column with the row function will fail
CREATE TABLE t2 AS SELECT ROW('a');
-- The following error is thrown:
-- "Error: Invalid Input Error: A table cannot be created from an unnamed struct"

When casting structs, the names of fields have to match. Therefore, the following query will fail:

SELECT a::STRUCT(y INTEGER) AS b
FROM (SELECT {'x': 42} AS a);

Error: Mismatch Type Error: Type STRUCT(x INTEGER) does not match with STRUCT(y INTEGER). Cannot cast
STRUCTs with different names

A workaround for this would be to use struct_pack instead:

SELECT struct_pack(y := a.x) AS b
FROM (SELECT {'x': 42} AS a);

Note. This behavior was introduced in DuckDB v0.9.0. Previously, this query ran successfully and returned struct {'y': 42} as
column b.

Comparison Operators

Nested types can be compared using all the comparison operators. These comparisons can be used in logical expressions for both WHERE
and HAVING clauses, as well as for creating BOOLEAN values.

The ordering is defined positionally in the same way that words can be ordered in a dictionary. NULL values compare greater than all other
values and are considered equal to each other.

At the top level, NULL nested values obey standard SQL NULL comparison rules: comparing a NULL nested value to a non‑NULL nested
value produces a NULL result. Comparing nested value members, however, uses the internal nested value rules for NULLs, and a NULL
nested value member will compare above a non‑NULL nested value member.


Functions

See Nested Functions.

Text Types

In DuckDB, strings can be stored in the VARCHAR field. The field allows storage of Unicode characters. Internally, the data is encoded as
UTF‑8.

Name          Aliases                       Description
VARCHAR       CHAR, BPCHAR, STRING, TEXT    Variable-length character string
VARCHAR(n)    STRING(n), TEXT(n)            Variable-length character string. The maximum length n has no effect and is only provided for compatibility.

Specifying a Length Limit

Specifying the length for the VARCHAR, STRING, and TEXT types is not required and has no effect on the system. Specifying the length
will not improve performance or reduce storage space of the strings in the database. These variants are supported for compatibility
with other systems that require a length to be specified for strings.

If you wish to restrict the number of characters in a VARCHAR column for data integrity reasons the CHECK constraint should be used, for
example:

CREATE TABLE strings (
    val VARCHAR CHECK (length(val) <= 10) -- val has a maximum length of 10
);

The VARCHAR field allows storage of Unicode characters. Internally, the data is encoded as UTF‑8.

Formatting Strings

Strings in DuckDB are surrounded by single quote (apostrophe) characters ('):

SELECT 'Hello world' AS msg;

┌─────────────┐
│ msg │
│ varchar │
├─────────────┤
│ Hello world │
└─────────────┘

To include a single quote character in a string, use '':

SELECT 'Hello ''world''' AS msg;

┌───────────────┐
│ msg │
│ varchar │
├───────────────┤
│ Hello 'world' │
└───────────────┘


Strings with Special Characters

To use special characters in string, use escape string literals or dollar‑quoted string literals. Alternatively, you can use concatenation and
the chr character function:

SELECT 'Hello' || chr(10) || 'world' AS msg;

┌──────────────┐
│ msg │
│ varchar │
├──────────────┤
│ Hello\nworld │
└──────────────┘

Double Quote Characters

Double quote characters (") are used to denote table and column names. Surrounding names with double quotes allows the use of reserved keywords as identifiers, e.g.:

CREATE TABLE "table" ("order" BIGINT);

While DuckDB occasionally accepts both single quotes and double quotes for strings (e.g., both FROM "filename.csv" and FROM
'filename.csv' work), using double quotes for string values is not recommended.

Functions

See Character Functions and Pattern Matching.

Time Types

The TIME and TIMETZ types specify the hour, minute, second, and microsecond of a day.

Name Aliases Description

TIME TIME WITHOUT TIME ZONE time of day (ignores time zone)
TIMETZ TIME WITH TIME ZONE time of day (uses time zone)

Instances can be created using the type names as a keyword, where the data must be formatted according to the ISO 8601 format
(hh:mm:ss[.zzzzzz][+-TT[:tt]]).

SELECT TIME '1992-09-20 11:30:00.123456';          -- 11:30:00.123456
SELECT TIMETZ '1992-09-20 11:30:00.123456';        -- 11:30:00.123456+00
SELECT TIMETZ '1992-09-20 11:30:00.123456-02:00';  -- 13:30:00.123456+00
SELECT TIMETZ '1992-09-20 11:30:00.123456+05:30';  -- 06:00:00.123456+00

Note. Warning The TIME type should only be used in rare cases, where the date part of the timestamp can be disregarded. Most
applications should use the TIMESTAMP types to represent their timestamps.

Timestamp Types

Timestamps represent points in absolute time, usually called instants. DuckDB represents instants as the number of microseconds (μs)
since 1970-01-01 00:00:00+00.


Timestamp Types

Name            Aliases                     Description
TIMESTAMP_NS                                timestamp with nanosecond precision (ignores time zone)
TIMESTAMP       DATETIME                    timestamp with microsecond precision (ignores time zone)
TIMESTAMP_MS                                timestamp with millisecond precision (ignores time zone)
TIMESTAMP_S                                 timestamp with second precision (ignores time zone)
TIMESTAMPTZ     TIMESTAMP WITH TIME ZONE    timestamp (uses time zone)

A timestamp specifies a combination of DATE (year, month, day) and a TIME (hour, minute, second, microsecond). Timestamps
can be created using the TIMESTAMP keyword, where the data must be formatted according to the ISO 8601 format (YYYY-MM-DD
hh:mm:ss[.zzzzzz][+-TT[:tt]]).

SELECT TIMESTAMP_NS '1992-09-20 11:30:00.123456';  -- 1992-09-20 11:30:00.123456
SELECT TIMESTAMP '1992-09-20 11:30:00.123456';     -- 1992-09-20 11:30:00.123456
SELECT DATETIME '1992-09-20 11:30:00.123456';      -- 1992-09-20 11:30:00.123456
SELECT TIMESTAMP_MS '1992-09-20 11:30:00.123456';  -- 1992-09-20 11:30:00.123
SELECT TIMESTAMP_S '1992-09-20 11:30:00.123456';   -- 1992-09-20 11:30:00
SELECT TIMESTAMPTZ '1992-09-20 11:30:00.123456';   -- 1992-09-20 11:30:00.123456+00
SELECT TIMESTAMP WITH TIME ZONE '1992-09-20 11:30:00.123456';
-- 1992-09-20 11:30:00.123456+00

Special Values

There are also three special timestamp values that can be used on input:

Input string    Valid types               Description
epoch           TIMESTAMP, TIMESTAMPTZ    1970-01-01 00:00:00+00 (Unix system time zero)
infinity        TIMESTAMP, TIMESTAMPTZ    later than all other time stamps
-infinity       TIMESTAMP, TIMESTAMPTZ    earlier than all other time stamps

The values infinity and -infinity are specially represented inside the system and will be displayed unchanged; but epoch is simply
a notational shorthand that will be converted to the time stamp value when read.

SELECT '-infinity'::TIMESTAMP, 'epoch'::TIMESTAMP, 'infinity'::TIMESTAMP;

Negative Epoch Positive

‑infinity 1970‑01‑01 00:00:00 infinity

Functions

See Timestamp Functions.


Time Zones

The TIMESTAMPTZ type can be binned into calendar and clock bins using a suitable extension. The built‑in ICU extension implements all
the binning and arithmetic functions using the International Components for Unicode time zone and calendar functions.

To set the time zone to use, first load the ICU extension. The ICU extension comes pre‑bundled with several DuckDB clients (including
Python, R, JDBC, and ODBC), so this step can be skipped in those cases. In other cases you might first need to install and load the ICU
extension.

INSTALL icu;
LOAD icu;

Next, use the SET TimeZone command:

SET TimeZone = 'America/Los_Angeles';

Time binning operations for TIMESTAMPTZ will then be implemented using the given time zone.
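As a hedged sketch (assuming the ICU extension is loaded and TimeZone is set as above), binning functions such as date_trunc then operate in the configured zone:

-- with TimeZone set to 'America/Los_Angeles', the day boundary of a TIMESTAMPTZ
-- is computed in that zone rather than in UTC
SELECT date_trunc('day', TIMESTAMPTZ '2000-01-01 05:00:00+00') AS local_day_start;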

A list of available time zones can be pulled from the pg_timezone_names() table function:

SELECT
name,
abbrev,
utc_offset
FROM pg_timezone_names()
ORDER BY
name;

You can also find a reference table of available time zones.

Calendars

The ICU extension also supports non‑Gregorian calendars using the SET Calendar command. Note that the INSTALL and LOAD steps
are only required if the DuckDB client does not bundle the ICU extension.

INSTALL icu;
LOAD icu;
SET Calendar = 'japanese';

Time binning operations for TIMESTAMPTZ will then be implemented using the given calendar. In this example, the era part will now
report the Japanese imperial era number.

A list of available calendars can be pulled from the icu_calendar_names() table function:

SELECT name
FROM icu_calendar_names()
ORDER BY 1;

Settings

The current values of the TimeZone and Calendar settings are determined by ICU when it starts up. They can be queried using the
duckdb_settings() table function:

SELECT *
FROM duckdb_settings()
WHERE name = 'TimeZone';

┌──────────┬──────────────────┬───────────────────────┬────────────┐
│   name   │      value       │      description      │ input_type │
│ varchar  │     varchar      │        varchar        │  varchar   │
├──────────┼──────────────────┼───────────────────────┼────────────┤
│ TimeZone │ Europe/Amsterdam │ The current time zone │  VARCHAR   │
└──────────┴──────────────────┴───────────────────────┴────────────┘

SELECT *
FROM duckdb_settings()
WHERE name = 'Calendar';

┌──────────┬───────────┬──────────────────────┬────────────┐
│ name │ value │ description │ input_type │
│ varchar │ varchar │ varchar │ varchar │
├──────────┼───────────┼──────────────────────┼────────────┤
│ Calendar │ gregorian │ The current calendar │ VARCHAR │
└──────────┴───────────┴──────────────────────┴────────────┘

Time Zone Reference List

An up‑to‑date version of this list can be pulled from the pg_timezone_names() table function:

SELECT name, abbrev
FROM pg_timezone_names()
ORDER BY name;

name abbrev

ACT ACT
AET AET
AGT AGT
ART ART
AST AST
Africa/Abidjan Iceland
Africa/Accra Iceland
Africa/Addis_Ababa EAT
Africa/Algiers Africa/Algiers
Africa/Asmara EAT
Africa/Asmera EAT
Africa/Bamako Iceland
Africa/Bangui Africa/Bangui
Africa/Banjul Iceland
Africa/Bissau Africa/Bissau
Africa/Blantyre CAT
Africa/Brazzaville Africa/Brazzaville
Africa/Bujumbura CAT
Africa/Cairo ART
Africa/Casablanca Africa/Casablanca
Africa/Ceuta Africa/Ceuta
Africa/Conakry Iceland
Africa/Dakar Iceland
Africa/Dar_es_Salaam EAT

Africa/Djibouti EAT
Africa/Douala Africa/Douala
Africa/El_Aaiun Africa/El_Aaiun
Africa/Freetown Iceland
Africa/Gaborone CAT
Africa/Harare CAT
Africa/Johannesburg Africa/Johannesburg
Africa/Juba Africa/Juba
Africa/Kampala EAT
Africa/Khartoum Africa/Khartoum
Africa/Kigali CAT
Africa/Kinshasa Africa/Kinshasa
Africa/Lagos Africa/Lagos
Africa/Libreville Africa/Libreville
Africa/Lome Iceland
Africa/Luanda Africa/Luanda
Africa/Lubumbashi CAT
Africa/Lusaka CAT
Africa/Malabo Africa/Malabo
Africa/Maputo CAT
Africa/Maseru Africa/Maseru
Africa/Mbabane Africa/Mbabane
Africa/Mogadishu EAT
Africa/Monrovia Africa/Monrovia
Africa/Nairobi EAT
Africa/Ndjamena Africa/Ndjamena
Africa/Niamey Africa/Niamey
Africa/Nouakchott Iceland
Africa/Ouagadougou Iceland
Africa/Porto‑Novo Africa/Porto‑Novo
Africa/Sao_Tome Africa/Sao_Tome
Africa/Timbuktu Iceland
Africa/Tripoli Libya
Africa/Tunis Africa/Tunis
Africa/Windhoek Africa/Windhoek
America/Adak America/Adak
America/Anchorage AST
America/Anguilla PRT
America/Antigua PRT

America/Araguaina America/Araguaina
America/Argentina/Buenos_Aires AGT
America/Argentina/Catamarca America/Argentina/Catamarca
America/Argentina/ComodRivadavia America/Argentina/ComodRivadavia
America/Argentina/Cordoba America/Argentina/Cordoba
America/Argentina/Jujuy America/Argentina/Jujuy
America/Argentina/La_Rioja America/Argentina/La_Rioja
America/Argentina/Mendoza America/Argentina/Mendoza
America/Argentina/Rio_Gallegos America/Argentina/Rio_Gallegos
America/Argentina/Salta America/Argentina/Salta
America/Argentina/San_Juan America/Argentina/San_Juan
America/Argentina/San_Luis America/Argentina/San_Luis
America/Argentina/Tucuman America/Argentina/Tucuman
America/Argentina/Ushuaia America/Argentina/Ushuaia
America/Aruba PRT
America/Asuncion America/Asuncion
America/Atikokan America/Atikokan
America/Atka America/Atka
America/Bahia America/Bahia
America/Bahia_Banderas America/Bahia_Banderas
America/Barbados America/Barbados
America/Belem America/Belem
America/Belize America/Belize
America/Blanc‑Sablon PRT
America/Boa_Vista America/Boa_Vista
America/Bogota America/Bogota
America/Boise America/Boise
America/Buenos_Aires AGT
America/Cambridge_Bay America/Cambridge_Bay
America/Campo_Grande America/Campo_Grande
America/Cancun America/Cancun
America/Caracas America/Caracas
America/Catamarca America/Catamarca
America/Cayenne America/Cayenne
America/Cayman America/Cayman
America/Chicago CST
America/Chihuahua America/Chihuahua
America/Ciudad_Juarez America/Ciudad_Juarez
America/Coral_Harbour America/Coral_Harbour

America/Cordoba America/Cordoba
America/Costa_Rica America/Costa_Rica
America/Creston PNT
America/Cuiaba America/Cuiaba
America/Curacao PRT
America/Danmarkshavn America/Danmarkshavn
America/Dawson America/Dawson
America/Dawson_Creek America/Dawson_Creek
America/Denver Navajo
America/Detroit America/Detroit
America/Dominica PRT
America/Edmonton America/Edmonton
America/Eirunepe America/Eirunepe
America/El_Salvador America/El_Salvador
America/Ensenada America/Ensenada
America/Fort_Nelson America/Fort_Nelson
America/Fort_Wayne IET
America/Fortaleza America/Fortaleza
America/Glace_Bay America/Glace_Bay
America/Godthab America/Godthab
America/Goose_Bay America/Goose_Bay
America/Grand_Turk America/Grand_Turk
America/Grenada PRT
America/Guadeloupe PRT
America/Guatemala America/Guatemala
America/Guayaquil America/Guayaquil
America/Guyana America/Guyana
America/Halifax America/Halifax
America/Havana Cuba
America/Hermosillo America/Hermosillo
America/Indiana/Indianapolis IET
America/Indiana/Knox America/Indiana/Knox
America/Indiana/Marengo America/Indiana/Marengo
America/Indiana/Petersburg America/Indiana/Petersburg
America/Indiana/Tell_City America/Indiana/Tell_City
America/Indiana/Vevay America/Indiana/Vevay
America/Indiana/Vincennes America/Indiana/Vincennes
America/Indiana/Winamac America/Indiana/Winamac
America/Indianapolis IET

America/Inuvik America/Inuvik
America/Iqaluit America/Iqaluit
America/Jamaica Jamaica
America/Jujuy America/Jujuy
America/Juneau America/Juneau
America/Kentucky/Louisville America/Kentucky/Louisville
America/Kentucky/Monticello America/Kentucky/Monticello
America/Knox_IN America/Knox_IN
America/Kralendijk PRT
America/La_Paz America/La_Paz
America/Lima America/Lima
America/Los_Angeles PST
America/Louisville America/Louisville
America/Lower_Princes PRT
America/Maceio America/Maceio
America/Managua America/Managua
America/Manaus America/Manaus
America/Marigot PRT
America/Martinique America/Martinique
America/Matamoros America/Matamoros
America/Mazatlan America/Mazatlan
America/Mendoza America/Mendoza
America/Menominee America/Menominee
America/Merida America/Merida
America/Metlakatla America/Metlakatla
America/Mexico_City America/Mexico_City
America/Miquelon America/Miquelon
America/Moncton America/Moncton
America/Monterrey America/Monterrey
America/Montevideo America/Montevideo
America/Montreal America/Montreal
America/Montserrat PRT
America/Nassau America/Nassau
America/New_York America/New_York
America/Nipigon America/Nipigon
America/Nome America/Nome
America/Noronha America/Noronha
America/North_Dakota/Beulah America/North_Dakota/Beulah
America/North_Dakota/Center America/North_Dakota/Center

America/North_Dakota/New_Salem America/North_Dakota/New_Salem
America/Nuuk America/Nuuk
America/Ojinaga America/Ojinaga
America/Panama America/Panama
America/Pangnirtung America/Pangnirtung
America/Paramaribo America/Paramaribo
America/Phoenix PNT
America/Port‑au‑Prince America/Port‑au‑Prince
America/Port_of_Spain PRT
America/Porto_Acre America/Porto_Acre
America/Porto_Velho America/Porto_Velho
America/Puerto_Rico PRT
America/Punta_Arenas America/Punta_Arenas
America/Rainy_River America/Rainy_River
America/Rankin_Inlet America/Rankin_Inlet
America/Recife America/Recife
America/Regina America/Regina
America/Resolute America/Resolute
America/Rio_Branco America/Rio_Branco
America/Rosario America/Rosario
America/Santa_Isabel America/Santa_Isabel
America/Santarem America/Santarem
America/Santiago America/Santiago
America/Santo_Domingo America/Santo_Domingo
America/Sao_Paulo BET
America/Scoresbysund America/Scoresbysund
America/Shiprock Navajo
America/Sitka America/Sitka
America/St_Barthelemy PRT
America/St_Johns CNT
America/St_Kitts PRT
America/St_Lucia PRT
America/St_Thomas PRT
America/St_Vincent PRT
America/Swift_Current America/Swift_Current
America/Tegucigalpa America/Tegucigalpa
America/Thule America/Thule
America/Thunder_Bay America/Thunder_Bay
America/Tijuana America/Tijuana

America/Toronto America/Toronto
America/Tortola PRT
America/Vancouver America/Vancouver
America/Virgin PRT
America/Whitehorse America/Whitehorse
America/Winnipeg America/Winnipeg
America/Yakutat America/Yakutat
America/Yellowknife America/Yellowknife
Antarctica/Casey Antarctica/Casey
Antarctica/Davis Antarctica/Davis
Antarctica/DumontDUrville Antarctica/DumontDUrville
Antarctica/Macquarie Antarctica/Macquarie
Antarctica/Mawson Antarctica/Mawson
Antarctica/McMurdo NZ
Antarctica/Palmer Antarctica/Palmer
Antarctica/Rothera Antarctica/Rothera
Antarctica/South_Pole NZ
Antarctica/Syowa Antarctica/Syowa
Antarctica/Troll Antarctica/Troll
Antarctica/Vostok Antarctica/Vostok
Arctic/Longyearbyen Arctic/Longyearbyen
Asia/Aden Asia/Aden
Asia/Almaty Asia/Almaty
Asia/Amman Asia/Amman
Asia/Anadyr Asia/Anadyr
Asia/Aqtau Asia/Aqtau
Asia/Aqtobe Asia/Aqtobe
Asia/Ashgabat Asia/Ashgabat
Asia/Ashkhabad Asia/Ashkhabad
Asia/Atyrau Asia/Atyrau
Asia/Baghdad Asia/Baghdad
Asia/Bahrain Asia/Bahrain
Asia/Baku Asia/Baku
Asia/Bangkok Asia/Bangkok
Asia/Barnaul Asia/Barnaul
Asia/Beirut Asia/Beirut
Asia/Bishkek Asia/Bishkek
Asia/Brunei Asia/Brunei
Asia/Calcutta IST

Asia/Chita Asia/Chita
Asia/Choibalsan Asia/Choibalsan
Asia/Chongqing CTT
Asia/Chungking CTT
Asia/Colombo Asia/Colombo
Asia/Dacca BST
Asia/Damascus Asia/Damascus
Asia/Dhaka BST
Asia/Dili Asia/Dili
Asia/Dubai Asia/Dubai
Asia/Dushanbe Asia/Dushanbe
Asia/Famagusta Asia/Famagusta
Asia/Gaza Asia/Gaza
Asia/Harbin CTT
Asia/Hebron Asia/Hebron
Asia/Ho_Chi_Minh VST
Asia/Hong_Kong Hongkong
Asia/Hovd Asia/Hovd
Asia/Irkutsk Asia/Irkutsk
Asia/Istanbul Turkey
Asia/Jakarta Asia/Jakarta
Asia/Jayapura Asia/Jayapura
Asia/Jerusalem Israel
Asia/Kabul Asia/Kabul
Asia/Kamchatka Asia/Kamchatka
Asia/Karachi PLT
Asia/Kashgar Asia/Kashgar
Asia/Kathmandu Asia/Kathmandu
Asia/Katmandu Asia/Katmandu
Asia/Khandyga Asia/Khandyga
Asia/Kolkata IST
Asia/Krasnoyarsk Asia/Krasnoyarsk
Asia/Kuala_Lumpur Singapore
Asia/Kuching Asia/Kuching
Asia/Kuwait Asia/Kuwait
Asia/Macao Asia/Macao
Asia/Macau Asia/Macau
Asia/Magadan Asia/Magadan
Asia/Makassar Asia/Makassar

Asia/Manila Asia/Manila
Asia/Muscat Asia/Muscat
Asia/Nicosia Asia/Nicosia
Asia/Novokuznetsk Asia/Novokuznetsk
Asia/Novosibirsk Asia/Novosibirsk
Asia/Omsk Asia/Omsk
Asia/Oral Asia/Oral
Asia/Phnom_Penh Asia/Phnom_Penh
Asia/Pontianak Asia/Pontianak
Asia/Pyongyang Asia/Pyongyang
Asia/Qatar Asia/Qatar
Asia/Qostanay Asia/Qostanay
Asia/Qyzylorda Asia/Qyzylorda
Asia/Rangoon Asia/Rangoon
Asia/Riyadh Asia/Riyadh
Asia/Saigon VST
Asia/Sakhalin Asia/Sakhalin
Asia/Samarkand Asia/Samarkand
Asia/Seoul ROK
Asia/Shanghai CTT
Asia/Singapore Singapore
Asia/Srednekolymsk Asia/Srednekolymsk
Asia/Taipei ROC
Asia/Tashkent Asia/Tashkent
Asia/Tbilisi Asia/Tbilisi
Asia/Tehran Iran
Asia/Tel_Aviv Israel
Asia/Thimbu Asia/Thimbu
Asia/Thimphu Asia/Thimphu
Asia/Tokyo JST
Asia/Tomsk Asia/Tomsk
Asia/Ujung_Pandang Asia/Ujung_Pandang
Asia/Ulaanbaatar Asia/Ulaanbaatar
Asia/Ulan_Bator Asia/Ulan_Bator
Asia/Urumqi Asia/Urumqi
Asia/Ust‑Nera Asia/Ust‑Nera
Asia/Vientiane Asia/Vientiane
Asia/Vladivostok Asia/Vladivostok
Asia/Yakutsk Asia/Yakutsk

Asia/Yangon Asia/Yangon
Asia/Yekaterinburg Asia/Yekaterinburg
Asia/Yerevan NET
Atlantic/Azores Atlantic/Azores
Atlantic/Bermuda Atlantic/Bermuda
Atlantic/Canary Atlantic/Canary
Atlantic/Cape_Verde Atlantic/Cape_Verde
Atlantic/Faeroe Atlantic/Faeroe
Atlantic/Faroe Atlantic/Faroe
Atlantic/Jan_Mayen Atlantic/Jan_Mayen
Atlantic/Madeira Atlantic/Madeira
Atlantic/Reykjavik Iceland
Atlantic/South_Georgia Atlantic/South_Georgia
Atlantic/St_Helena Iceland
Atlantic/Stanley Atlantic/Stanley
Australia/ACT AET
Australia/Adelaide Australia/Adelaide
Australia/Brisbane Australia/Brisbane
Australia/Broken_Hill Australia/Broken_Hill
Australia/Canberra AET
Australia/Currie Australia/Currie
Australia/Darwin ACT
Australia/Eucla Australia/Eucla
Australia/Hobart Australia/Hobart
Australia/LHI Australia/LHI
Australia/Lindeman Australia/Lindeman
Australia/Lord_Howe Australia/Lord_Howe
Australia/Melbourne Australia/Melbourne
Australia/NSW AET
Australia/North ACT
Australia/Perth Australia/Perth
Australia/Queensland Australia/Queensland
Australia/South Australia/South
Australia/Sydney AET
Australia/Tasmania Australia/Tasmania
Australia/Victoria Australia/Victoria
Australia/West Australia/West
Australia/Yancowinna Australia/Yancowinna
BET BET

BST BST
Brazil/Acre Brazil/Acre
Brazil/DeNoronha Brazil/DeNoronha
Brazil/East BET
Brazil/West Brazil/West
CAT CAT
CET CET
CNT CNT
CST CST
CST6CDT CST6CDT
CTT CTT
Canada/Atlantic Canada/Atlantic
Canada/Central Canada/Central
Canada/East‑Saskatchewan Canada/East‑Saskatchewan
Canada/Eastern Canada/Eastern
Canada/Mountain Canada/Mountain
Canada/Newfoundland CNT
Canada/Pacific Canada/Pacific
Canada/Saskatchewan Canada/Saskatchewan
Canada/Yukon Canada/Yukon
Chile/Continental Chile/Continental
Chile/EasterIsland Chile/EasterIsland
Cuba Cuba
EAT EAT
ECT ECT
EET EET
EST EST
EST5EDT EST5EDT
Egypt ART
Eire Eire
Etc/GMT GMT
Etc/GMT+0 GMT
Etc/GMT+1 Etc/GMT+1
Etc/GMT+10 Etc/GMT+10
Etc/GMT+11 Etc/GMT+11
Etc/GMT+12 Etc/GMT+12
Etc/GMT+2 Etc/GMT+2
Etc/GMT+3 Etc/GMT+3
Etc/GMT+4 Etc/GMT+4

Etc/GMT+5 Etc/GMT+5
Etc/GMT+6 Etc/GMT+6
Etc/GMT+7 Etc/GMT+7
Etc/GMT+8 Etc/GMT+8
Etc/GMT+9 Etc/GMT+9
Etc/GMT‑0 GMT
Etc/GMT‑1 Etc/GMT‑1
Etc/GMT‑10 Etc/GMT‑10
Etc/GMT‑11 Etc/GMT‑11
Etc/GMT‑12 Etc/GMT‑12
Etc/GMT‑13 Etc/GMT‑13
Etc/GMT‑14 Etc/GMT‑14
Etc/GMT‑2 Etc/GMT‑2
Etc/GMT‑3 Etc/GMT‑3
Etc/GMT‑4 Etc/GMT‑4
Etc/GMT‑5 Etc/GMT‑5
Etc/GMT‑6 Etc/GMT‑6
Etc/GMT‑7 Etc/GMT‑7
Etc/GMT‑8 Etc/GMT‑8
Etc/GMT‑9 Etc/GMT‑9
Etc/GMT0 GMT
Etc/Greenwich GMT
Etc/UCT UCT
Etc/UTC UCT
Etc/Universal UCT
Etc/Zulu UCT
Europe/Amsterdam Europe/Amsterdam
Europe/Andorra Europe/Andorra
Europe/Astrakhan Europe/Astrakhan
Europe/Athens Europe/Athens
Europe/Belfast GB
Europe/Belgrade Europe/Belgrade
Europe/Berlin Europe/Berlin
Europe/Bratislava Europe/Bratislava
Europe/Brussels Europe/Brussels
Europe/Bucharest Europe/Bucharest
Europe/Budapest Europe/Budapest
Europe/Busingen Europe/Busingen
Europe/Chisinau Europe/Chisinau

Europe/Copenhagen Europe/Copenhagen
Europe/Dublin Eire
Europe/Gibraltar Europe/Gibraltar
Europe/Guernsey GB
Europe/Helsinki Europe/Helsinki
Europe/Isle_of_Man GB
Europe/Istanbul Turkey
Europe/Jersey GB
Europe/Kaliningrad Europe/Kaliningrad
Europe/Kiev Europe/Kiev
Europe/Kirov Europe/Kirov
Europe/Kyiv Europe/Kyiv
Europe/Lisbon Portugal
Europe/Ljubljana Europe/Ljubljana
Europe/London GB
Europe/Luxembourg Europe/Luxembourg
Europe/Madrid Europe/Madrid
Europe/Malta Europe/Malta
Europe/Mariehamn Europe/Mariehamn
Europe/Minsk Europe/Minsk
Europe/Monaco ECT
Europe/Moscow W‑SU
Europe/Nicosia Europe/Nicosia
Europe/Oslo Europe/Oslo
Europe/Paris ECT
Europe/Podgorica Europe/Podgorica
Europe/Prague Europe/Prague
Europe/Riga Europe/Riga
Europe/Rome Europe/Rome
Europe/Samara Europe/Samara
Europe/San_Marino Europe/San_Marino
Europe/Sarajevo Europe/Sarajevo
Europe/Saratov Europe/Saratov
Europe/Simferopol Europe/Simferopol
Europe/Skopje Europe/Skopje
Europe/Sofia Europe/Sofia
Europe/Stockholm Europe/Stockholm
Europe/Tallinn Europe/Tallinn
Europe/Tirane Europe/Tirane

Europe/Tiraspol Europe/Tiraspol
Europe/Ulyanovsk Europe/Ulyanovsk
Europe/Uzhgorod Europe/Uzhgorod
Europe/Vaduz Europe/Vaduz
Europe/Vatican Europe/Vatican
Europe/Vienna Europe/Vienna
Europe/Vilnius Europe/Vilnius
Europe/Volgograd Europe/Volgograd
Europe/Warsaw Poland
Europe/Zagreb Europe/Zagreb
Europe/Zaporozhye Europe/Zaporozhye
Europe/Zurich Europe/Zurich
Factory Factory
GB GB
GB‑Eire GB
GMT GMT
GMT+0 GMT
GMT‑0 GMT
GMT0 GMT
Greenwich GMT
HST HST
Hongkong Hongkong
IET IET
IST IST
Iceland Iceland
Indian/Antananarivo EAT
Indian/Chagos Indian/Chagos
Indian/Christmas Indian/Christmas
Indian/Cocos Indian/Cocos
Indian/Comoro EAT
Indian/Kerguelen Indian/Kerguelen
Indian/Mahe Indian/Mahe
Indian/Maldives Indian/Maldives
Indian/Mauritius Indian/Mauritius
Indian/Mayotte EAT
Indian/Reunion Indian/Reunion
Iran Iran
Israel Israel
JST JST

Jamaica Jamaica
Japan JST
Kwajalein Kwajalein
Libya Libya
MET MET
MIT MIT
MST MST
MST7MDT MST7MDT
Mexico/BajaNorte Mexico/BajaNorte
Mexico/BajaSur Mexico/BajaSur
Mexico/General Mexico/General
NET NET
NST NZ
NZ NZ
NZ‑CHAT NZ‑CHAT
Navajo Navajo
PLT PLT
PNT PNT
PRC CTT
PRT PRT
PST PST
PST8PDT PST8PDT
Pacific/Apia MIT
Pacific/Auckland NZ
Pacific/Bougainville Pacific/Bougainville
Pacific/Chatham NZ‑CHAT
Pacific/Chuuk Pacific/Chuuk
Pacific/Easter Pacific/Easter
Pacific/Efate Pacific/Efate
Pacific/Enderbury Pacific/Enderbury
Pacific/Fakaofo Pacific/Fakaofo
Pacific/Fiji Pacific/Fiji
Pacific/Funafuti Pacific/Funafuti
Pacific/Galapagos Pacific/Galapagos
Pacific/Gambier Pacific/Gambier
Pacific/Guadalcanal SST
Pacific/Guam Pacific/Guam
Pacific/Honolulu Pacific/Honolulu
Pacific/Johnston Pacific/Johnston

Pacific/Kanton Pacific/Kanton
Pacific/Kiritimati Pacific/Kiritimati
Pacific/Kosrae Pacific/Kosrae
Pacific/Kwajalein Kwajalein
Pacific/Majuro Pacific/Majuro
Pacific/Marquesas Pacific/Marquesas
Pacific/Midway Pacific/Midway
Pacific/Nauru Pacific/Nauru
Pacific/Niue Pacific/Niue
Pacific/Norfolk Pacific/Norfolk
Pacific/Noumea Pacific/Noumea
Pacific/Pago_Pago Pacific/Pago_Pago
Pacific/Palau Pacific/Palau
Pacific/Pitcairn Pacific/Pitcairn
Pacific/Pohnpei SST
Pacific/Ponape SST
Pacific/Port_Moresby Pacific/Port_Moresby
Pacific/Rarotonga Pacific/Rarotonga
Pacific/Saipan Pacific/Saipan
Pacific/Samoa Pacific/Samoa
Pacific/Tahiti Pacific/Tahiti
Pacific/Tarawa Pacific/Tarawa
Pacific/Tongatapu Pacific/Tongatapu
Pacific/Truk Pacific/Truk
Pacific/Wake Pacific/Wake
Pacific/Wallis Pacific/Wallis
Pacific/Yap Pacific/Yap
Poland Poland
Portugal Portugal
ROC ROC
ROK ROK
SST SST
Singapore Singapore
SystemV/AST4 SystemV/AST4
SystemV/AST4ADT SystemV/AST4ADT
SystemV/CST6 SystemV/CST6
SystemV/CST6CDT SystemV/CST6CDT
SystemV/EST5 SystemV/EST5
SystemV/EST5EDT SystemV/EST5EDT
SystemV/HST10 SystemV/HST10
SystemV/MST7 SystemV/MST7
SystemV/MST7MDT SystemV/MST7MDT
SystemV/PST8 SystemV/PST8
SystemV/PST8PDT SystemV/PST8PDT
SystemV/YST9 SystemV/YST9
SystemV/YST9YDT SystemV/YST9YDT
Turkey Turkey
UCT UCT
US/Alaska AST
US/Aleutian US/Aleutian
US/Arizona PNT
US/Central CST
US/East‑Indiana IET
US/Eastern US/Eastern
US/Hawaii US/Hawaii
US/Indiana‑Starke US/Indiana‑Starke
US/Michigan US/Michigan
US/Mountain Navajo
US/Pacific PST
US/Pacific‑New PST
US/Samoa US/Samoa
UTC UCT
Universal UCT
VST VST
W‑SU W‑SU
WET WET
Zulu UCT

Union Type

A UNION type (not to be confused with the SQL UNION operator) is a nested type capable of holding one of multiple "alternative" values, much like a union in C. The main difference is that these UNION types are tagged unions and thus always carry a discriminator "tag" which signals which alternative is currently held, even if the inner value itself is NULL. UNION types are therefore more similar to C++17's std::variant, Rust's enums, or the "sum types" present in most functional languages.

UNION types must always have at least one member, and while they can contain multiple members of the same type, the tag names must
be unique. UNION types can have at most 256 members.

Under the hood, UNION types are implemented on top of STRUCT types, and simply keep the ”tag” as the first entry.

UNION values can be created with the union_value(tag := expr) function or by casting from a member type.


Example

-- Create a table with a union column
CREATE TABLE tbl1 (u UNION(num INT, str VARCHAR));
-- Any type can be implicitly cast to a union containing the type.
-- Any union can also be implicitly cast to another union if
-- the source union members are a subset of the targets.
-- Note: only if the cast is unambiguous!
-- More details in the 'Union casts' section below.
INSERT INTO tbl1 VALUES (1), ('two'), (union_value(str := 'three'));
-- Unions use the member types' VARCHAR cast functions when casting to VARCHAR.
SELECT u FROM tbl1;
-- returns:
-- 1
-- two
-- three
-- Select all the 'str' members
SELECT union_extract(u, 'str') FROM tbl1;
-- Alternatively, you can use 'dot syntax' like with structs
SELECT u.str FROM tbl1;
-- returns:
-- NULL
-- two
-- three

-- Select the currently active tag from the union as an enum.


SELECT union_tag(u) FROM tbl1;
-- returns:
-- num
-- str
-- str

Union Casts

Compared to other nested types, UNIONs allow a set of implicit casts to facilitate unintrusive and natural usage when working with their
members as "subtypes". However, these casts have been designed with two principles in mind: to avoid ambiguity and to avoid casts that
could lead to loss of information. This prevents UNIONs from being completely "transparent", while still allowing UNION types to have a
"supertype" relationship with their members.

Thus UNION types can't be implicitly cast to any of their member types in general, since the information in the other members not matching
the target type would be ”lost”. If you want to coerce a UNION into one of its members, you should use the union_extract function
explicitly instead.

The only exception to this is when casting a UNION to VARCHAR, in which case the members will all use their corresponding VARCHAR
casts. Since everything can be cast to VARCHAR, this is ”safe” in a sense.

Casting to Unions A type can always be implicitly cast to a UNION if it can be implicitly cast to one of the UNION member types.

• If there are multiple candidates, the built-in implicit casting priority rules determine the target type. For example, a FLOAT ->
UNION(i INT, v VARCHAR) cast will always cast the FLOAT to the INT member before VARCHAR.
• If the cast is still ambiguous, i.e., there are multiple candidates with the same implicit casting priority, an error is raised. This usually
happens when the UNION contains multiple members of the same type, e.g., a FLOAT -> UNION(i INT, num INT) cast is always
ambiguous.

So how do we disambiguate if we want to create a UNION with multiple members of the same type? By using the union_value function,
which takes a keyword argument specifying the tag. For example, union_value(num := 2::INT) will create a UNION with a single
member of type INT with the tag num. This can then be used to disambiguate in an explicit (or implicit, read on below!) UNION to UNION
cast, like CAST(union_value(b := 2) AS UNION(a INT, b INT)).
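
For instance, the following small sketch resolves an otherwise ambiguous cast by tagging the value first (the exact rendering of the result may vary):

SELECT CAST(union_value(b := 2) AS UNION(a INT, b INT)) AS u;
-- the value 2 is stored under the 'b' tag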

Casting between Unions UNION types can be cast between each other if the source type is a ”subset” of the target type. In other words,
all the tags in the source UNION must be present in the target UNION, and all the types of the matching tags must be implicitly castable
between source and target. In essence, this means that UNION types are covariant with respect to their members.

Ok  | Source               | Target               | Comments

yes | UNION(a A, b B)      | UNION(a A, b B, c C) |
yes | UNION(a A, b B)      | UNION(a A, b C)      | if B can be implicitly cast to C
no  | UNION(a A, b B, c C) | UNION(a A, b B)      |
no  | UNION(a A, b B)      | UNION(a A, b C)      | if B can't be implicitly cast to C
no  | UNION(A, B, D)       | UNION(A, B, C)       |
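
As a small illustration (assuming the tbl1 table created earlier in this section), a two-member union value can be widened to a union with an additional member:

SELECT CAST(u AS UNION(num INT, str VARCHAR, flt FLOAT)) FROM tbl1;
-- every row keeps its original tag and value; the added 'flt' member is simply never active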

Comparison and Sorting

Since UNION types are implemented on top of STRUCT types internally, they can be used with all the comparison operators as well as in
both WHERE and HAVING clauses with the same semantics as STRUCTs. The ”tag” is always stored as the first struct entry, which ensures
that the UNION types are compared and ordered by ”tag” first.
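
For example (reusing tbl1 from above), sorting on a union column first groups rows by their tag; a rough sketch:

SELECT u FROM tbl1 ORDER BY u;
-- rows sharing a tag sort together; with this declaration the 'num' rows come before the 'str' rows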

Functions

See Nested Functions.

Typecasting

Typecasting is an operation that converts a value in one particular data type to the closest corresponding value in another data type. Like
other SQL engines, DuckDB supports both implicit and explicit typecasting.

Explicit Casting

Explicit typecasting is performed by using a CAST expression. For example, CAST(col AS VARCHAR) or col::VARCHAR explicitly
cast the column col to VARCHAR. See the cast page for more information.

Implicit Casting

In many situations, the system will add casts by itself. This is called implicit casting. This happens, for example, when a function is called
with an argument that does not match the type of the function but can be cast to the desired type.

Consider the function sin(DOUBLE). This function takes as its input argument a column of type DOUBLE; however, it can be called with an
integer as well: sin(1). The integer is converted into a double before being passed to the sin function.

Implicit casts can only be added for a number of type combinations, and are generally only possible when the cast cannot fail. For example,
an implicit cast can be added from INT to DOUBLE, but not from DOUBLE to INT.
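
For instance:

SELECT sin(1);
-- the INTEGER literal 1 is implicitly cast to DOUBLE, giving 0.8414709848078965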


Casting Operations Matrix

Values of a particular data type cannot always be cast to any arbitrary target data type. The only exception is the NULL value ‑ which can
always be converted between types. The following matrix describes which conversions are supported. When implicit casting is allowed, it
implies that explicit casting is also possible.

Even though a casting operation is supported based on the source and target data type, it does not necessarily mean the cast operation
will succeed at runtime.


Note. Deprecated Prior to version 0.10.0, DuckDB allowed any type to be implicitly cast to VARCHAR during function binding. Version
0.10.0 introduced a breaking change which no longer allows implicit casts to VARCHAR. The old_implicit_casting configuration
option can be used to revert to the old behavior. However, please note that this flag will be deprecated in the future.

Lossy Casts Casting operations that result in loss of precision are allowed. For example, it is possible to explicitly cast a numeric type
with fractional digits like DECIMAL, FLOAT or DOUBLE to an integral type like INTEGER. The number will be rounded.

SELECT CAST(3.5 AS INTEGER);

Overflows Casting operations that would result in a value overflow throw an error. For example, the value 999 is too large to be represented
by the TINYINT data type. Therefore, an attempt to cast that value to that type results in a runtime error:

SELECT CAST(999 AS TINYINT);

So even though the cast operation from INTEGER to TINYINT is supported, it is not possible for this particular value. TRY_CAST can be
used to convert the value into NULL instead of throwing an error.
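
For example:

SELECT TRY_CAST(999 AS TINYINT);
-- NULL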

Varchar The VARCHAR type acts as a universal target: any arbitrary value of any arbitrary type can always be cast to the VARCHAR type.
This type is also used for displaying values in the shell.

SELECT CAST(42.5 AS VARCHAR);

Casting from VARCHAR to another data type is supported, but can raise an error at runtime if DuckDB cannot parse and convert the provided
text to the target data type.

SELECT CAST('NotANumber' AS INTEGER);

In general, casting to VARCHAR is a lossless operation and any type can be cast back to the original type after being converted into text.

SELECT CAST(CAST([1, 2, 3] AS VARCHAR) AS INT[]);

Literal Types Integer literals (such as 42) and string literals (such as 'string') have special implicit casting rules. See the literal types
page for more information.

Lists / Arrays Lists can be explicitly cast to other lists using the same casting rules. The cast is applied to the children of the list. For
example, if we convert an INT[] list to a VARCHAR[] list, the child INT elements are individually cast to VARCHAR and a new list is
constructed.

SELECT CAST([1, 2, 3] AS VARCHAR[]);

Arrays Arrays follow the same casting rules as lists. In addition, arrays can be implicitly cast to lists of the same type. For example, an
INT[3] array can be implicitly cast to an INT[] list.
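
A minimal sketch, assuming the array_value function for constructing a fixed-size array:

-- cast a fixed-size INT[3] array to a variable-length INT[] list
SELECT CAST(array_value(1, 2, 3) AS INT[]);
-- [1, 2, 3]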

Structs Structs can be cast to other structs as long as the names of the child elements match.

SELECT CAST({'a': 42} AS STRUCT(a VARCHAR));

The names of the struct can also be in a different order. The fields of the struct will be reshuffled based on the names of the structs.

SELECT CAST({'a': 42, 'b': 84} AS STRUCT(b VARCHAR, a VARCHAR));

Unions Union casting rules can be found on the UNION type page.


Expressions

Expressions

An expression is a combination of values, operators and functions. Expressions are highly composable, and range from very simple to
arbitrarily complex. They can be found in many different parts of SQL statements. In this section, we provide the different types of operators
and functions that can be used within expressions.

CASE Statement

The CASE statement performs a switch based on a condition. The basic form is identical to the ternary condition used in many programming
languages (CASE WHEN cond THEN a ELSE b END is equivalent to cond ? a : b). With a single condition this can be expressed
with IF(cond, a, b).

CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;


SELECT i, CASE WHEN i > 2 THEN 1 ELSE 0 END AS test
FROM integers;
-- 1, 2, 3
-- 0, 0, 1

-- this is equivalent to:


SELECT i, IF(i > 2, 1, 0) AS test
FROM integers;
-- 1, 2, 3
-- 0, 0, 1

The WHEN cond THEN expr part of the CASE statement can be chained. Whenever any of the conditions returns true for a single tuple,
the corresponding expression is evaluated and returned.

CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;


SELECT i, CASE WHEN i = 1 THEN 10 WHEN i = 2 THEN 20 ELSE 0 END AS test
FROM integers;
-- 1, 2, 3
-- 10, 20, 0

The ELSE part of the CASE statement is optional. If no ELSE is provided and none of the conditions match, the CASE statement
will return NULL.

CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;


SELECT i, CASE WHEN i = 1 THEN 10 END AS test
FROM integers;
-- 1, 2, 3
-- 10, NULL, NULL

After the CASE but before the WHEN, an individual expression can also be provided. When this is done, the CASE statement is essentially
transformed into a switch statement.

CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;


SELECT i, CASE i WHEN 1 THEN 10 WHEN 2 THEN 20 WHEN 3 THEN 30 END AS test
FROM integers;
-- 1, 2, 3
-- 10, 20, 30

-- this is equivalent to:


SELECT i, CASE WHEN i = 1 THEN 10 WHEN i = 2 THEN 20 WHEN i = 3 THEN 30 END AS test
FROM integers;


Casting

Casting refers to the operation of converting a value in a particular data type to the corresponding value in another data type. Casting can
occur either implicitly or explicitly. The syntax described here performs an explicit cast. More information on casting can be found on the
typecasting page.

Explicit Casting

The standard SQL syntax for explicit casting is CAST(expr AS TYPENAME), where TYPENAME is a name (or alias) of one of DuckDB's
data types. DuckDB also supports the shorthand expr::TYPENAME, which is also present in PostgreSQL.

SELECT CAST(i AS VARCHAR) FROM generate_series(1, 3) tbl(i);


-- "1", "2", "3"

SELECT i::DOUBLE FROM generate_series(1, 3) tbl(i);


-- 1.0, 2.0, 3.0

Casting Rules Not all casts are possible. For example, it is not possible to convert an INTEGER to a DATE. Casts may also throw errors
when the cast cannot be successfully performed. For example, trying to cast the string 'hello' to an INTEGER will result in an error
being thrown.

SELECT CAST('hello' AS INTEGER);


-- Conversion Error: Could not convert string 'hello' to INT32

The exact behavior of the cast depends on the source and destination types. For example, when casting from VARCHAR to any other type,
an attempt is made to parse the string and convert it to the target type.

TRY_CAST TRY_CAST can be used when the preferred behavior is not to throw an error, but instead to return a NULL value. TRY_CAST
will never throw an error, and will instead return NULL if a cast is not possible.

SELECT TRY_CAST('hello' AS INTEGER);


-- NULL

Collations

Collations provide rules for how text should be sorted or compared in the execution engine. Collations are useful for localization, as the
rules for how text should be ordered are different for different languages or for different countries. These orderings are often incompatible
with one another. For example, in English the letter ”y” comes between ”x” and ”z”. However, in Lithuanian the letter ”y” comes between
the ”i” and ”j”. For that reason, different collations are supported. The user must choose which collation they want to use when performing
sorting and comparison operations.

By default, the BINARY collation is used. That means that strings are ordered and compared based only on their binary contents. This
makes sense for standard ASCII characters (i.e., the letters A‑Z and numbers 0‑9), but generally does not make much sense for special
unicode characters. It is, however, by far the fastest method of performing ordering and comparisons. Hence it is recommended to stick
with the BINARY collation unless required otherwise.

Using Collations

In the stand‑alone installation of DuckDB three collations are included: NOCASE, NOACCENT and NFC. The NOCASE collation compares
characters as equal regardless of their casing. The NOACCENT collation compares characters as equal regardless of their accents. The NFC
collation performs NFC‑normalized comparisons, see Unicode normalization for more information.


SELECT 'hello' = 'hElLO';


-- false
SELECT 'hello' COLLATE NOCASE = 'hElLO';
-- true

SELECT 'hello' = 'hëllo';


-- false
SELECT 'hello' COLLATE NOACCENT = 'hëllo';
-- true

Collations can be combined by chaining them using the dot operator. Note, however, that not all collations can be combined together. In
general, the NOCASE collation can be combined with any other collator, but most other collations cannot be combined.

SELECT 'hello' COLLATE NOCASE = 'hElLÖ';


-- false
SELECT 'hello' COLLATE NOACCENT = 'hElLÖ';
-- false
SELECT 'hello' COLLATE NOCASE.NOACCENT = 'hElLÖ';
-- true

Default Collations

The collations we have seen so far have all been specified per expression. It is also possible to specify a default collator, either on the global
database level or on a base table column. The PRAGMA default_collation can be used to specify the global default collator. This is
the collator that will be used if no other one is specified.

SET default_collation = NOCASE;

SELECT 'hello' = 'HeLlo';


-- true

Collations can also be specified per‑column when creating a table. When that column is then used in a comparison, the per‑column collation
is used to perform that comparison.

CREATE TABLE names (name VARCHAR COLLATE NOACCENT);


INSERT INTO names VALUES ('hännes');
SELECT name
FROM names
WHERE name = 'hannes';
-- hännes

Be careful here, however, as different collations cannot be combined. This can be problematic when you want to compare columns that
have a different collation specified.

SELECT name
FROM names
WHERE name = 'hannes' COLLATE NOCASE;
-- ERROR: Cannot combine types with different collation!

CREATE TABLE other_names (name VARCHAR COLLATE NOCASE);


INSERT INTO other_names VALUES ('HÄNNES');

SELECT *
FROM names, other_names
WHERE names.name = other_names.name;
-- ERROR: Cannot combine types with different collation!

-- need to manually overwrite the collation!


SELECT *
FROM names, other_names
WHERE names.name COLLATE NOACCENT.NOCASE = other_names.name COLLATE NOACCENT.NOCASE;
-- hännes|HÄNNES

ICU Collations

The collations we have seen so far are not region‑dependent, and do not follow any specific regional rules. If you wish to follow the rules
of a specific region or language, you will need to use one of the ICU collations. For that, you need to load the ICU extension.

If you are using the C++ API, you may find the extension in the extension/icu folder of the DuckDB project. Using the C++ API, the
extension can be loaded as follows:

DuckDB db;
db.LoadExtension<ICUExtension>();

Loading this extension will add a number of language- and region-specific collations to your database. These can be queried using the
PRAGMA collations command, or by querying the pragma_collations function.

PRAGMA collations;
SELECT * FROM pragma_collations();
-- [af, am, ar, as, az, be, bg, bn, bo, bs, bs, ca, ceb, chr, cs, cy, da, de, de_AT, dsb, dz, ee, el, en,
en_US, en_US, eo, es, et, fa, fa_AF, fi, fil, fo, fr, fr_CA, ga, gl, gu, ha, haw, he, he_IL, hi, hr, hsb,
hu, hy, id, id_ID, ig, is, it, ja, ka, kk, kl, km, kn, ko, kok, ku, ky, lb, lkt, ln, lo, lt, lv, mk, ml,
mn, mr, ms, mt, my, nb, nb_NO, ne, nl, nn, om, or, pa, pa, pa_IN, pl, ps, pt, ro, ru, se, si, sk, sl,
smn, sq, sr, sr, sr_BA, sr_ME, sr_RS, sr, sr_BA, sr_RS, sv, sw, ta, te, th, tk, to, tr, ug, uk, ur, uz,
vi, wae, wo, xh, yi, yo, zh, zh, zh_CN, zh_SG, zh, zh_HK, zh_MO, zh_TW, zu]

These collations can then be used as the other collations would be used before. They can also be combined with the NOCASE collation.
For example, to use the German collation rules you could use the following code snippet:

CREATE TABLE strings (s VARCHAR COLLATE DE);


INSERT INTO strings VALUES ('Gabel'), ('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), ('Götz');
SELECT * FROM strings ORDER BY s;
-- "Gabel", "Göbel", "Goethe", "Goldmann", "Göthe", "Götz"

Comparisons

Comparison Operators

The table below shows the standard comparison operators. Whenever either of the input arguments is NULL, the output of the comparison
is NULL.

Operator Description Example Result

< less than 2 < 3 true


> greater than 2 > 3 false
<= less than or equal to 2 <= 3 true
>= greater than or equal to 4 >= NULL NULL
= equal NULL = NULL NULL
<> or != not equal 2 <> 2 false

The table below shows the standard distinction operators. These operators treat NULL values as equal.


Operator             | Description               | Example                        | Result

IS DISTINCT FROM     | not equal, including NULL | 2 IS DISTINCT FROM NULL        | true
IS NOT DISTINCT FROM | equal, including NULL     | NULL IS NOT DISTINCT FROM NULL | true

BETWEEN and IS [NOT] NULL

Besides the standard comparison operators there are also the BETWEEN and IS (NOT) NULL operators. These behave much like
operators, but have special syntax mandated by the SQL standard. They are shown in the table below.

Note that BETWEEN and NOT BETWEEN are only equivalent to the examples below in the cases where a, x, and y are all of the same
type, as BETWEEN will cast all of its inputs to the same type.

Predicate Description

a BETWEEN x AND y equivalent to x <= a AND a <= y


a NOT BETWEEN x AND y equivalent to x > a OR a > y
expression IS NULL true if expression is NULL, false otherwise
expression ISNULL alias for IS NULL (non‑standard)
expression IS NOT NULL false if expression is NULL, true otherwise
expression NOTNULL alias for IS NOT NULL (non‑standard)

Note. For the expression BETWEEN x AND y, x is used as the lower bound and y is used as the upper bound. Therefore, if x >
y, the result will always be false.
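
For example:

SELECT 5 BETWEEN 1 AND 10;
-- true
SELECT 5 BETWEEN 10 AND 1;
-- false, because the lower bound (10) is larger than the upper bound (1)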

IN Operator

The IN operator checks containment of the left expression inside the set of expressions on the right hand side (RHS). The IN operator
returns true if the expression is present in the RHS, false if the expression is not in the RHS and the RHS has no NULL values, or NULL if the
expression is not in the RHS and the RHS has NULL values.

SELECT 'Math' IN ('CS', 'Math');


-- true
SELECT 'English' IN ('CS', 'Math');
-- false

SELECT 'Math' IN ('CS', 'Math', NULL);


-- true
SELECT 'English' IN ('CS', 'Math', NULL);
-- NULL

NOT IN can be used to check if an element is not present in the set. X NOT IN Y is equivalent to NOT(X IN Y).
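
For example:

SELECT 'English' NOT IN ('CS', 'Math');
-- true
SELECT 'English' NOT IN ('CS', 'Math', NULL);
-- NULL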

The IN operator can also be used with a subquery that returns a single column. See the subqueries page for more information.

Logical Operators

The following logical operators are available: AND, OR and NOT. SQL uses a three-valued logic system with true, false and NULL. Note
that logical operators involving NULL do not always evaluate to NULL. For example, NULL AND false will evaluate to false, and NULL
OR true will evaluate to true. Below are the complete truth tables.


Binary Operators: AND and OR

a b a AND b a OR b

true true true true


true false false true
true NULL NULL true
false false false false
false NULL false NULL
NULL NULL NULL NULL

Unary Operator: NOT

a NOT a

true false
false true
NULL NULL

The operators AND and OR are commutative, that is, you can switch the left and right operand without affecting the result.
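
For example:

SELECT NULL AND false;
-- false
SELECT NULL OR true;
-- true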

Star Expression

Examples

-- select all columns present in the FROM clause


SELECT * FROM table_name;
-- select all columns from the table called "table_name"
SELECT table_name.* FROM table_name JOIN other_table_name USING (id);
-- select all columns except the city column from the addresses table
SELECT * EXCLUDE (city) FROM addresses;
-- select all columns from the addresses table, but replace city with lower(city)
SELECT * REPLACE (lower(city) AS city) FROM addresses;
-- select all columns matching the given expression
SELECT COLUMNS(c -> c LIKE '%num%') FROM addresses;
-- select all columns matching the given regex from the table
SELECT COLUMNS('number\d+') FROM addresses;

Syntax

Star Expression

The * expression can be used in a SELECT statement to select all columns that are projected in the FROM clause.

SELECT *
FROM tbl;

The * expression can be modified using the EXCLUDE and REPLACE clauses.


EXCLUDE Clause EXCLUDE allows us to exclude specific columns from the * expression.

SELECT * EXCLUDE (col)
FROM tbl;

REPLACE Clause REPLACE allows us to replace specific columns with different expressions.

SELECT * REPLACE (col / 1000 AS col)
FROM tbl;

COLUMNS Expression

The COLUMNS expression can be used to execute the same expression on multiple columns. Like the * expression, it can only be used in
the SELECT clause.

CREATE TABLE numbers (id INT, number INT);


INSERT INTO numbers VALUES (1, 10), (2, 20), (3, NULL);
SELECT min(COLUMNS(*)), count(COLUMNS(*)) FROM numbers;

min(numbers.id) min(numbers.number) count(numbers.id) count(numbers.number)

1 10 3 2

The * expression in the COLUMNS statement can also contain EXCLUDE or REPLACE, similar to regular star expressions.

SELECT min(COLUMNS(* REPLACE (number + id AS number))), count(COLUMNS(* EXCLUDE (number))) FROM numbers;

min(numbers.id) min(number := (number + id)) count(numbers.id)

1 11 3

COLUMNS expressions can also be combined, as long as the COLUMNS contains the same (star) expression:

SELECT COLUMNS(*) + COLUMNS(*) FROM numbers;

(numbers.id + numbers.id) (numbers.number + numbers.number)

2 20
4 40
6 NULL

COLUMNS Regular Expression

COLUMNS supports passing a regex in as a string constant:

SELECT COLUMNS('(id|numbers?)') FROM numbers;

id number

1 10
2 20
3 NULL

COLUMNS Lambda Function

COLUMNS also supports passing in a lambda function. The lambda function will be evaluated for all columns present in the FROM clause,
and only columns that match the lambda function will be returned. This allows the execution of arbitrary expressions in order to select
columns.

SELECT COLUMNS(c -> c LIKE '%num%') FROM numbers;

number

10
20
NULL

STRUCT.*

The * expression can also be used to retrieve all keys from a struct as separate columns. This is particularly useful when a prior operation
creates a struct of unknown shape, or if a query must handle any potential struct keys. See the STRUCT data type and nested functions
pages for more details on working with structs.

-- All keys within a struct can be returned as separate columns using *


SELECT st.* FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS st);

x y z

1 2 3

Subqueries

Scalar Subquery

Scalar subqueries are subqueries that return a single value. They can be used anywhere a regular expression can be used. If a scalar
subquery returns more than a single value, the first value returned will be used.

Consider the following table:

Grades

grade course

7 Math
9 Math
8 CS

CREATE TABLE grades (grade INTEGER, course VARCHAR);


INSERT INTO grades VALUES (7, 'Math'), (9, 'Math'), (8, 'CS');


We can run the following query to obtain the minimum grade:

SELECT min(grade) FROM grades;


-- {7}

By using a scalar subquery in the WHERE clause, we can figure out for which course this grade was obtained:

SELECT course FROM grades WHERE grade = (SELECT min(grade) FROM grades);
-- {Math}

EXISTS

The EXISTS operator tests for the existence of any row inside the subquery. It returns true when the subquery returns one or more
records, and false otherwise. The EXISTS operator is generally the most useful as a correlated subquery to express semijoin operations.
However, it can be used as an uncorrelated subquery as well.

For example, we can use it to figure out if there are any grades present for a given course:

SELECT EXISTS (SELECT * FROM grades WHERE course = 'Math');


-- true

SELECT EXISTS (SELECT * FROM grades WHERE course = 'History');


-- false

NOT EXISTS The NOT EXISTS operator tests for the absence of any row inside the subquery. It returns true when the subquery
returns an empty result, and false otherwise. The NOT EXISTS operator is generally the most useful as a correlated subquery to express
antijoin operations. For example, to find Person nodes without an interest:

CREATE TABLE Person (id BIGINT, name VARCHAR);


CREATE TABLE interest (PersonId BIGINT, topic VARCHAR);

INSERT INTO Person VALUES (1, 'Jane'), (2, 'Joe');


INSERT INTO interest VALUES (2, 'Music');

SELECT *
FROM Person
WHERE NOT EXISTS (SELECT * FROM interest WHERE interest.PersonId = Person.id);

┌───────┬─────────┐
│ id │ name │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ Jane │
└───────┴─────────┘

Note. DuckDB automatically detects when a NOT EXISTS query expresses an antijoin operation. There is no need to manually
rewrite such queries to use LEFT OUTER JOIN ... WHERE ... IS NULL.

IN Operator

The IN operator checks containment of the left expression inside the result defined by the subquery or the set of expressions on the right
hand side (RHS). The IN operator returns true if the expression is present in the RHS, false if the expression is not in the RHS and the RHS
has no NULL values, or NULL if the expression is not in the RHS and the RHS has NULL values.

We can use the IN operator in a similar manner as we used the EXISTS operator:

SELECT 'Math' IN (SELECT course FROM grades);


-- true


Correlated Subqueries

All the subqueries presented here so far have been uncorrelated subqueries, where the subqueries themselves are entirely self‑contained
and can be run without the parent query. There exists a second type of subqueries called correlated subqueries. For correlated subqueries,
the subquery uses values from the parent subquery.

Conceptually, the subqueries are run once for every single row in the parent query. Perhaps a simple way of envisioning this is that the
correlated subquery is a function that is applied to every row in the source data set.

For example, suppose that we want to find the minimum grade for every course. We could do that as follows:

SELECT *
FROM grades grades_parent
WHERE grade =
(SELECT min(grade)
FROM grades
WHERE grades.course = grades_parent.course);
-- {7, Math}, {8, CS}

The subquery uses a column from the parent query (grades_parent.course). Conceptually, we can see the subquery as a function
where the correlated column is a parameter to that function:

SELECT min(grade)
FROM grades
WHERE course = ?;

Now when we execute this function for each of the rows, we can see that for Math this will return 7, and for CS it will return 8. We then
compare it against the grade for that actual row. As a result, the row (Math, 9) will be filtered out, as 9 <> 7.

Returning Each Row of the Subquery as a Struct

Using the name of a subquery in the SELECT clause (without referring to a specific column) turns each row of the subquery into a struct
whose fields correspond to the columns of the subquery. For example:

SELECT t
FROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t;

┌─────────────────────────────┐
│ t │
│ struct(x bigint, y varchar) │
├─────────────────────────────┤
│ {'x': 41, 'y': hello} │
│ {'x': 42, 'y': hello} │
│ {'x': 43, 'y': hello} │
└─────────────────────────────┘

Functions

Functions

Function Syntax

Query Functions

The duckdb_functions() table function shows the list of functions currently built into the system.


SELECT DISTINCT ON(function_name)


function_name,
function_type,
return_type,
parameters,
parameter_types,
description
FROM duckdb_functions()
WHERE function_type = 'scalar' AND function_name LIKE 'b%'
ORDER BY function_name;

function_name | function_type | return_type | parameters             | parameter_types                  | description

bar           | scalar        | VARCHAR     | [x, min, max, width]   | [DOUBLE, DOUBLE, DOUBLE, DOUBLE] | Draws a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80
base64        | scalar        | VARCHAR     | [blob]                 | [BLOB]                           | Convert a blob to a base64 encoded string
bin           | scalar        | VARCHAR     | [value]                | [VARCHAR]                        | Converts the value to binary representation
bit_count     | scalar        | TINYINT     | [x]                    | [TINYINT]                        | Returns the number of bits that are set
bit_length    | scalar        | BIGINT      | [col0]                 | [VARCHAR]                        | NULL
bit_position  | scalar        | INTEGER     | [substring, bitstring] | [BIT, BIT]                       | Returns first starting index of the specified substring within bits, or zero if it is not present. The first (leftmost) bit is indexed 1
bitstring     | scalar        | BIT         | [bitstring, length]    | [VARCHAR, INTEGER]               | Pads the bitstring until the specified length

Note. Currently, the description and parameter names of functions are not available in the duckdb_functions() function.

Bitstring Functions

This section describes functions and operators for examining and manipulating bit values. Bitstrings must be of equal length when
performing the bitwise operations AND, OR and XOR. When bit shifting, the original length of the string is preserved.

Bitstring Operators

The table below shows the available mathematical operators for BIT type.

Operator Description Example Result

& bitwise AND '10101'::BIT & '10001'::BIT 10001


| bitwise OR '1011'::BIT | '0001'::BIT 1011
xor bitwise XOR xor('101'::BIT, '001'::BIT) 100
~ bitwise NOT ~('101'::BIT) 010
<< bitwise shift left '1001011'::BIT << 3 1011000
>> bitwise shift right '1001011'::BIT >> 3 0001001


Bitstring Functions

The table below shows the available scalar functions for BIT type.

Function Description Example Result

bit_count( Returns the number of set bits in the bit_count('1101011'::BIT) 5


bitstring) bitstring.
bit_length( Returns the number of bits in the bitstring. bit_length('1101011'::BIT) 7
bitstring)
bit_position( Returns first starting index of the specified bit_position('010'::BIT, 4
substring, substring within bits, or zero if it's not '1110101'::BIT)
bitstring) present. The first (leftmost) bit is indexed
1
bitstring( Returns a bitstring of determined length. bitstring('1010'::BIT, 7) 0001010
bitstring, length)
get_bit( Extracts the nth bit from bitstring; the first get_bit('0110010'::BIT, 2) 1
bitstring, index) (leftmost) bit is indexed 0.
length( bitstring) Alias for bit_length. length('1101011'::BIT) 7
octet_length( Returns the number of bytes in the octet_length('1101011'::BIT) 1
bitstring) bitstring.
set_bit( Sets the nth bit in bitstring to newvalue; set_bit('0110010'::BIT, 2, 0) 0100010
bitstring, index, the first (leftmost) bit is indexed 0. Returns
new_value) a new bitstring.

Bitstring Aggregate Functions

These aggregate functions are available for BIT type.

Function Description Example

bit_and(arg) Returns the bitwise AND operation performed on all bit_and(A)


bitstrings in a given expression.
bit_or(arg) Returns the bitwise OR operation performed on all bit_or(A)
bitstrings in a given expression.
bit_xor(arg) Returns the bitwise XOR operation performed on all bit_xor(A)
bitstrings in a given expression.
bitstring_agg(arg, min, max) Returns a bitstring with bits set for each distinct value. bitstring_agg(A, 1,
42)
bitstring_agg(arg) Returns a bitstring with bits set for each distinct value. bitstring_agg(A)

Bitstring Aggregation The bitstring_agg function takes any integer type as input and returns a bitstring with bits set for each distinct
value. The left-most bit represents the smallest value in the column and the right-most bit the maximum value. If possible, the min
and max are retrieved from the column statistics. Otherwise, it is also possible to provide the min and max values explicitly.

The combination of bit_count and bitstring_agg could be used as an alternative to count(DISTINCT ...), with possible
performance improvements in cases of low cardinality and dense values.
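
A minimal sketch of that pattern, assuming a hypothetical table obs with an integer column v:

CREATE TABLE obs (v INTEGER);
INSERT INTO obs VALUES (1), (3), (3), (7);
-- equivalent to count(DISTINCT v) for this data
SELECT bit_count(bitstring_agg(v)) AS distinct_values FROM obs;
-- 3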


Blob Functions

This section describes functions and operators for examining and manipulating blob values.

Function Description Example Result

blob || blob Blob concatenation '\xAA'::BLOB || '\xBB'::BLOB \xAA\xBB


decode( blob) Convert blob to varchar. Fails if decode('\xC3\xBC'::BLOB) ü
blob is not valid UTF‑8.
encode( string) Convert varchar to blob. Converts encode('my_string_with_ü') my_string_with_
UTF‑8 characters into literal \xC3\xBC
encoding.
octet_length( Number of bytes in blob octet_length('\xAA\xBB'::BLOB) 2
blob)
read_blob( Returns the content from source read_blob('hello.bin') hello\x0A
source) (a filename, a list of filenames, or a
glob pattern) as a BLOB. See the
read_blob guide for more
details.

Date Format Functions

The strftime and strptime functions can be used to convert between dates/timestamps and strings. This is often required when
parsing CSV files, displaying output to the user or transferring information between programs. Because there are many possible date
representations, these functions accept a format string that describes how the date or timestamp should be structured.

strftime Examples

strftime(timestamp, format) converts timestamps or dates to strings according to the specified pattern.

SELECT strftime(DATE '1992-03-02', '%d/%m/%Y');


-- 02/03/1992

SELECT strftime(TIMESTAMP '1992-03-02 20:32:45', '%A, %-d %B %Y - %I:%M:%S %p');


-- Monday, 2 March 1992 - 08:32:45 PM

strptime Examples

strptime(string, format) converts strings to timestamps according to the specified pattern.

SELECT strptime('02/03/1992', '%d/%m/%Y');


-- 1992-03-02 00:00:00

SELECT strptime('Monday, 2 March 1992 - 08:32:45 PM', '%A, %-d %B %Y - %I:%M:%S %p');
-- 1992-03-02 20:32:45

CSV Parsing

The date formats can also be specified during CSV parsing, either in the COPY statement or in the read_csv function. This can be done by
specifying either a DATEFORMAT or a TIMESTAMPFORMAT (or both). DATEFORMAT will be used for converting dates, and TIMESTAMPFORMAT
will be used for converting timestamps. Below are some examples for how to use this:


-- in COPY statement
COPY dates FROM 'test.csv' (DATEFORMAT '%d/%m/%Y', TIMESTAMPFORMAT '%A, %-d %B %Y - %I:%M:%S %p');

-- in read_csv function
SELECT *
FROM read_csv('test.csv', dateformat = '%m/%d/%Y');

Format Specifiers

Below is a full list of all available format specifiers.

Specifier Description Example

%a Abbreviated weekday name. Sun, Mon, ...


%A Full weekday name. Sunday, Monday, ...
%b Abbreviated month name. Jan, Feb, ..., Dec
%B Full month name. January, February, ...
%c ISO date and time representation 1992‑03‑02 10:30:20
%d Day of the month as a zero‑padded decimal. 01, 02, ..., 31
%-d Day of the month as a decimal number. 1, 2, ..., 30
%f Microsecond as a decimal number, zero‑padded on the left. 000000 ‑ 999999
%g Millisecond as a decimal number, zero‑padded on the left. 000 ‑ 999
%G ISO 8601 year with century representing the year that contains the 0001, 0002, ..., 2013, 2014, ..., 9998, 9999
greater part of the ISO week (see %V).
%H Hour (24‑hour clock) as a zero‑padded decimal number. 00, 01, ..., 23
%-H Hour (24‑hour clock) as a decimal number. 0, 1, ..., 23
%I Hour (12‑hour clock) as a zero‑padded decimal number. 01, 02, ..., 12
%-I Hour (12‑hour clock) as a decimal number. 1, 2, ... 12
%j Day of the year as a zero‑padded decimal number. 001, 002, ..., 366
%-j Day of the year as a decimal number. 1, 2, ..., 366
%m Month as a zero‑padded decimal number. 01, 02, ..., 12
%-m Month as a decimal number. 1, 2, ..., 12
%M Minute as a zero‑padded decimal number. 00, 01, ..., 59
%-M Minute as a decimal number. 0, 1, ..., 59
%n Nanosecond as a decimal number, zero‑padded on the left. 000000000 ‑ 999999999
%p Locale's AM or PM. AM, PM
%S Second as a zero‑padded decimal number. 00, 01, ..., 59
%-S Second as a decimal number. 0, 1, ..., 59
%u ISO 8601 weekday as a decimal number where 1 is Monday. 1, 2, ..., 7
%U Week number of the year. Week 01 starts on the first Sunday of the 00, 01, ..., 53
year, so there can be week 00. Note that this is not compliant with the
week date standard in ISO‑8601.
%V ISO 8601 week as a decimal number with Monday as the first day of 01, ..., 53
the week. Week 01 is the week containing Jan 4.
%w Weekday as a decimal number. 0, 1, ..., 6
%W Week number of the year. Week 01 starts on the first Monday of the 00, 01, ..., 53
year, so there can be week 00. Note that this is not compliant with the
week date standard in ISO‑8601.
%x ISO date representation 1992‑03‑02
%X ISO time representation 10:30:20
%y Year without century as a zero‑padded decimal number. 00, 01, ..., 99
%-y Year without century as a decimal number. 0, 1, ..., 99
%Y Year with century as a decimal number. 2013, 2019 etc.
%z Time offset from UTC in the form ±HH:MM, ±HHMM, or ±HH. ‑0700
%Z Time zone name. Europe/Amsterdam
%% A literal % character. %

Date Functions

This section describes functions and operators for examining and manipulating date values.

Date Operators

The table below shows the available mathematical operators for DATE types.

Operator | Description                         | Example                                                                                 | Result

+        | addition of days (integers)         | DATE '1992-03-22' + 5                                                                   | 1992-03-27
+        | addition of an INTERVAL             | DATE '1992-03-22' + INTERVAL 5 DAY                                                      | 1992-03-27
+        | addition of a variable INTERVAL     | SELECT DATE '1992-03-22' + INTERVAL 1 DAY * d.days FROM (VALUES (5), (11)) AS d(days)   | 1992-03-27 and 1992-04-02
-        | subtraction of DATEs                | DATE '1992-03-27' - DATE '1992-03-22'                                                   | 5
-        | subtraction of an INTERVAL          | DATE '1992-03-27' - INTERVAL 5 DAY                                                      | 1992-03-22
-        | subtraction of a variable INTERVAL  | SELECT DATE '1992-03-27' - INTERVAL 1 DAY * d.days FROM (VALUES (5), (11)) AS d(days)   | 1992-03-22 and 1992-03-16

Adding to or subtracting from infinite values produces the same infinite value.

Date Functions

The table below shows the available functions for DATE types. Dates can also be manipulated with the timestamp functions through type
promotion.


Function Description Example Result

current_date Current date (at start of current current_date 2022-10-08


transaction)
date_add( date, Add the interval to the date date_add(DATE '1992-09-15', 1992-11-15
interval) INTERVAL 2 MONTH)
date_diff( part, The number of partition date_diff('month', DATE 2
startdate, enddate) boundaries between the dates '1992-09-15', DATE
'1992-11-14')
date_part( part, date) Get the subfield (equivalent to date_part('year', DATE 1992
extract) '1992-09-20')
date_sub( part, The number of complete date_sub('month', DATE 1
startdate, enddate) partitions between the dates '1992-09-15', DATE
'1992-11-14')
date_trunc( part, date) Truncate to specified precision date_trunc('month', DATE 1992-03-01
'1992-03-07')
datediff( part, Alias of date_diff. The number datediff('month', DATE 2
startdate, enddate) of partition boundaries '1992-09-15', DATE
between the dates '1992-11-14')
datepart( part, date) Alias of date_part. Get the datepart('year', DATE 1992
subfield (equivalent to '1992-09-20')
extract)
datesub( part, Alias of date_sub. The number datesub('month', DATE 1
startdate, enddate) of complete partitions between '1992-09-15', DATE
the dates '1992-11-14')
datetrunc( part, date) Alias of date_trunc. Truncate to datetrunc('month', DATE 1992-03-01
specified precision '1992-03-07')
dayname( date) The (English) name of the dayname(DATE '1992-09-20') Sunday
weekday
extract( part fromdate) Get subfield from a date extract('year' FROM DATE 1992
'1992-09-20')
greatest( date, date) The later of two dates greatest(DATE '1992-09-20', 1992-09-20
DATE '1992-03-07')
isfinite( date) Returns true if the date is finite, isfinite(DATE '1992-03-07') true
false otherwise
isinf( date) Returns true if the date is isinf(DATE '-infinity') true
infinite, false otherwise
last_day( date) The last day of the last_day(DATE '1992-09-20') 1992-09-30
corresponding month in the
date
least( date, date) The earlier of two dates least(DATE '1992-09-20', DATE 1992-03-07
'1992-03-07')
make_date( bigint, The date for the given parts make_date(1992, 9, 20) 1992-09-20
bigint, bigint)
monthname( date) The (English) name of the monthname(DATE '1992-09-20') September
month
strftime(date, format) Converts a date to a string strftime(date '1992-01-01', Wed, 1 January


according to the format string '%a, %-d %B %Y') 1992
time_bucket( bucket_ Truncate date by the specified time_bucket(INTERVAL '2 1992-04-01
width, date[,offset]) interval bucket_width. months', DATE '1992-04-20',
Buckets are offset by offset INTERVAL '1 month')
interval.
time_bucket( bucket_ Truncate date by the specified time_bucket(INTERVAL '2 1992-04-15
width, date[,origin]) interval bucket_width. weeks', DATE '1992-04-20',
Buckets are aligned relative to DATE '1992-04-01')
origin date. origin
defaults to 2000‑01‑03 for
buckets that don't include a
month or year interval, and to
2000‑01‑01 for month and year
buckets.
today() Current date (start of current today() 2022-10-08
transaction)

There are also dedicated extraction functions to get the subfields. A few examples include extracting the day from a date, or the day of the
week from a date.

Functions applied to infinite dates will either return the same infinite dates (e.g., greatest) or NULL (e.g., date_part), depending on
what "makes sense". In general, if the function needs to examine the parts of the infinite date, the result will be NULL.
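
A brief illustration of both behaviors:

SELECT greatest(DATE 'infinity', DATE '1992-03-22');
-- infinity
SELECT date_part('year', DATE 'infinity');
-- NULL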

Date Part Functions

The date_part, date_diff, and date_trunc functions can be used to manipulate the fields of temporal types. The fields are
specified as strings that contain the part name of the field.

Part Specifiers

Below is a full list of all available date part specifiers. The examples are the corresponding parts of the timestamp 2021-08-03
11:59:44.123456.

Usable as Date Part Specifiers and in Intervals

Specifier Description Synonyms Example

'century' Gregorian century 'cent', 'centuries', 'c' 21


'day' Gregorian day 'days', 'd', 'dayofmonth' 3
'decade' Gregorian decade 'dec', 'decades', 'decs' 202
'hour' Hours 'hr', 'hours', 'hrs', 'h' 11
'microseconds' Sub‑minute microseconds 'microsecond', 'us', 'usec', 44123456
'usecs', 'usecond', 'useconds'
'millennium' Gregorian millennium 'mil', 'millenniums', 3
'millenia', 'mils', 'millenium'
'milliseconds' Sub‑minute milliseconds 'millisecond', 'ms', 'msec', 44123


'msecs', 'msecond', 'mseconds'
'minute' Minutes 'min', 'minutes', 'mins', 'm' 59
'month' Gregorian month 'mon', 'months', 'mons' 8
'quarter' Quarter of the year (1‑4) 'quarters' 3
'second' Seconds 'sec', 'seconds', 'secs', 's' 44
'year' Gregorian year 'yr', 'y', 'years', 'yrs' 2021

Usable in Date Part Specifiers Only

Specifier Description Synonyms Example

'dayofweek' Day of the week (Sunday = 0, 'weekday', 'dow' 2


Saturday = 6)
'dayofyear' Day of the year (1‑365/366) 'doy' 215
'epoch' Seconds since 1970‑01‑01 1627991984
'era' Gregorian era (CE/AD, BCE/BC) 1
'isodow' ISO day of the week (Monday = 2
1, Sunday = 7)
'isoyear' ISO Year number (Starts on 2021
Monday of week containing Jan
4th)
'timezone_hour' Time zone offset hour portion 0
'timezone_minute' Time zone offset minute 0
portion
'timezone' Time zone offset in seconds 0
'week' Week number 'weeks', 'w' 31
'yearweek' ISO year and week number in 202131
YYYYWW format

Note that the time zone parts are all zero unless a time zone plugin such as ICU has been installed to support TIMESTAMP WITH TIME
ZONE.

Part Functions There are dedicated extraction functions to get certain subfields:

Function Description Example Result

century( date) Century century(date '1992-02-15') 20


day( date) Day day(date '1992-02-15') 15
dayofmonth( date) Day (synonym) dayofmonth(date '1992-02-15') 15
dayofweek( date) Numeric weekday (Sunday = 0, dayofweek(date '1992-02-15') 6
Saturday = 6)
dayofyear( date) Day of the year (starts from 1, dayofyear(date '1992-02-15') 46


i.e., January 1 = 1)
decade( date) Decade (year / 10) decade(date '1992-02-15') 199
epoch( date) Seconds since 1970‑01‑01 epoch(date '1992-02-15') 698112000
era( date) Calendar era era(date '0044-03-15 (BC)') 0
hour( date) Hours hour(timestamp '2021-08-03 11
11:59:44.123456')
isodow( date) Numeric ISO weekday (Monday isodow(date '1992-02-15') 6
= 1, Sunday = 7)
isoyear( date) ISO Year number (Starts on isoyear(date '2022-01-01') 2021
Monday of week containing Jan
4th)
microsecond( date) Sub‑minute microseconds microsecond(timestamp 44123456
'2021-08-03 11:59:44.123456')
millennium( date) Millennium millennium(date '1992-02-15') 2
millisecond( date) Sub‑minute milliseconds millisecond(timestamp 44123
'2021-08-03 11:59:44.123456')
minute( date) Minutes minute(timestamp '2021-08-03 59
11:59:44.123456')
month( date) Month month(date '1992-02-15') 2
quarter( date) Quarter quarter(date '1992-02-15') 1
second( date) Seconds second(timestamp '2021-08-03 44
11:59:44.123456')
timezone_hour( date) Time zone offset hour portion timezone_hour(date 0
'1992-02-15')
timezone_minute( date) Time zone offset minutes timezone_minute(date 0
portion '1992-02-15')
timezone( date) Time Zone offset in minutes timezone(date '1992-02-15') 0
week( date) ISO Week week(date '1992-02-15') 7
weekday( date) Numeric weekday synonym weekday(date '1992-02-15') 6
(Sunday = 0, Saturday = 6)
weekofyear( date) ISO Week (synonym) weekofyear(date '1992-02-15') 7
year( date) Year year(date '1992-02-15') 1992
yearweek( date) BIGINT of combined ISO Year yearweek(date '1992-02-15') 199207
number and 2‑digit version of
ISO Week number

Enum Functions

This section describes functions and operators for examining and manipulating ENUM values. The examples assume an enum type created
as:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy', 'anxious');


These functions can take NULL or a specific value of the type as argument(s). With the exception of enum_range_boundary, the result
depends only on the type of the argument and not on its value.

Function Description Example Result

enum_code( enum_value) Returns the numeric value enum_code('happy'::mood) 2


backing the given enum value
enum_first( enum) Returns the first value of the enum_first(null::mood) sad
input enum type.
enum_last( enum) Returns the last value of the enum_last(null::mood) anxious
input enum type.
enum_range( enum) Returns all values of the input enum_range(null::mood) [sad, ok,
enum type as an array. happy,
anxious]
enum_range_boundary( Returns the range between the enum_range_boundary(NULL, [sad, ok,
enum, enum) two given enum values as an 'happy'::mood) happy]
array. The values must be of the
same enum type. When the first
parameter is NULL, the result
starts with the first value of the
enum type. When the second
parameter is NULL, the result
ends with the last value of the
enum type.

Interval Functions

This section describes functions and operators for examining and manipulating INTERVAL values.

Interval Operators

The table below shows the available mathematical operators for INTERVAL types.

Operator Description Example Result

+ addition of an INTERVAL INTERVAL 1 HOUR + INTERVAL 5 HOUR INTERVAL 6 HOUR


+ addition to a DATE DATE '1992-03-22' + INTERVAL 5 DAY 1992-03-27
+ addition to a TIMESTAMP TIMESTAMP '1992-03-22 01:02:03' + 1992-03-27 01:02:03
INTERVAL 5 DAY
+ addition to a TIME TIME '01:02:03' + INTERVAL 5 HOUR 06:02:03
- subtraction of an INTERVAL INTERVAL 5 HOUR - INTERVAL 1 HOUR INTERVAL 4 HOUR
- subtraction from a DATE DATE '1992-03-27' - INTERVAL 5 DAY 1992-03-22
- subtraction from a TIMESTAMP '1992-03-27 01:02:03' - 1992-03-22 01:02:03
TIMESTAMP INTERVAL 5 DAY
- subtraction from a TIME TIME '06:02:03' - INTERVAL 5 HOUR 01:02:03


Interval Functions

The table below shows the available scalar functions for INTERVAL types.

Function Description Example Result

date_part( part, Get subfield (equivalent to date_part('year', INTERVAL 1


interval) extract) '14 months')
datepart( part, Alias of date_part. Get datepart('year', INTERVAL 1
interval) subfield (equivalent to '14 months')
extract)
extract( part from Get subfield from an interval extract('month' FROM 2
interval) INTERVAL '14 months')
to_centuries( Construct a century interval to_centuries(5) INTERVAL 500 YEAR
integer)
to_days( integer) Construct a day interval to_days(5) INTERVAL 5 DAY
to_decades( integer) Construct a decade interval to_decades(5) INTERVAL 50 YEAR
to_hours( integer) Construct a hour interval to_hours(5) INTERVAL 5 HOUR
to_microseconds( Construct a microsecond to_microseconds(5) INTERVAL 5
integer) interval MICROSECOND
to_millennia( Construct a millenium to_millennia(5) INTERVAL 5000 YEAR
integer) interval
to_milliseconds( Construct a millisecond to_milliseconds(5) INTERVAL 5
integer) interval MILLISECOND
to_minutes( integer) Construct a minute interval to_minutes(5) INTERVAL 5 MINUTE
to_months( integer) Construct a month interval to_months(5) INTERVAL 5 MONTH
to_seconds( integer) Construct a second interval to_seconds(5) INTERVAL 5 SECOND
to_weeks( integer) Construct a week interval to_weeks(5) INTERVAL 35 DAY
to_years( integer) Construct a year interval to_years(5) INTERVAL 5 YEAR

Only the documented date parts are defined for intervals.

Lambda Functions

Lambda functions enable the use of more complex and flexible expressions in queries. DuckDB supports several scalar functions that
accept lambda functions as parameters in the form (parameter1, parameter2, ...) -> expression. If the lambda function
has only one parameter, then the parentheses can be omitted. The parameters can have any names. For example, the following are all
valid lambda functions:

• param -> param > 1


• s -> contains(concat(s, 'DB'), 'duck')
• (x, y) -> x + y

Scalar Functions That Accept Lambda Functions


Function Aliases Description Example Result

list_transform( array_transform, Returns a list that is the result of list_ [5, 6,


list, lambda) apply, list_apply, applying the lambda function to transform([4, 5, 7]
array_apply each element of the input list. 6], x -> x + 1)
list_filter( array_filter, Constructs a list from those elements list_filter([4, [5, 6]
list, lambda) filter of the input list for which the lambda 5, 6], x -> x >
function returns true. 4)
list_reduce( array_reduce, Reduces all elements of the input list list_reduce([4, 15
list, lambda) reduce into a single value by executing the 5, 6], (x, y) ->
lambda function on a running result x + y)
and the next list element.

Nesting All scalar functions can be arbitrarily nested.

Nested lambda functions to get all squares of even list elements:

SELECT list_transform(
list_filter([0, 1, 2, 3, 4, 5], x -> x % 2 = 0),
y -> y * y
);

[0, 4, 16]

Nested lambda function to add each element of the first list to the sum of the second list:

SELECT list_transform(
[1, 2, 3],
x -> list_reduce([4, 5, 6], (a, b) -> a + b) + x
);

[16, 17, 18]

Scoping Lambda functions conform to scoping rules in the following order:

• inner lambda parameters


• outer lambda parameters
• column names
• macro parameters

CREATE TABLE tbl (x INT);


INSERT INTO tbl VALUES (10);
SELECT apply([1, 2], x -> apply([4], x -> x + tbl.x)[1] + x) FROM tbl;

[15, 16]

Indexes as Parameters All lambda functions accept an optional extra parameter that represents the index of the current element. This
is always the last parameter of the lambda function, and is 1‑based (i.e., the first element has index 1).

Get all elements that are larger than their index:

SELECT list_filter([1, 3, 1, 5], (x, i) -> x > i);

[3, 5]


Transform

Signature: list_transform(list, lambda)

Description:
list_transform returns a list that is the result of applying the lambda function to each element of the input list.

Aliases:

• array_transform
• apply
• list_apply
• array_apply

Number of parameters excluding indexes: 1

Return type: Defined by the return type of the lambda function

Examples:
Incrementing each list element by one:

SELECT list_transform([1, 2, NULL, 3], x -> x + 1);

[2, 3, NULL, 4]

Transforming strings:

SELECT list_transform(['duck', 'a', 'b'], s -> concat(s, 'DB'));

[duckDB, aDB, bDB]

Combining lambda functions with other functions:

SELECT list_transform([5, NULL, 6], x -> coalesce(x, 0) + 1);

[6, 1, 7]

Filter

Signature: list_filter(list, lambda)

Description:
Constructs a list from those elements of the input list for which the lambda function returns true. DuckDB must be able to cast the lambda
function's return type to BOOL.

Aliases:

• array_filter
• filter

Number of parameters excluding indexes: 1

Return type: The same type as the input list

Examples:
Filter out negative values:

SELECT list_filter([5, -6, NULL, 7], x -> x > 0);

[5, 7]

Divisible by 2 and 5:

SELECT list_filter(list_filter([2, 4, 3, 1, 20, 10, 3, 30], x -> x % 2 == 0), y -> y % 5 == 0);

[20, 10, 30]

In combination with range(...) to construct lists:

SELECT list_filter([1, 2, 3, 4], x -> x > #1) FROM range(4);

[1, 2, 3, 4]
[2, 3, 4]
[3, 4]
[4]
[]

Reduce

Signature: list_reduce(list, lambda)

Description:
The scalar function returns a single value that is the result of applying the lambda function to each element of the input list. It starts with
the first element and then repeatedly applies the lambda function to the result of the previous application and the next element of the
list. The list must have at least one element.

Aliases:

• array_reduce
• reduce

Number of parameters excluding indexes: 2

Return type: The type of the input list's elements

Examples:
Sum of all list elements:

SELECT list_reduce([1, 2, 3, 4], (x, y) -> x + y);

10

Only add up list elements if they are greater than 2:

SELECT list_reduce(list_filter([1, 2, 3, 4], x -> x > 2), (x, y) -> x + y);

7

Concat all list elements:

SELECT list_reduce(['DuckDB', 'is', 'awesome'], (x, y) -> concat(x, ' ', y));

DuckDB is awesome

Nested Functions

This section describes functions and operators for examining and manipulating nested values. There are five nested data types: ARRAY,
LIST, MAP, STRUCT, and UNION.

List Functions

In the descriptions, l is the three element list [4, 5, 6].


Function Aliases Description Example Result

list[index] Bracket notation serves as an alias l[3] 6


for list_extract.
list[begin:end] Bracket notation with colon is an l[2:3] [5, 6]
alias for list_slice.
list[begin:end:step] list_slice in bracket notation l[:-:2] [4, 6]
with an added step feature.
array_pop_back( Returns the list without the last array_pop_back(l) [4, 5]
list) element.
array_pop_front( Returns the list without the first array_pop_ [5, 6]
list) element. front(l)
flatten( list_of_ Concatenate a list of lists into a single flatten([[1, 2], [1, 2,
lists) list. This only flattens one level of the [3, 4]]) 3, 4]
list (see examples).
len( list) array_length Return the length of the list. len([1, 2, 3]) 3
list_aggregate( list_aggr, Executes the aggregate function list_ 1
list, name) aggregate, array_ name on the elements of list. See aggregate([1, 2,
aggregate, array_ the List Aggregates section for more NULL], 'min')
aggr details.
list_any_value( Returns the first non‑null value in the list_any_ -3
list) list value([NULL, -3])
list_append( array_append, Appends element to list. list_append([2, [2, 3,
list, element) array_push_back 3], 4) 4]
list_concat( list_cat, array_ Concatenates two lists. list_concat([2, [2, 3,
list1, list2) concat, array_cat 3], [4, 5, 6]) 4, 5, 6]
list_contains( list_has, array_ Returns true if the list contains the list_contains([1, true
list, element) contains, array_ element. 2, NULL], 1)
has
list_cosine_ Compute the cosine similarity list_cosine_ 0.975900072948533
similarity( between two lists similarity([1, 2,
list1, list2) 3], [1, 2, 5])
list_distance( Calculates the Euclidean distance list_distance([1, 2.0
list1, list2) between two points with coordinates 2, 3], [1, 2, 5])
given in two inputs lists of equal
length.
list_distinct( array_distinct Removes all duplicates and NULLs list_distinct([1, [1, 5,
list) from a list. Does not preserve the 1, NULL, -3, 1, -3]
original order. 5])
list_dot_product( list_inner_ Computes the dot product of two list_dot_ 20.0
list1, list2) product same‑sized lists of numbers. product([1, 2,
3], [1, 2, 5])
list_extract( list_element, Extract the indexth (1‑based) value list_extract(l, 6
list, index) array_extract from the list. 3)

list_filter( array_filter, Constructs a list from those elements list_filter(l, x [5, 6]


list, lambda) filter of the input list for which the lambda -> x > 4)
function returns true. See the
Lambda Functions page for more
details.
list_grade_up( array_grade_up Works like sort, but the results are list_grade_ [2, 4,
list) the indexes that correspond to the up([30, 10, 40, 1, 3]
position in the original list instead 20])
of the actual values.
list_has_all( array_has_all Returns true if all elements of sub‑list list_has_all(l, true
list, sub-list) exist in list. [4, 6])
list_has_any( array_has_any Returns true if any elements exist in list_has_any([1, true
list1, list2) both lists. 2, 3], [2, 3, 4])
list_intersect( array_intersect Returns a list of all the elements that list_ [2, 3]
list1, list2) exist in both l1 and l2, without intersect([1, 2,
duplicates. 3], [2, 3, 4])
list_position( list_indexof, Returns the index of the element if list_position([1, 2
list, element) array_position, the list contains the element. 2, NULL], 2)
array_indexof
list_prepend( array_prepend, Prepends element to list. list_prepend(3, [3, 4,
element, list) array_push_front [4, 5, 6]) 5, 6]
list_reduce( array_reduce, Returns a single value that is the list_reduce(l, 15
list, lambda) reduce result of applying the lambda (x, y) -> x + y)
function to each element of the input
list. See the Lambda Functions page
for more details.
list_resize( array_resize Resizes the list to contain size list_resize([1, [1, 2,
list, elements. Initializes new elements 2, 3], 5, 0) 3, 0, 0]
size[,value]) with value or NULL if value is not
set.
list_reverse_ array_reverse_ Sorts the elements of the list in list_reverse_ [6, 3,
sort( list) sort reverse order. See the Sorting Lists sort([3, 6, 1, 2, 1]
section for more details about the 2])
null sorting order.
list_reverse( array_reverse Reverses the list. list_reverse(l) [6, 5,
list) 4]
list_select( array_select Returns a list based on the elements list_select([10, [10, 40]
value_list, selected by the index_list. 20, 30, 40], [1,
index_list) 4])
list_slice( list, array_slice list_slice with added step list_slice(l, 1, [4, 6]
begin, end, step) feature. 3, 2)
list_slice( list, array_slice Extract a sublist using slice list_slice(l, 2, [5, 6]
begin, end) conventions. Negative values are 3)
accepted. See slicing.

list_sort( list) array_sort Sorts the elements of the list. See the list_sort([3, 6, [1, 2,
Sorting Lists section for more details 1, 2]) 3, 6]
about the sorting order and the null
sorting order.
list_transform( array_transform, Returns a list that is the result of list_transform(l, [5, 6,
list, lambda) apply, list_apply, applying the lambda function to x -> x + 1) 7]
array_apply each element of the input list. See
the Lambda Functions page for more
details.
list_unique( array_unique Counts the unique elements of a list. list_unique([1, 3
list) 1, NULL, -3, 1,
5])
list_value( any, list_pack Create a LIST containing the list_value(4, 5, [4, 5,
...) argument values. 6) 6]
list_where( array_where Returns a list with the BOOLEANs in list_where([10, [10, 40]
value_list, mask_ mask_list applied as a mask to 20, 30, 40],
list) the value_list. [true, false,
false, true])
list_zip( list1, array_zip Zips k LISTs to a new LIST whose list_zip([1, 2], [{'list_
list2, ...) length will be that of the longest list. [3, 4], [5, 6]) 1': 1,
Its elements are structs of k elements 'list_
list_1, ..., list_k. Elements 2': 3,
missing will be replaced with NULL. 'list_
3': 5},
{'list_
1': 2,
'list_
2': 4,
'list_
3': 6}]
unnest( list) Unnests a list by one level. Note that unnest([1, 2, 3]) 1, 2, 3
this is a special function that alters
the cardinality of the result. See the
unnest page for more details.

List Operators

The following operators are supported for lists:

Operator    Description                                                                           Example                            Result

&&          Alias for list_intersect                                                              [1, 2, 3, 4, 5] && [2, 5, 5, 6]    [2, 5]
@>          Alias for list_has_all, where the list on the right of the operator is the sublist    [1, 2, 3, 4] @> [3, 4, 3]          true
<@          Alias for list_has_all, where the list on the left of the operator is the sublist     [1, 4] <@ [1, 2, 3, 4]             true
||          Alias for list_concat                                                                 [1, 2, 3] || [4, 5, 6]             [1, 2, 3, 4, 5, 6]
<=>         Alias for list_cosine_similarity                                                      [1, 2, 3] <=> [1, 2, 5]            0.9759000729485332
<->         Alias for list_distance                                                               [1, 2, 3] <-> [1, 2, 5]            2.0
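
Some usage examples of these operators, with values taken from the table above:

SELECT [1, 2, 3, 4, 5] && [2, 5, 5, 6]; -- [2, 5]
SELECT [1, 2, 3, 4] @> [3, 4, 3];       -- true
SELECT [1, 2, 3] || [4, 5, 6];          -- [1, 2, 3, 4, 5, 6]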

List Comprehension

Python‑style list comprehension can be used to compute expressions over elements in a list. For example:

SELECT [lower(x) FOR x IN strings]


FROM (VALUES (['Hello', '', 'World'])) t(strings);
-- [hello, , world]

SELECT [upper(x) FOR x IN strings IF len(x) > 0]


FROM (VALUES (['Hello', '', 'World'])) t(strings);
-- [HELLO, WORLD]

Struct Functions

Function    Description    Example    Result

struct.entry    Dot notation that serves as an alias for struct_extract from named STRUCTs.    ({'i': 3, 's': 'string'}).i    3
struct[entry]    Bracket notation that serves as an alias for struct_extract from named STRUCTs.    ({'i': 3, 's': 'string'})['i']    3
struct[idx]    Bracket notation that serves as an alias for struct_extract from unnamed STRUCTs (tuples), using an index (1-based).    (row(42, 84))[1]    42
row( any, ...)    Create an unnamed STRUCT (tuple) containing the argument values.    row(i, i % 4, i / 4)    (10, 2, 2.5)
struct_extract( struct, 'entry')    Extract the named entry from the STRUCT.    struct_extract({'i': 3, 'v2': 3, 'v3': 0}, 'i')    3
struct_extract( struct, idx)    Extract the entry from an unnamed STRUCT (tuple) using an index (1-based).    struct_extract(row(42, 84), 1)    42
struct_insert( struct, name := any, ...)    Add field(s)/value(s) to an existing STRUCT with the argument values. The entry name(s) will be the bound variable name(s).    struct_insert({'a': 1}, b := 2)    {'a': 1, 'b': 2}
struct_pack( name := any, ...)    Create a STRUCT containing the argument values. The entry name will be the bound variable name.    struct_pack(i := 4, s := 'string')    {'i': 4, 's': string}
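
Some usage examples, with values taken from the table above:

SELECT struct_pack(i := 4, s := 'string'); -- {'i': 4, 's': string}
SELECT ({'i': 3, 's': 'string'}).i;        -- 3
SELECT struct_insert({'a': 1}, b := 2);    -- {'a': 1, 'b': 2}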


Map Functions

Function    Description    Example    Result

cardinality( map)    Return the size of the map (or the number of entries in the map).    cardinality(map([4, 2], ['a', 'b']))    2
element_at( map, key)    Return a list containing the value for a given key or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys, else an error is returned.    element_at(map([100, 5], [42, 43]), 100)    [42]
map_entries( map)    Return a list of struct(k, v) for each key-value pair in the map.    map_entries(map([100, 5], [42, 43]))    [{'key': 100, 'value': 42}, {'key': 5, 'value': 43}]
map_extract( map, key)    Alias of element_at. Return a list containing the value for a given key or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys, else an error is returned.    map_extract(map([100, 5], [42, 43]), 100)    [42]
map_from_entries( STRUCT(k, v)[])    Returns a map created from the entries of the array.    map_from_entries([{k: 5, v: 'val1'}, {k: 3, v: 'val2'}])    {5=val1, 3=val2}
map_keys( map)    Return a list of all keys in the map.    map_keys(map([100, 5], [42, 43]))    [100, 5]
map_values( map)    Return a list of all values in the map.    map_values(map([100, 5], [42, 43]))    [42, 43]
map()    Returns an empty map.    map()    {}
map[entry]    Alias for element_at.    map([100, 5], ['a', 'b'])[100]    [a]

Union Functions

Function    Description    Example    Result

union.tag    Dot notation serves as an alias for union_extract.    (union_value(k := 'hello')).k    string
union_extract( union, 'tag')    Extract the value with the named tags from the union. NULL if the tag is not currently selected.    union_extract(s, 'k')    hello
union_value( tag := any)    Create a single member UNION containing the argument value. The tag of the value will be the bound variable name.    union_value(k := 'hello')    'hello'::UNION(k VARCHAR)
union_tag( union)    Retrieve the currently selected tag of the union as an Enum.    union_tag(union_value(k := 'foo'))    'k'

Range Functions

The functions range and generate_series create a list of values in the range between start and stop. The start parameter is
inclusive. For the range function, the stop parameter is exclusive, while for generate_series, it is inclusive.

Based on the number of arguments, the following variants exist:

• range( start, stop, step)


• range( start, stop)
• range( stop)
• generate_series( start, stop, step)
• generate_series( start, stop)
• generate_series( stop)

The default value of start is 0 and the default value of step is 1.

SELECT range(5);
-- [0, 1, 2, 3, 4]

SELECT range(2, 5);


-- [2, 3, 4]

SELECT range(2, 5, 3);


-- [2]

SELECT generate_series(5);
-- [0, 1, 2, 3, 4, 5]

SELECT generate_series(2, 5);


-- [2, 3, 4, 5]

SELECT generate_series(2, 5, 3);


-- [2, 5]

Date ranges are also supported:

SELECT *
FROM range(DATE '1992-01-01', DATE '1992-03-01', INTERVAL '1' MONTH);

┌─────────────────────┐
│ range │
├─────────────────────┤
│ 1992-01-01 00:00:00 │
│ 1992-02-01 00:00:00 │
└─────────────────────┘

Slicing

The function list_slice can be used to extract a sublist from a list. The following variants exist:

• list_slice( list, begin, end)
• list_slice( list, begin, end, step)
• array_slice( list, begin, end)
• array_slice( list, begin, end, step)


• list[begin:end]
• list[begin:end:step]

list

• Is the list to be sliced

begin

• Is the index of the first element to be included in the slice


• When begin < 0 the index is counted from the end of the list
• When begin < 0 and -begin > length, begin is clamped to the beginning of the list
• When begin > length, the result is an empty list
• Bracket Notation: When begin is omitted, it defaults to the beginning of the list

end

• Is the index of the last element to be included in the slice


• When end < 0 the index is counted from the end of the list
• When end > length, end is clamped to length
• When end < begin, the result is an empty list
• Bracket Notation: When end is omitted, it defaults to the end of the list. When end is omitted and a step is provided, end must
be replaced with a -

step (optional)

• Is the step size between elements in the slice


• When step < 0 the slice is reversed, and begin and end are swapped
• Must be non‑zero

SELECT list_slice([1, 2, 3, 4, 5], 2, 4);


-- [2, 3, 4]

SELECT ([1, 2, 3, 4, 5])[2:4:2];


-- [2, 4]

SELECT([1, 2, 3, 4, 5])[4:2:-2];
-- [4, 2]

SELECT ([1, 2, 3, 4, 5])[:];


-- [1, 2, 3, 4, 5]

SELECT ([1, 2, 3, 4, 5])[:-:2];


-- [1, 3, 5]

SELECT ([1, 2, 3, 4, 5])[:-:-2];


-- [5, 3, 1]

List Aggregates

The function list_aggregate allows the execution of arbitrary existing aggregate functions on the elements of a list. Its first argument
is the list (column), its second argument is the aggregate function name, e.g., min, histogram or sum.

list_aggregate accepts additional arguments after the aggregate function name. These extra arguments are passed directly to the
aggregate function named by the second argument of list_aggregate.

SELECT list_aggregate([1, 2, -4, NULL], 'min');


-- -4

SELECT list_aggregate([2, 4, 8, 42], 'sum');


-- 56


SELECT list_aggregate([[1, 2], [NULL], [2, 10, 3]], 'last');


-- [2, 10, 3]

SELECT list_aggregate([2, 4, 8, 42], 'string_agg', '|');


-- 2|4|8|42

The following is a list of existing rewrites. Rewrites simplify the use of the list aggregate function by only taking the list (column) as their ar‑
gument. list_avg, list_var_samp, list_var_pop, list_stddev_pop, list_stddev_samp, list_sem, list_approx_
count_distinct, list_bit_xor, list_bit_or, list_bit_and, list_bool_and, list_bool_or, list_count, list_
entropy, list_last, list_first, list_kurtosis, list_kurtosis_pop, list_min, list_max, list_product, list_
skewness, list_sum, list_string_agg, list_mode, list_median, list_mad and list_histogram.

SELECT list_min([1, 2, -4, NULL]);


-- -4

SELECT list_sum([2, 4, 8, 42]);


-- 56

SELECT list_last([[1, 2], [NULL], [2, 10, 3]]);


-- [2, 10, 3]

array_to_string Concatenates list/array elements using an optional delimiter.

SELECT array_to_string([1, 2, 3], '-') AS str;


-- 1-2-3

-- this is equivalent to the following SQL


SELECT list_aggr([1, 2, 3], 'string_agg', '-') AS str;
-- 1-2-3

Sorting Lists

The function list_sort sorts the elements of a list either in ascending or descending order. In addition, it allows specifying whether
NULL values should be moved to the beginning or to the end of the list.

By default if no modifiers are provided, DuckDB sorts ASC NULLS FIRST, i.e., the values are sorted in ascending order and NULL values
are placed first. This is identical to the default sort order of SQLite. The default sort order can be changed using PRAGMA statements.

list_sort leaves it open to the user whether they want to use the default sort order or a custom order. list_sort takes up to two ad‑
ditional optional parameters. The second parameter provides the sort order and can be either ASC or DESC. The third parameter provides
the NULL sort order and can be either NULLS FIRST or NULLS LAST.

-- default sort order and default NULL sort order


SELECT list_sort([1, 3, NULL, 5, NULL, -5]);
----
[NULL, NULL, -5, 1, 3, 5]

-- only providing the sort order


SELECT list_sort([1, 3, NULL, 2], 'ASC');
----
[NULL, 1, 2, 3]

-- providing the sort order and the NULL sort order


SELECT list_sort([1, 3, NULL, 2], 'DESC', 'NULLS FIRST');
----
[NULL, 3, 2, 1]

list_reverse_sort has an optional second parameter providing the NULL sort order. It can be either NULLS FIRST or NULLS
LAST.


-- default NULL sort order


SELECT list_sort([1, 3, NULL, 5, NULL, -5]);
----
[NULL, NULL, -5, 1, 3, 5]

-- providing the NULL sort order


SELECT list_reverse_sort([1, 3, NULL, 2], 'NULLS LAST');
----
[3, 2, 1, NULL]

Lambda Functions

DuckDB supports lambda functions in the form (parameter1, parameter2, ...) -> expression. For details, see the lambda
functions page.

Flatten

The flatten function is a scalar function that converts a list of lists into a single list by concatenating each sub‑list together. Note that this
only flattens one level at a time, not all levels of sub‑lists.

-- Convert a list of lists into a single list


SELECT
flatten([
[1, 2],
[3, 4]
]);
----
[1, 2, 3, 4]

-- If the list has multiple levels of lists,


-- only the first level of sub-lists is concatenated into a single list
SELECT
flatten([
[
[1, 2],
[3, 4],
],
[
[5, 6],
[7, 8],
]
]);
----
[[1, 2], [3, 4], [5, 6], [7, 8]]

In general, the input to the flatten function should be a list of lists (not a single-level list). However, the flatten function has
specific behavior when handling empty lists and NULL values.

-- If the input list is empty, return an empty list


SELECT flatten([]);
----
[]

-- If the entire input to flatten is NULL, return NULL


SELECT flatten(NULL);
----
NULL


-- If a list whose only entry is NULL is flattened, return an empty list


SELECT flatten([NULL]);
----
[]

-- If the sub-list in a list of lists only contains NULL,


-- do not modify the sub-list
-- (Note the extra set of parentheses vs. the prior example)
SELECT flatten([[NULL]]);
----
[NULL]

-- Even if the only contents of each sub-list is NULL,


-- still concatenate them together
-- Note that no de-duplication occurs when flattening.
-- See list_distinct function for de-duplication.
SELECT flatten([[NULL],[NULL]]);
----
[NULL, NULL]

generate_subscripts

The generate_subscripts( arr, dim) function generates indexes along the dimth dimension of array arr.

SELECT generate_subscripts([4, 5, 6], 1) AS i;

┌───┐
│ i │
├───┤
│ 1 │
│ 2 │
│ 3 │
└───┘

Related Functions

There are also the aggregate functions list and histogram that produce lists and lists of structs. The unnest function is used to unnest
a list by one level.
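
A brief sketch of how these fit together, assuming a small example table t(x) containing the values 1, 2, 2 (the table name and data are illustrative only):

SELECT list(x) FROM t;      -- [1, 2, 2] (order may vary)
SELECT histogram(x) FROM t; -- {1=1, 2=2}
SELECT unnest([1, 2, 2]);   -- three rows: 1, 2, 2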

Numeric Functions

Numeric Operators

The table below shows the available mathematical operators for numeric types.

Operator Description Example Result

+ addition 2 + 3 5
- subtraction 2 - 3 -1
* multiplication 2 * 3 6
/ float division 5 / 2 2.5
// division 5 // 2 2
% modulo (remainder) 5 % 4 1

** exponent 3 ** 4 81
^ exponent (alias for **) 3 ^ 4 81
& bitwise AND 91 & 15 11
| bitwise OR 32 | 3 35
<< bitwise shift left 1 << 4 16
>> bitwise shift right 8 >> 2 2
~ bitwise negation ~15 -16
! factorial of x 4! 24

Division and Modulo Operators There are two division operators: / and //. They are equivalent when at least one of the operands is
a FLOAT or a DOUBLE. When both operands are integers, / performs floating point division (5 / 2 = 2.5) while // performs integer
division (5 // 2 = 2).
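
For example:

SELECT 5 / 2;  -- 2.5
SELECT 5 // 2; -- 2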

Supported Types The modulo, bitwise, negation, and factorial operators work only on integral data types, whereas the others are
available for all numeric data types.
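
For example, using values from the operator table above:

SELECT 5 % 4;   -- 1
SELECT 91 & 15; -- 11
SELECT 4!;      -- 24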

Numeric Functions

The table below shows the available mathematical functions.

Function Description Example Result

@ Absolute value (parentheses @(-2) 2


optional if operating on a column)
abs(x) Absolute value abs(-17.4) 17.4
acos(x) Computes the arccosine of x acos(0.5) 1.0471975511965976
add(x, y) Alias for x + y add(2, 3) 5
asin(x) Computes the arcsine of x asin(0.5) 0.5235987755982989
atan(x) Computes the arctangent of x atan(0.5) 0.4636476090008061
atan2(y, x) Computes the arctangent (y, x) atan2(0.5, 0.5) 0.7853981633974483
bit_count(x) Returns the number of bits that bit_count(31) 5
are set
cbrt(x) Returns the cube root of the cbrt(8) 2
number
ceil(x) Rounds the number up ceil(17.4) 18
ceiling(x) Rounds the number up. Alias of ceiling(17.4) 18
ceil
cos(x) Computes the cosine of x cos(90) -
0.4480736161291701
cot(x) Computes the cotangent of x cot(0.5) 1.830487721712452
degrees(x) Converts radians to degrees degrees(pi()) 180
divide(x, y) Alias for x // y divide(5, 2) 2

even(x) Round to next even number by even(2.9) 4


rounding away from zero
exp(x) Computes e ** x exp(0.693) 2
factorial(x) See ! operator. Computes the factorial(4) 24
product of the current integer and
all integers below it
fdiv(x, y) Performs integer division (x // fdiv(5, 2) 2.0
y) but returns a DOUBLE value
floor(x) Rounds the number down floor(17.4) 17
fmod(x, y) Calculates the modulo value. fmod(5, 2) 1.0
Always returns a DOUBLE value
gamma(x) Interpolation of (x‑1) factorial (so gamma(5.5) 52.34277778455352
decimal inputs are allowed)
gcd(x, y) Computes the greatest common gcd(42, 57) 3
divisor of x and y
greatest_common_ Computes the greatest common greatest_common_ 3
divisor(x, y) divisor of x and y divisor(42, 57)
greatest(x1, x2, ...) Selects the largest value greatest(3, 2, 4, 4) 4
isfinite(x) Returns true if the floating point isfinite(5.5) true
value is finite, false otherwise
isinf(x) Returns true if the floating point isinf('Infinity'::float) true
value is infinite, false otherwise
isnan(x) Returns true if the floating point isnan('NaN'::float) true
value is not a number, false
otherwise
lcm(x, y) Computes the least common lcm(42, 57) 798
multiple of x and y
least_common_ Computes the least common least_common_ 798
multiple(x, y) multiple of x and y multiple(42, 57)
least(x1, x2, ...) Selects the smallest value least(3, 2, 4, 4) 2
lgamma(x) Computes the log of the gamma lgamma(2) 0
function
ln(x) Computes the natural logarithm ln(2) 0.693
of x
log(x) Computes the 10‑log of x log(100) 2
log10(x) Alias of log. computes the 10‑log log10(1000) 3
of x
log2(x) Computes the 2‑log of x log2(8) 3
multiply(x, y) Alias for x * y multiply(2, 3) 6
nextafter(x, y) Return the next floating point nextafter(1::float, 1.0000001
value after x in the direction of y 2::float)
pi() Returns the value of pi pi() 3.141592653589793
pow(x, y) Computes x to the power of y pow(2, 3) 8

power(x, y) Alias of pow. computes x to the power(2, 3) 8


power of y
radians(x) Converts degrees to radians radians(90) 1.5707963267948966
random() Returns a random number random() various
between 0 and 1
round_even(v NUMERIC, s Alias of roundbankers(v, s). round_even(24.5, 0) 24.0
INT) Round to s decimal places using
the rounding half to even rule.
Values s < 0 are allowed
round(v NUMERIC, s INT) Round to s decimal places. Values round(42.4332, 2) 42.43
s < 0 are allowed
setseed(x) Sets the seed to be used for the setseed(0.42)
random function
sign(x) Returns the sign of x as ‑1, 0 or 1 sign(-349) -1
signbit(x) Returns whether the signbit is set signbit(-0.0) true
or not
sin(x) Computes the sin of x sin(90) 0.8939966636005579
sqrt(x) Returns the square root of the sqrt(9) 3
number
subtract(x, y) Alias for x - y subtract(2, 3) -1
tan(x) Computes the tangent of x tan(90) -
1.995200412208242
trunc(x) Truncates the number trunc(17.4) 17
xor(x, y) Bitwise XOR xor(17, 5) 20

Pattern Matching

There are four separate approaches to pattern matching provided by DuckDB: the traditional SQL LIKE operator, the more recent SIMILAR
TO operator (added in SQL:1999), a GLOB operator, and POSIX‑style regular expressions.

LIKE

The LIKE expression returns true if the string matches the supplied pattern. (As expected, the NOT LIKE expression returns false if
LIKE returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)

If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the
equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero
or more characters.

LIKE pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern
must start and end with a percent sign.

Some examples:

SELECT 'abc' LIKE 'abc'; -- true


SELECT 'abc' LIKE 'a%' ; -- true
SELECT 'abc' LIKE '_b_'; -- true
SELECT 'abc' LIKE 'c'; -- false


SELECT 'abc' LIKE 'c%' ; -- false


SELECT 'abc' LIKE '%c'; -- true
SELECT 'abc' NOT LIKE '%c'; -- false

The keyword ILIKE can be used instead of LIKE to make the match case‑insensitive according to the active locale.

SELECT 'abc' ILIKE '%C'; -- true


SELECT 'abc' NOT ILIKE '%C'; -- false

To search within a string for a character that is a wildcard (% or _), the pattern must use an ESCAPE clause and an escape character to
indicate the wildcard should be treated as a literal character instead of a wildcard. See an example below.

Additionally, the function like_escape has the same functionality as a LIKE expression with an ESCAPE clause, but using function
syntax. See the Text Functions Docs for details.

-- Search for strings with 'a' then a literal percent sign then 'c'
SELECT 'a%c' LIKE 'a$%c' ESCAPE '$'; -- true
SELECT 'azc' LIKE 'a$%c' ESCAPE '$'; -- false

-- Case insensitive ILIKE with ESCAPE


SELECT 'A%c' ILIKE 'a$%c' ESCAPE '$'; -- true

There are also alternative operators that can be used in place of LIKE expressions. These enhance PostgreSQL
compatibility.

LIKE‑style PostgreSQL‑style

LIKE ~~
NOT LIKE !~~
ILIKE ~~*
NOT ILIKE !~~*
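
For example, these behave like their LIKE counterparts shown earlier:

SELECT 'abc' ~~ 'a%';  -- true
SELECT 'abc' !~~ 'a%'; -- false
SELECT 'abc' ~~* '%C'; -- true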

SIMILAR TO

The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string. It is similar to LIKE, except
that it interprets the pattern using a regular expression. Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the
entire string; this is unlike common regular expression behavior where the pattern can match any part of the string.

A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a
regular expression if it is a member of the regular set described by the regular expression. As with LIKE, pattern characters match string
characters exactly unless they are special characters in the regular expression language — but regular expressions use different special
characters than LIKE does.

Some examples:

SELECT 'abc' SIMILAR TO 'abc'; -- true


SELECT 'abc' SIMILAR TO 'a'; -- false
SELECT 'abc' SIMILAR TO '.*(b|d).*'; -- true
SELECT 'abc' SIMILAR TO '(b|c).*'; -- false
SELECT 'abc' NOT SIMILAR TO 'abc'; -- false

There are also alternative operators that can be used in place of SIMILAR TO expressions. These follow POSIX syntax.

SIMILAR TO‑style POSIX‑style

SIMILAR TO ~
NOT SIMILAR TO !~
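
For example, mirroring the SIMILAR TO examples above:

SELECT 'abc' ~ '.*(b|d).*';  -- true
SELECT 'abc' !~ '.*(b|d).*'; -- false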


GLOB

The GLOB operator returns true or false if the string matches the GLOB pattern. The GLOB operator is most commonly used when
searching for filenames that follow a specific pattern (for example a specific file extension). Use the question mark (?) wildcard to match
any single character, and use the asterisk (*) to match zero or more characters. In addition, use bracket syntax ([ ]) to match any single
character contained within the brackets, or within the character range specified by the brackets. An exclamation mark (!) may be used
inside the first bracket to search for a character that is not contained within the brackets. To learn more, visit the Glob Programming
Wikipedia page.

Some examples:

SELECT 'best.txt' GLOB '*.txt'; -- true


SELECT 'best.txt' GLOB '????.txt'; -- true
SELECT 'best.txt' GLOB '?.txt'; -- false
SELECT 'best.txt' GLOB '[abc]est.txt'; -- true
SELECT 'best.txt' GLOB '[a-z]est.txt'; -- true

-- The bracket syntax is case sensitive


SELECT 'Best.txt' GLOB '[a-z]est.txt'; -- false
SELECT 'Best.txt' GLOB '[a-zA-Z]est.txt'; -- true

-- The ! applies to all characters within the brackets


SELECT 'Best.txt' GLOB '[!a-zA-Z]est.txt'; -- false

-- To negate a GLOB operator, negate the entire expression


-- (NOT GLOB is not valid syntax)
SELECT NOT 'best.txt' GLOB '*.txt'; -- false

Three tildes (~~~) may also be used in place of the GLOB keyword.

GLOB‑style Symbolic‑style

GLOB ~~~
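
For example:

SELECT 'best.txt' ~~~ '*.txt'; -- true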

Glob Function to Find Filenames The glob pattern matching syntax can also be used to search for filenames using the glob table func‑
tion. It accepts one parameter: the path to search (which may include glob patterns).

-- Search the current directory for all files


SELECT * FROM glob('*');

file

duckdb.exe
test.csv
test.json
test.parquet
test2.csv
test2.parquet
todos.json

Regular Expressions

DuckDB's regex support is documented on the Regular Expressions page.


Regular Expressions

DuckDB offers pattern matching operators (LIKE, SIMILAR TO, GLOB), as well as support for regular expressions via functions.

Regular Expression Syntax

DuckDB uses the RE2 library as its regular expression engine. For the regular expression syntax, see the RE2 docs.

Functions

Function Description Example Result

regexp_extract_all( Split the string along the regex regexp_extract_ [hello, world]
string, regex[,group= and extract all occurrences of all('hello_world',
0]) group '([a-z ]+)_?', 1)
regexp_extract( string, If string contains the regexp regexp_ {'y':'2023',
pattern, name_list); pattern, returns the capturing extract('2023-04-15', 'm':'04',
groups as a struct with '(\d+)-(\d+)-(\d+)', 'd':'15'}
corresponding names from ['y', 'm', 'd'])
name_list
regexp_extract( string, If string contains the regexp regexp_extract('hello_ hello
pattern[,idx]); pattern, returns the capturing world', '([a-z ]+)_?',
group specified by optional 1)
parameter idx
regexp_full_match( Returns true if the entire string regexp_full_ false
string, regex) matches the regex match('anabanana',
'(an)*')
regexp_matches( string, Returns true if string contains regexp_ true
pattern) the regexp pattern, false matches('anabanana',
otherwise '(an)*')
regexp_replace( string, If string contains the regexp regexp_replace('hello', he-lo
pattern, replacement); pattern, replaces the matching '[lo]', '-')
part with replacement
regexp_split_to_array( Alias of string_split_regex. regexp_split_to_ ['hello',
string, regex) Splits the string along the regex array('hello world; 'world', '42']
42', ';? ')
regexp_split_to_table( Splits the string along the regex regexp_split_to_ Three rows: 'hello',
string, regex) and returns a row for each part table('hello world; 'world', '42'
42', ';? ')

The regexp_matches function is similar to the SIMILAR TO operator, however, it does not require the entire string to match. Instead,
regexp_matches returns true if the string merely contains the pattern (unless the special tokens ^ and $ are used to anchor the regular
expression to the start and end of the string). Below are some examples:

SELECT regexp_matches('abc', 'abc'); -- true


SELECT regexp_matches('abc', '^abc$'); -- true
SELECT regexp_matches('abc', 'a'); -- true
SELECT regexp_matches('abc', '^a$'); -- false
SELECT regexp_matches('abc', '.*(b|d).*'); -- true
SELECT regexp_matches('abc', '(b|c).*'); -- true


SELECT regexp_matches('abc', '^(b|c).*'); -- false


SELECT regexp_matches('abc', '(?i)A'); -- true

Options for Regular Expression Functions The regexp_matches and regexp_replace functions also support the following op‑
tions.

Option Description

'c' case‑sensitive matching


'i' case‑insensitive matching
'l' match literals instead of regular expression tokens
'm', 'n', 'p' newline sensitive matching
'g' global replace, only available for regexp_replace
's' non‑newline sensitive matching

SELECT regexp_matches('abcd', 'ABC', 'c'); -- false


SELECT regexp_matches('abcd', 'ABC', 'i'); -- true
SELECT regexp_matches('ab^/$cd', '^/$', 'l'); -- true
SELECT regexp_matches(E'hello\nworld', 'hello.world', 'p'); -- false
SELECT regexp_matches(E'hello\nworld', 'hello.world', 's'); -- true

Using regexp_matches The regexp_matches function will be optimized to the LIKE operator when possible. To achieve the best
performance, the 'c' option (case-sensitive matching) should be passed if applicable. Note that by default the RE2 library doesn't match
the . character to newline.

Original Optimized equivalent

regexp_matches('hello world', '^hello', 'c') prefix('hello world', 'hello')


regexp_matches('hello world', 'world$', 'c') suffix('hello world', 'world')
regexp_matches('hello world', 'hello.world', LIKE 'hello_world'
'c')
regexp_matches('hello world', 'he.*rld', 'c') LIKE '%he%rld'
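
For example, the first entry of the table corresponds to:

SELECT regexp_matches('hello world', '^hello', 'c'); -- true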

Using regexp_replace The regexp_replace function can be used to replace the part of a string that matches the regexp pattern
with a replacement string. The notation \d (where d is a number indicating the group) can be used to refer to groups captured in the regular
expression in the replacement string. Note that by default, regexp_replace only replaces the first occurrence of the regular expression.
To replace all occurrences, use the global replace (g) flag.

Some examples for using regexp_replace:

SELECT regexp_replace('abc', '(b|c)', 'X'); -- aXc


SELECT regexp_replace('abc', '(b|c)', 'X', 'g'); -- aXX
SELECT regexp_replace('abc', '(b|c)', '\1\1\1\1'); -- abbbbc
SELECT regexp_replace('abc', '(.*)c', '\1e'); -- abe
SELECT regexp_replace('abc', '(a)(b)', '\2\1'); -- bac

Using regexp_extract The regexp_extract function is used to extract a part of a string that matches the regexp pattern. A specific
capturing group within the pattern can be extracted using the idx parameter. If idx is not specified, it defaults to 0, extracting the first
match with the whole pattern.


SELECT regexp_extract('abc', '.b.'); -- abc


SELECT regexp_extract('abc', '.b.', 0); -- abc
SELECT regexp_extract('abc', '.b.', 1); -- (empty)
SELECT regexp_extract('abc', '([a-z])(b)', 1); -- a
SELECT regexp_extract('abc', '([a-z])(b)', 2); -- b

If the third parameter (name_list) is a LIST of strings, then regexp_extract will return the corresponding capture groups as fields of a STRUCT:

SELECT regexp_extract('2023-04-15', '(\d+)-(\d+)-(\d+)', ['y', 'm', 'd']);


-- {'y':'2023', 'm':'04', 'd':'15'}

SELECT regexp_extract('2023-04-15 07:59:56', '^(\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)', ['y', 'm', 'd']);


-- {'y':'2023', 'm':'04', 'd':'15'}

SELECT regexp_extract('duckdb_0_7_1', '^(\w+)_(\d+)_(\d+)', ['tool', 'major', 'minor', 'fix']);


-- error

If the number of column names is less than the number of capture groups, then only the first groups are returned. If the number of column
names is greater, then an error is generated.

Text Functions

This section describes functions and operators for examining and manipulating string values. The symbol denotes a space character.

Function Description Example Result Alias

string ^@ search_ Alias for starts_ 'abc' ^@ 'a' true starts_with


string with.
string || string String concatenation 'Duck' || 'DB' DuckDB
string[index] Alias for array_ 'DuckDB'[4] k array_extract
extract.
string[begin:end] Alias for array_ 'DuckDB'[:4] Duck array_slice
slice. Missing
begin or end
arguments are
interpreted as the
beginning or end of the
list respectively.
stringLIKEtarget Returns true if the 'hello' LIKE '%lo' true
string matches the like
specifier (see Pattern
Matching)
stringSIMILAR Returns true if the 'hello' SIMILAR TO false
TOregex string matches the 'l+'
regex; identical to
regexp_full_
match (see Pattern
Matching)
array_extract( Extract a single array_ u list_element,
list, index) character using a extract('DuckDB', 2) list_extract
(1‑based) index.

array_slice( Extract a string using array_slice('DuckDB', DB


list, begin, end) slice conventions. 5, NULL)
Negative values are
accepted.
ascii( string) Returns an integer that ascii('Ω') 937
represents the Unicode
code point of the first
character of the string
bar( x, min, Draw a band whose bar(5, 0, 20, 10) ██▌
max[,width]) width is proportional
to (x ‑ min) and equal
to width characters
when x = max. width
defaults to 80.
bit_length( Number of bits in a bit_length('abc') 24
string) string.
chr( x) Returns a character chr(65) A
which is corresponding
the ASCII code value or
Unicode code point
concat_ws( Concatenate strings concat_ws(', ', Banana, Apple,
separator, together separated by 'Banana', 'Apple', Melon
string, ...) the specified separator 'Melon')
concat( string, Concatenate many concat('Hello', ' ', Hello World
...) strings together 'World')
contains( Return true if search_ contains('abc', 'a') true
string, search_ string is found within
string) string
ends_with( Return true if string ends_with('abc', 'c') true suffix
string, search_ ends with search_
string) string
format_bytes( Converts bytes to a format_bytes(16384) 16.0 KiB
bytes) human‑readable
representation using
units based on powers
of 2 (KiB, MiB, GiB,
etc.).
format( format, Formats a string using format('Benchmark Benchmark "CSV"
parameters...) the fmt syntax "{}" took {} took 42 seconds
seconds', 'CSV', 42)
from_base64( Convert a base64 from_base64('QQ==') 'A'
string) encoded string to a
character string.
hash( value) Returns a UBIGINT hash(' ') 2595805878642663834
with the hash of the
value

ilike_escape( Returns true if the ilike_escape('A%c', true


string, like_ string matches the 'a$%C', '$')
specifier, like_specifier (see
escape_ Pattern Matching)
character) using case‑insensitive
matching. escape_
character is used to
search for wildcard
characters in the
string.
instr( string, Return location of first instr('test test', 2
search_string) occurrence of 'es')
search_string in
string, counting
from 1. Returns 0 if no
match found.
left_grapheme( Extract the left‑most left_ ♂
string, count) grapheme clusters grapheme(' ♂ ♀',
1)
left( string, Extract the left‑most left('Hello ', 2) He
count) count characters
length_grapheme( Number of grapheme length_ 2
string) clusters in string grapheme(' ♂ ♀')
length( string) Number of characters length('Hello ') 6
in string
like_escape( Returns true if the like_escape('a%c', true
string, like_ string matches the 'a$%c', '$')
specifier, like_specifier (see
escape_ Pattern Matching)
character) using case‑sensitive
matching. escape_
character is used to
search for wildcard
characters in the
string.
lower( string) Convert string to lower lower('Hello') hello lcase
case
lpad( string, Pads the string with lpad('hello', 10, >>>>>hello
count, the character from the '>')
character) left until it has count
characters
ltrim( string, Removes any ltrim('>>>>test<<', test<<
characters) occurrences of any of '><')
the characters from the
left side of the string
ltrim( string) Removes any spaces ltrim(' test ') test
from the left side of the
string

md5( value) Returns the MD5 hash md5('123') 202c...


of the value
nfc_normalize( Convert string to nfc_ ardèch
string) Unicode NFC normalize('ardèch')
normalized string.
Useful for comparisons
and ordering if text
data is mixed between
NFC normalized and
not.
not_ilike_ Returns false if the not_ilike_ false
escape( string, string matches the escape('A%c', 'a$%C',
like_specifier, like_specifier (see '$')
escape_ Pattern Matching)
character) using case‑sensitive
matching. escape_
character is used to
search for wildcard
characters in the
string.
not_like_escape( Returns false if the not_like_ false
string, like_ string matches the escape('a%c', 'a$%c',
specifier, like_specifier (see '$')
escape_ Pattern Matching)
character) using case‑insensitive
matching. escape_
character is used to
search for wildcard
characters in the
string.
ord( string) Return ASCII character ord('ü') 252
code of the leftmost
character in a string.
parse_dirname( Returns the top‑level parse_ path
path, separator) directory name from dirname('path/to/file.csv',
the given path. 'system')
separator options:
system, both_
slash (default),
forward_slash,
backslash.

parse_dirpath( Returns the head of the parse_ /path/to


path, separator) path (the pathname dirpath('/path/to/file.csv',
until the last slash) 'forward_slash')
similarly to Python's
os.path.dirname
function. separator
options: system,
both_slash
(default), forward_
slash, backslash.
parse_filename( Returns the last parse_ file
path, trim_ component of the path filename('path/to/file.csv',
extension, similarly to Python's true, 'system')
separator) os.path.basename
function. If trim_
extension is true,
the file extension will
be removed (defaults
to false).
separator options:
system, both_
slash (default),
forward_slash,
backslash.
parse_path( Returns a list of the parse_ [/, path, to,
path, separator) components file.csv]
path('/path/to/file.csv',
(directories and 'system')
filename) in the path
similarly to Python's
pathlib.parts
function. separator
options: system,
both_slash
(default), forward_
slash, backslash.
position( Return location of first position('b' in 2
search_ occurrence of 'abc')
stringinstring) search_string in
string, counting
from 1. Returns 0 if no
match found.
printf( format, Formats a string using printf('Benchmark Benchmark "CSV"
parameters...) printf syntax "%s" took %d took 42 seconds
seconds', 'CSV', 42)

read_text( Returns the content read_ hello\n


source) from source (a text('hello.txt')
filename, a list of
filenames, or a glob
pattern) as a VARCHAR.
The file content is first
validated to be valid
UTF‑8. If read_text
attempts to read a file
with invalid UTF‑8 an
error is thrown
suggesting to use
read_blob instead.
See the read_text
guide for more details.
regexp_escape( Escapes special regexp_ https\:\/\/duckdb\.org
string) patterns to turn string escape('https://duckdb.org')
into a regular
expression similarly to
Python's re.escape
function
regexp_extract_ Split the string along regexp_extract_ [hello, world]
all( string, the regex and extract all('hello_world',
regex[,group= all occurrences of '([a-z ]+)_?', 1)
0]) group
regexp_extract( If string contains the regexp_ {'y':'2023',
string, pattern, regexp pattern, returns extract('2023-04-15', 'm':'04',
name_list); the capturing groups '(\d+)-(\d+)-(\d+)', 'd':'15'}
as a struct with ['y', 'm', 'd'])
corresponding names
from name_list (see
Pattern Matching)
regexp_extract( If string contains the regexp_ hello
string, regexp pattern, returns extract('hello_
pattern[,idx]); the capturing group world', '([a-z ]+)_
specified by optional ?', 1)
parameter idx (see
Pattern Matching)
regexp_full_ Returns true if the regexp_full_ false
match( string, entire string matches match('anabanana',
regex) the regex (see Pattern '(an)*')
Matching)
regexp_matches( Returns true if string regexp_ true
string, pattern) contains the regexp matches('anabanana',
pattern, false '(an)*')
otherwise (see Pattern
Matching)

regexp_replace( If string contains the regexp_ he-lo


string, pattern, regexp pattern, replace('hello',
replacement); replaces the matching '[lo]', '-')
part with replacement
(see Pattern Matching)
regexp_split_to_ Splits the string along regexp_split_to_ ['hello', string_split_
array( string, the regex array('hello world; 'world', '42'] regex, str_split_
regex) 42', ';? ') regex
regexp_split_to_ Splits the string along regexp_split_to_ Three rows: 'hello',
table( string, the regex and returns a table('hello world; 'world', '42'
regex) row for each part 42', ';? ')
repeat( string, Repeats the string repeat('A', 5) AAAAA
count) count number of times
replace( string, Replaces any replace('hello', 'l', he--o
source, target) occurrences of the '-')
source with target in
string
reverse( string) Reverses the string reverse('hello') olleh
right_grapheme( Extract the right‑most right_ ♀
string, count) count grapheme grapheme(' ♂ ♀',
clusters 1)
right( string, Extract the right‑most right('Hello ', 3) lo
count) count characters
rpad( string, Pads the string with rpad('hello', 10, hello<<<<<
count, the character from the '<')
character) right until it has count
characters
rtrim( string, Removes any rtrim('>>>>test<<', >>>>test
characters) occurrences of any of '><')
the characters from the
right side of the string
rtrim( string) Removes any spaces rtrim(' test ') test
from the right side of
the string
sha256( value) Returns a VARCHAR sha256(' ') d7a5...
with the SHA‑256 hash
of the value
split_part( Split the string along split_part('a|b|c', b
string, the separator and '|', 2)
separator, return the data at the
index) (1‑based) index of the
list. If the index is
outside the bounds of
the list, return an
empty string (to match
PostgreSQL's
behavior).

starts_with( Return true if string starts_with('abc', true


string, search_ begins with search_ 'a')
string) string
str_split_regex( Splits the string along str_split_ ['hello', string_split_
string, regex) the regex regex('hello world; 'world', '42'] regex, regexp_
42', ';? ') split_to_array
string_split_ Splits the string along string_split_ ['hello', str_split_regex,
regex( string, the regex regex('hello world; 'world', '42'] regexp_split_to_
regex) 42', ';? ') array
string_split( Splits the string along string_ ['hello', str_split,
string, the separator split('hello world', 'world'] string_to_array
separator) ' ')
strip_accents( Strips accents from strip_ muhleisen
string) string accents('mühleisen')
strlen( string) Number of bytes in strlen(' ') 4
string
strpos( string, Alias for instr. strpos('test test', 2 instr
search_string) Return location of first 'es')
occurrence of search_
string in string,
counting from 1.
Returns 0 if no match
found.
substring( Extract substring of substring('Hello', 2, el substr
string, start, length characters 2)
length) starting from character
start. Note that a start
value of 1 refers to the
first character of the
string.
substring_ Extract substring of substring_ ♀
grapheme( length grapheme grapheme(' ♂ ♀ ',
string, start, clusters starting from 3, 2)
length) character start. Note
that a start value of 1
refers to the first
character of the string.
to_base64( blob) Convert a blob to a to_base64('A'::blob) QQ== base64
base64 encoded string.
trim( string, Removes any trim('>>>>test<<', test
characters) occurrences of any of '><')
the characters from
either side of the string
trim( string) Removes any spaces trim(' test ') test
from either side of the
string

unicode( string) Returns the unicode unicode('ü') 252


code of the first
character of the string
upper( string) Convert string to upper upper('Hello') HELLO ucase
case

Text Similarity Functions

These functions are used to measure the similarity of two strings using various similarity measures.

Function Description Example Result

damerau_levenshtein( Extension of Levenshtein damerau_levenshtein('duckdb', 2


s1, s2) distance to also include 'udckbd')
transposition of adjacent
characters as an allowed edit
operation. In other words, the
minimum number of edit
operations (insertions,
deletions, substitutions or
transpositions) required to
change one string to another.
Different case is considered
different.
editdist3( s1, s2) Alias of levenshtein for editdist3('duck', 'db') 3
SQLite compatibility. The
minimum number of
single‑character edits
(insertions, deletions or
substitutions) required to
change one string to the other.
Different case is considered
different.
jaccard( s1, s2) The Jaccard similarity between jaccard('duck', 'luck') 0.6
two strings. Different case is
considered different. Returns a
number between 0 and 1.
jaro_similarity( s1, The Jaro similarity between jaro_similarity('duck', 0.88
s2) two strings. Different case is 'duckdb')
considered different. Returns a
number between 0 and 1.
jaro_winkler_ The Jaro‑Winkler similarity jaro_winkler_ 0.93
similarity( s1, s2) between two strings. Different similarity('duck', 'duckdb')
case is considered different.
Returns a number between 0
and 1.

levenshtein( s1, s2) The minimum number of levenshtein('duck', 'db') 3


single‑character edits
(insertions, deletions or
substitutions) required to
change one string to the other.
Different case is considered
different.
mismatches( s1, s2) Alias for hamming( s1, s2). mismatches('duck', 'luck') 1
The number of positions with
different characters for two
strings of equal length.
Different case is considered
different.
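
Some usage examples, with values taken from the table above:

SELECT levenshtein('duck', 'db');                 -- 3
SELECT jaccard('duck', 'luck');                   -- 0.6
SELECT jaro_winkler_similarity('duck', 'duckdb'); -- 0.93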

Formatters

fmt Syntax The format( format, parameters...) function formats strings, loosely following the syntax of the {fmt} open‑source
formatting library.

-- Format without additional parameters


SELECT format('Hello world'); -- Hello world
-- Format a string using {}
SELECT format('The answer is {}', 42); -- The answer is 42
-- Format a string using positional arguments
SELECT format('I''d rather be {1} than {0}.', 'right', 'happy'); -- I'd rather be happy than right.

Format Specifiers

Specifier Description Example

{:d} integer 123456


{:E} scientific notation 3.141593E+00
{:f} float 4.560000
{:o} octal 361100
{:s} string asd
{:x} hexadecimal 1e240
{:tX} integer, X is the thousand separator 123 456

Formatting Types
-- Integers
SELECT format('{} + {} = {}', 3, 5, 3 + 5); -- 3 + 5 = 8
-- Booleans
SELECT format('{} != {}', true, false); -- true != false
-- Format datetime values
SELECT format('{}', DATE '1992-01-01'); -- 1992-01-01
SELECT format('{}', TIME '12:01:00'); -- 12:01:00
SELECT format('{}', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00


-- Format BLOB
SELECT format('{}', BLOB '\x00hello'); -- \x00hello
-- Pad integers with 0s
SELECT format('{:04d}', 33); -- 0033
-- Create timestamps from integers
SELECT format('{:02d}:{:02d}:{:02d} {}', 12, 3, 16, 'AM'); -- 12:03:16 AM
-- Convert to hexadecimal
SELECT format('{:x}', 123_456_789); -- 75bcd15
-- Convert to binary
SELECT format('{:b}', 123_456_789); -- 111010110111100110100010101

Print Numbers with Thousand Separators

SELECT format('{:,}', 123_456_789); -- 123,456,789


SELECT format('{:t.}', 123_456_789); -- 123.456.789
SELECT format('{:''}', 123_456_789); -- 123'456'789
SELECT format('{:_}', 123_456_789); -- 123_456_789
SELECT format('{:t }', 123_456_789); -- 123 456 789
SELECT format('{:tX}', 123_456_789); -- 123X456X789

printf Syntax The printf( format, parameters...) function formats strings using the printf syntax.

-- Format without additional parameters


SELECT printf('Hello world');
-- output: Hello world

-- Format a string using arguments in a given order


SELECT printf('The answer to %s is %d', 'life', 42);
-- output: The answer to life is 42

-- Format a string using positional arguments '%position$formatter',


-- e.g., the second parameter as a string is encoded as '%2$s'
SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy');
-- output: I'd rather be happy than right.

Format Specifiers

Specifier Description Example

%c character code to character a


%d integer 123456
%Xd integer with thousand separator X from ,, ., '', _ 123_456
%E scientific notation 3.141593E+00
%f float 4.560000
%hd integer 123456
%hhd integer 123456
%lld integer 123456
%o octal 361100
%s string asd
%x hexadecimal 1e240


Formatting Types
-- Integers
SELECT printf('%d + %d = %d', 3, 5, 3 + 5); -- 3 + 5 = 8
-- Booleans
SELECT printf('%s != %s', true, false); -- true != false
-- Format datetime values
SELECT printf('%s', DATE '1992-01-01'); -- 1992-01-01
SELECT printf('%s', TIME '12:01:00'); -- 12:01:00
SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00
-- Format BLOB
SELECT printf('%s', BLOB '\x00hello'); -- \x00hello
-- Pad integers with 0s
SELECT printf('%04d', 33); -- 0033
-- Create timestamps from integers
SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM'); -- 12:03:16 AM
-- Convert to hexadecimal
SELECT printf('%x', 123_456_789); -- 75bcd15
-- Convert to binary
SELECT printf('%b', 123_456_789); -- 111010110111100110100010101

Thousand Separators
SELECT printf('%,d', 123_456_789); -- 123,456,789
SELECT printf('%.d', 123_456_789); -- 123.456.789
SELECT printf('%''d', 123_456_789); -- 123'456'789
SELECT printf('%_d', 123_456_789); -- 123_456_789

Time Functions

This section describes functions and operators for examining and manipulating TIME values.

Time Operators

The table below shows the available mathematical operators for TIME types.

Operator Description Example Result

+    addition of an INTERVAL       TIME '01:02:03' + INTERVAL 5 HOUR    06:02:03
-    subtraction of an INTERVAL    TIME '06:02:03' - INTERVAL 5 HOUR    01:02:03

Time Functions

The table below shows the available scalar functions for TIME types.

Function Description Example Result

current_time/get_ Current time (start of current


current_time() transaction)

date_diff( part, The number of partition date_diff('hour', TIME 5


starttime, endtime) boundaries between the '01:02:03', TIME
times '06:01:03')
date_part( part, Get subfield (equivalent to date_part('minute', TIME 21
time) extract) '14:21:13')
date_sub( part, The number of complete date_sub('hour', TIME 4
starttime, endtime) partitions between the times '01:02:03', TIME
'06:01:03')
datediff( part, Alias of date_diff. The datediff('hour', TIME 5
starttime, endtime) number of partition '01:02:03', TIME
boundaries between the '06:01:03')
times
datepart( part, time) Alias of date_part. Get datepart('minute', TIME 21
subfield (equivalent to '14:21:13')
extract)
datesub( part, Alias of date_sub. The datesub('hour', TIME 4
starttime, endtime) number of complete '01:02:03', TIME
partitions between the times '06:01:03')
extract( part FROM Get subfield from a time extract('hour' FROM TIME 14
time) '14:21:13')
make_time( bigint, The time for the given parts make_time(13, 34, 13:34:27.123456
bigint, double) 27.123456)

The only date parts that are defined for times are epoch, hours, minutes, seconds, milliseconds and microseconds.
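
For example, using values from the table above:

SELECT extract('minute' FROM TIME '14:21:13');              -- 21
SELECT date_diff('hour', TIME '01:02:03', TIME '06:01:03'); -- 5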

Timestamp Functions

This section describes functions and operators for examining and manipulating TIMESTAMP values.

Timestamp Operators

The table below shows the available mathematical operators for TIMESTAMP types.

Operator Description Example Result

+    addition of an INTERVAL       TIMESTAMP '1992-03-22 01:02:03' + INTERVAL 5 DAY    1992-03-27 01:02:03
-    subtraction of TIMESTAMPs     TIMESTAMP '1992-03-27' - TIMESTAMP '1992-03-22'     5 days
-    subtraction of an INTERVAL    TIMESTAMP '1992-03-27 01:02:03' - INTERVAL 5 DAY    1992-03-22 01:02:03

Adding to or subtracting from infinite values produces the same infinite value.
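
A brief illustration of this rule, using infinity literals like the one in the isinf example below:

SELECT TIMESTAMP 'infinity' + INTERVAL 5 DAY;  -- infinity
SELECT TIMESTAMP '-infinity' - INTERVAL 5 DAY; -- -infinity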


Timestamp Functions

The table below shows the available scalar functions for TIMESTAMP values.

Function Description Example Result

age( timestamp, Subtract arguments, age(TIMESTAMP '2001-04-10', 8 years 6 months 20


timestamp) resulting in the time TIMESTAMP '1992-09-20') days
difference between the two
timestamps
age( timestamp) Subtract from current_date age(TIMESTAMP '1992-09-20') 29 years 1 month 27
days 12:39:00.844
century( timestamp) Extracts the century of a century(TIMESTAMP 20
timestamp '1992-03-22')
date_diff( part, The number of partition date_diff('hour', TIMESTAMP 2
startdate, enddate) boundaries between the '1992-09-30 23:59:59',
timestamps TIMESTAMP '1992-10-01
01:58:00')
date_part([part, Get the listed subfields as a date_part(['year', 'month', {year: 1992, month:
...],timestamp) struct. The list must be 'day'], TIMESTAMP 9, day: 20}
constant. '1992-09-20 20:38:40')
date_part( part, Get subfield (equivalent to date_part('minute', 38
timestamp) extract) TIMESTAMP '1992-09-20
20:38:40')
date_sub( part, The number of complete date_sub('hour', TIMESTAMP 1
startdate, enddate) partitions between the '1992-09-30 23:59:59',
timestamps TIMESTAMP '1992-10-01
01:58:00')
date_trunc( part, Truncate to specified date_trunc('hour', 1992-09-20 20:00:00
timestamp) precision TIMESTAMP '1992-09-20
20:38:40')
datediff( part, Alias of date_diff. The datediff('hour', TIMESTAMP 2
startdate, enddate) number of partition '1992-09-30 23:59:59',
boundaries between the TIMESTAMP '1992-10-01
timestamps 01:58:00')
datepart([part, Alias of date_part. Get the datepart(['year', 'month', {year: 1992, month:
...],timestamp) listed subfields as a struct. 'day'], TIMESTAMP 9, day: 20}
The list must be constant. '1992-09-20 20:38:40')
datepart( part, Alias of date_part. Get datepart('minute', 38
timestamp) subfield (equivalent to TIMESTAMP '1992-09-20
extract) 20:38:40')
datesub( part, Alias of date_sub. The datesub('hour', TIMESTAMP 1
startdate, enddate) number of complete '1992-09-30 23:59:59',
partitions between the TIMESTAMP '1992-10-01
timestamps 01:58:00')
datetrunc( part, Alias of date_trunc. Truncate datetrunc('hour', TIMESTAMP 1992-09-20 20:00:00
timestamp) to specified precision '1992-09-20 20:38:40')
dayname( timestamp) The (English) name of the dayname(TIMESTAMP Sunday
weekday '1992-03-22')


epoch_ms( ms) Converts ms since epoch to a epoch_ms(701222400000) 1992-03-22 00:00:00


timestamp
epoch_ms( timestamp) Converts a timestamp to epoch_ms('2022-11-07 1667810584123
milliseconds since the epoch 08:43:04.123456'::TIMES-
TAMP);
epoch_ms( timestamp) Return the total number of epoch_ms(timestamp 1627991984123
milliseconds since the epoch '2021-08-03
11:59:44.123456')
epoch_ns( timestamp) Return the total number of epoch_ns(timestamp 1627991984123456000
nanoseconds since the epoch '2021-08-03
11:59:44.123456')
epoch_us( timestamp) Return the total number of epoch_us(timestamp 1627991984123456
microseconds since the '2021-08-03
epoch 11:59:44.123456')
epoch( timestamp) Converts a timestamp to epoch('2022-11-07 1667810584
seconds since the epoch 08:43:04'::TIMESTAMP);
extract( field from Get subfield from a extract('hour' FROM 20
timestamp) timestamp TIMESTAMP '1992-09-20
20:38:48')
greatest( timestamp, The later of two timestamps greatest(TIMESTAMP 1992-09-20 20:38:48
timestamp) '1992-09-20 20:38:48',
TIMESTAMP '1992-03-22
01:02:03.1234')
isfinite( timestamp) Returns true if the timestamp isfinite(TIMESTAMP true
is finite, false otherwise '1992-03-07')
isinf( timestamp) Returns true if the timestamp isinf(TIMESTAMP true
is infinite, false otherwise '-infinity')
last_day( timestamp) The last day of the month. last_day(TIMESTAMP 1992-03-31
'1992-03-22 01:02:03.1234')
least( timestamp, The earlier of two least(TIMESTAMP '1992-09-20 1992-03-22
timestamp) timestamps 20:38:48', TIMESTAMP 01:02:03.1234
'1992-03-22 01:02:03.1234')
make_timestamp( The timestamp for the given make_timestamp(1992, 9, 20, 1992-09-20
bigint, bigint, parts 13, 34, 27.123456) 13:34:27.123456
bigint, bigint,
bigint, double)
make_timestamp( The timestamp for the given make_ 2022-11-07
microseconds) number of μs since the epoch timestamp(1667810584123456) 08:43:04.123456
monthname( timestamp) The (English) name of the monthname(TIMESTAMP September
month. '1992-09-20')
strftime( timestamp, Converts timestamp to string strftime(timestamp Wed, 1 January 1992
format) according to the format '1992-01-01 20:38:40', '%a, - 08:38:40 PM
string %-d %B %Y - %I:%M:%S %p')


strptime( text, Converts string to timestamp strptime('4/15/2023 2023-04-15 10:56:00


format-list) applying the format strings in 10:56:00', ['%d/%m/%Y
the list until one succeeds. %H:%M:%S', '%m/%d/%Y
Throws on failure. %H:%M:%S'])
strptime( text, Converts string to timestamp strptime('Wed, 1 January 1992-01-01 20:38:40
format) according to the format 1992 - 08:38:40 PM', '%a,
string. Throws on failure. %-d %B %Y - %I:%M:%S %p')
time_bucket( bucket_ Truncate timestamp by the time_bucket(INTERVAL '10 1992-04-20 15:25:00
width, specified interval bucket_ minutes', TIMESTAMP
timestamp[,offset]) width. Buckets are offset by '1992-04-20 15:26:00-07',
offset interval. INTERVAL '5 minutes')
time_bucket( bucket_ Truncate timestamp by the time_bucket(INTERVAL '2 1992-04-15 00:00:00
width, specified interval bucket_ weeks', TIMESTAMP
timestamp[,origin]) width. Buckets are aligned '1992-04-20 15:26:00',
relative to origin TIMESTAMP '1992-04-01
timestamp. origin defaults 00:00:00')
to 2000‑01‑03 00:00:00 for
buckets that don't include a
month or year interval, and
to 2000‑01‑01 00:00:00 for
month and year buckets.
to_timestamp( double) Converts seconds since the to_timestamp(1284352323.5) 2010-09-13
epoch to a timestamp with 04:32:03.5+00
time zone
try_strptime( text, Converts string to timestamp try_strptime('4/15/2023 2023-04-15 10:56:00
format-list) applying the format strings in 10:56:00', ['%d/%m/%Y
the list until one succeeds. %H:%M:%S', '%m/%d/%Y
Returns NULL on failure. %H:%M:%S'])
try_strptime( text, Converts string to timestamp try_strptime('Wed, 1 1992-01-01 20:38:40
format) according to the format January 1992 - 08:38:40
string. Returns NULL on PM', '%a, %-d %B %Y -
failure. %I:%M:%S %p')

There are also dedicated extraction functions to get the subfields.

Functions applied to infinite dates will either return the same infinite dates (e.g., greatest) or NULL (e.g., date_part) depending on
what "makes sense". In general, if the function needs to examine the parts of the infinite date, the result will be NULL.

Timestamp Table Functions

The table below shows the available table functions for TIMESTAMP types.

Function Description Example

generate_series( timestamp, Generate a table of timestamps in the generate_series(TIMESTAMP


timestamp, interval) closed range, stepping by the interval '2001-04-10', TIMESTAMP
'2001-04-11', INTERVAL 30 MINUTE)


range( timestamp, timestamp, Generate a table of timestamps in the range(TIMESTAMP '2001-04-10',


interval) half open range, stepping by the TIMESTAMP '2001-04-11', INTERVAL 30
interval MINUTE)

Note. Infinite values are not allowed as table function bounds.
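For example, to generate half-hourly timestamps covering a single day (a minimal sketch based on the generate_series example above; the result has one row per generated timestamp):

SELECT *
FROM generate_series(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE);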

Timestamp with Time Zone Functions

This section describes functions and operators for examining and manipulating TIMESTAMP WITH TIME ZONE (or TIMESTAMPTZ)
values.

Despite the name, these values do not store a time zone ‑ just an instant like TIMESTAMP. Instead, they request that the instant be binned
and formatted using the current time zone.

Time zone support is not built in but can be provided by an extension, such as the ICU extension that ships with DuckDB.

In the examples below, the current time zone is presumed to be America/Los_Angeles using the Gregorian calendar.
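With the ICU extension loaded, the session time zone can be changed and inspected with a SET statement; a minimal sketch (assuming the TimeZone setting is available in your build):

SET TimeZone = 'America/Los_Angeles';
SELECT current_setting('TimeZone');
-- America/Los_Angeles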

Built‑In Timestamp with Time Zone Functions

The table below shows the available scalar functions for TIMESTAMPTZ values. Since these functions do not involve binning or display,
they are always available.

Function Description Example Result

current_timestamp Current date and time (start current_timestamp 2022-10-08


of current transaction) 12:44:46.122-07
get_current_ Current date and time (start get_current_timestamp() 2022-10-08
timestamp() of current transaction) 12:44:46.122-07
greatest( The later of two timestamps greatest(TIMESTAMPTZ 1992-09-20
timestamptz, '1992-09-20 20:38:48', 20:38:48-07
timestamptz) TIMESTAMPTZ '1992-03-22
01:02:03.1234')
isfinite( Returns true if the timestamp isfinite(TIMESTAMPTZ true
timestamptz) with time zone is finite, false '1992-03-07')
otherwise
isinf( timestamptz) Returns true if the timestamp isinf(TIMESTAMPTZ true
with time zone is infinite, '-infinity')
false otherwise
least( timestamptz, The earlier of two least(TIMESTAMPTZ 1992-03-22
timestamptz) timestamps '1992-09-20 20:38:48', 01:02:03.1234-08
TIMESTAMPTZ '1992-03-22
01:02:03.1234')
now() Current date and time (start now() 2022-10-08
of current transaction) 12:44:46.122-07
transaction_ Current date and time (start transaction_timestamp() 2022-10-08
timestamp() of current transaction) 12:44:46.122-07


Timestamp with Time Zone Strings

With no time zone extension loaded, TIMESTAMPTZ values will be cast to and from strings using offset notation. This will let you specify
an instant correctly without access to time zone information. For portability, TIMESTAMPTZ values will always be displayed using GMT
offsets:

SELECT '2022-10-08 13:13:34-07'::TIMESTAMPTZ;


-- 2022-10-08 20:13:34+00

If a time zone extension such as ICU is loaded, then a time zone can be parsed from a string and cast to a representation in the local time
zone:

SELECT '2022-10-08 13:13:34 Europe/Amsterdam'::TIMESTAMPTZ::VARCHAR;


-- 2022-10-08 04:13:34-07 -- the offset will differ based on your local time zone

ICU Timestamp with Time Zone Operators

The table below shows the available mathematical operators for TIMESTAMP WITH TIME ZONE values provided by the ICU extension.

Operator   Description                    Example                                                 Result
+          addition of an INTERVAL        TIMESTAMPTZ '1992-03-22 01:02:03' + INTERVAL 5 DAY      1992-03-27 01:02:03
-          subtraction of TIMESTAMPTZs    TIMESTAMPTZ '1992-03-27' - TIMESTAMPTZ '1992-03-22'     5 days
-          subtraction of an INTERVAL     TIMESTAMPTZ '1992-03-27 01:02:03' - INTERVAL 5 DAY      1992-03-22 01:02:03

Adding to or subtracting from infinite values produces the same infinite value.

ICU Timestamp with Time Zone Functions

The table below shows the ICU provided scalar functions for TIMESTAMP WITH TIME ZONE values.

Function Description Example Result

age( timestamptz, Subtract arguments, resulting age(TIMESTAMPTZ 8 years 6 months 20


timestamptz) in the time difference between '2001-04-10', days
the two timestamps TIMESTAMPTZ
'1992-09-20')
age( timestamptz) Subtract from current_date age(TIMESTAMP 29 years 1 month 27
'1992-09-20') days 12:39:00.844
date_diff( part, The number of partition date_diff('hour', 2
startdate, enddate) boundaries between the TIMESTAMPTZ
timestamps '1992-09-30 23:59:59',
TIMESTAMPTZ
'1992-10-01 01:58:00')


date_part([part, Get the listed subfields as a date_part(['year', {year: 1992, month: 9,


...],timestamptz) struct. The list must be 'month', 'day'], day: 20}
constant. TIMESTAMPTZ
'1992-09-20
20:38:40-07')
date_part( part, Get subfield (equivalent to date_part('minute', 38
timestamptz) extract) TIMESTAMPTZ
'1992-09-20 20:38:40')
date_sub( part, The number of complete date_sub('hour', 1
startdate, enddate) partitions between the TIMESTAMPTZ
timestamps '1992-09-30 23:59:59',
TIMESTAMPTZ
'1992-10-01 01:58:00')
date_trunc( part, Truncate to specified precision date_trunc('hour', 1992-09-20 20:00:00
timestamptz) TIMESTAMPTZ
'1992-09-20 20:38:40')
datediff( part, Alias of date_diff. The number datediff('hour', 2
startdate, enddate) of partition boundaries TIMESTAMPTZ
between the timestamps '1992-09-30 23:59:59',
TIMESTAMPTZ
'1992-10-01 01:58:00')
datepart([part, Alias of date_part. Get the datepart(['year', {year: 1992, month: 9,
...],timestamptz) listed subfields as a struct. 'month', 'day'], day: 20}
The list must be constant. TIMESTAMPTZ
'1992-09-20
20:38:40-07')
datepart( part, Alias of date_part. Get subfield datepart('minute', 38
timestamptz) (equivalent to extract) TIMESTAMPTZ
'1992-09-20 20:38:40')
datesub( part, Alias of date_sub. The number datesub('hour', 1
startdate, enddate) of complete partitions between TIMESTAMPTZ
the timestamps '1992-09-30 23:59:59',
TIMESTAMPTZ
'1992-10-01 01:58:00')
datetrunc( part, Alias of date_trunc. Truncate to datetrunc('hour', 1992-09-20 20:00:00
timestamptz) specified precision TIMESTAMPTZ
'1992-09-20 20:38:40')
epoch_ms( timestamptz) Converts a timestamptz to epoch_ms('2022-11-07 1667810584123
milliseconds since the epoch 08:43:04.123456+00'::TIMES-
TAMPTZ);
epoch_ns( timestamptz) Converts a timestamptz to epoch_ns('2022-11-07 1667810584123456000
nanoseconds since the epoch 08:43:04.123456+00'::TIMES-
TAMPTZ);
epoch_us( timestamptz) Converts a timestamptz to epoch_us('2022-11-07 1667810584123456
microseconds since the epoch 08:43:04.123456+00'::TIMES-
TAMPTZ);


extract( field from Get subfield from a timestamp extract('hour' FROM 20


timestamptz) with time zone TIMESTAMPTZ
'1992-09-20 20:38:48')
last_day( timestamptz) The last day of the month. last_day(TIMESTAMPTZ 1992-03-31
'1992-03-22
01:02:03.1234')
make_timestamptz( The timestamp with time zone make_timestamptz(1992, 1992-09-20
bigint, bigint, bigint, for the given parts and time 9, 20, 15, 34, 06:34:27.123456-07
bigint, bigint, double, zone 27.123456, 'CET')
string)
make_timestamptz( The timestamp with time zone make_timestamptz(1992, 1992-09-20
bigint, bigint, bigint, for the given parts in the 9, 20, 13, 34, 13:34:27.123456-07
bigint, bigint, double) current time zone 27.123456)
make_timestamptz( The timestamp with time zone make_ 2022-11-07
microseconds) for the given μs since the epoch timestamptz(1667810584123456)
16:43:04.123456-08
strftime( timestamptz, Converts timestamp with time strftime(timestamptz Wed, 1 January 1992 -
format) zone to string according to the '1992-01-01 20:38:40', 08:38:40 PM
format string '%a, %-d %B %Y -
%I:%M:%S %p')
strptime( text, format) Converts string to timestamp strptime('Wed, 1 1992-01-01 08:38:40-08
with time zone according to the January 1992 -
format string if %Z is specified. 08:38:40 PST', '%a,
%-d %B %Y - %H:%M:%S
%Z')
time_bucket( bucket_ Truncate timestamptz by time_bucket(INTERVAL 1992-04-20 15:25:00-07
width, the specified interval bucket_ '10 minutes',
timestamptz[,offset]) width. Buckets are offset by TIMESTAMPTZ
offset interval. '1992-04-20
15:26:00-07', INTERVAL
'5 minutes')
time_bucket( bucket_ Truncate timestamptz by time_bucket(INTERVAL 1992-04-15 00:00:00-07
width, the specified interval bucket_ '2 weeks', TIMESTAMPTZ
timestamptz[,origin]) width. Buckets are aligned '1992-04-20
relative to origin 15:26:00-07',
timestamptz. origin defaults TIMESTAMPTZ
to 2000‑01‑03 00:00:00+00 for '1992-04-01
buckets that don't include a 00:00:00-07')
month or year interval, and to
2000‑01‑01 00:00:00+00 for
month and year buckets.
time_bucket( bucket_ Truncate timestamptz by time_bucket(INTERVAL 1992-04-19 15:00:00-07
width, times- the specified interval bucket_ '2 days', TIMESTAMPTZ
tamptz[,timezone]) width. Bucket starts and ends '1992-04-20
are calculated using 15:26:00-07',
timezone. timezone is a 'Europe/Berlin')
varchar and defaults to UTC.

There are also dedicated extraction functions to get the subfields.


ICU Timestamp Table Functions

The table below shows the available table functions for TIMESTAMP WITH TIME ZONE types.

Function Description Example

generate_series( Generate a table of timestamps in the closed generate_series(TIMESTAMPTZ


timestamptz, timestamptz, range (including both the starting timestamp '2001-04-10', TIMESTAMPTZ
interval) and the ending timestamp), stepping by the '2001-04-11', INTERVAL 30 MINUTE)
interval
range( timestamptz, Generate a table of timestamps in the half open range(TIMESTAMPTZ '2001-04-10',
timestamptz, interval) range (including the starting timestamp, but TIMESTAMPTZ '2001-04-11',
stopping before the ending timestamp) , INTERVAL 30 MINUTE)
stepping by the interval

Note. Infinite values are not allowed as table function bounds.

ICU Timestamp Without Time Zone Functions

The table below shows the ICU provided scalar functions that operate on plain TIMESTAMP values. These functions assume that the
TIMESTAMP is a ”local timestamp”.

A local timestamp is effectively a way of encoding the part values from a time zone into a single value. They should be used with caution because the produced values can contain gaps and ambiguities thanks to daylight savings time. Often the same functionality can be implemented more reliably using the struct variant of the date_part function.
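For example, the struct variant of date_part extracts several fields from the same instant in one call; a minimal sketch (the result assumes the America/Los_Angeles time zone mentioned earlier):

SELECT date_part(['year', 'month', 'day'], TIMESTAMPTZ '1992-09-20 20:38:40-07');
-- {year: 1992, month: 9, day: 20}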

Function Description Example Result

current_localtime() Returns a TIME whose GMT current_localtime() 08:47:56.497


bin values correspond to
local time in the current time
zone.
current_ Returns a TIMESTAMP current_localtimestamp() 2022-12-17
localtimestamp() whose GMT bin values 08:47:56.497
correspond to local date and
time in the current time zone.
localtime Synonym for the current_ localtime 08:47:56.497
localtime() function call.
localtimestamp Synonym for the current_ localtimestamp 2022-12-17
localtimestamp() 08:47:56.497
function call.
timezone( text, Use the date parts of the timezone('America/Denver', 2001-02-16
timestamp) timestamp in GMT to TIMESTAMP '2001-02-16 19:38:40-08
construct a timestamp in the 20:38:40')
given time zone. Effectively,
the argument is a ”local”
time.


timezone( text, Use the date parts of the timezone('America/Denver', 2001-02-16 18:38:40
timestamptz) timestamp in the given time TIMESTAMPTZ '2001-02-16
zone to construct a 20:38:40-05')
timestamp. Effectively, the
result is a ”local” time.

At Time Zone The AT TIME ZONE syntax is syntactic sugar for the (two argument) timezone function listed above:

timestamp '2001-02-16 20:38:40' AT TIME ZONE 'America/Denver';


-- 2001-02-16 19:38:40-08
timestamp with time zone '2001-02-16 20:38:40-05' AT TIME ZONE 'America/Denver';
-- 2001-02-16 18:38:40

Infinities

Functions applied to infinite dates will either return the same infinite dates (e.g., greatest) or NULL (e.g., date_part) depending on
what "makes sense". In general, if the function needs to examine the parts of the infinite temporal value, the result will be NULL.

Calendars

The ICU extension also supports non‑Gregorian calendars. If such a calendar is current, then the display and binning operations will use
that calendar.

Utility Functions

Utility Functions

The functions below are difficult to categorize into specific function types and are broadly useful.

Function Description Example Result

alias( column) Return the name of the alias(column1) column1


column
checkpoint( database) Synchronize WAL with file for checkpoint(my_db) success boolean
(optional) database without
interrupting transactions.
coalesce( expr, ...) Return the first expression coalesce(NULL, NULL, default_string
that evaluates to a non‑NULL 'default_string')
value. Accepts 1 or more
parameters. Each expression
can be a column, literal value,
function result, or many
others.
constant_or_null( If arg2 is NULL, return constant_or_null(42, NULL) NULL
arg1, arg2) NULL. Otherwise, return
arg1.
count_if( x) Returns 1 if x is true or a count_if(42) 1
non‑zero number


current_catalog() Return the name of the current_catalog() memory


currently active catalog.
Default is memory.
current_schema() Return the name of the current_schema() main
currently active schema.
Default is main.
current_schemas( Return list of schemas. Pass a current_schemas(true) ['temp', 'main',
boolean) parameter of true to 'pg_catalog']
include implicit schemas.
current_setting( Return the current value of current_setting('access_ automatic
'setting_name') the configuration setting mode')
currval( 'sequence_ Return the current value of currval('my_sequence_name') 1
name') the sequence. Note that
nextval must be called at
least once prior to calling
currval.
error( message) Throws the given error error('access_mode')
message
force_checkpoint( Synchronize WAL with file for force_checkpoint(my_db) success boolean
database) (optional) database
interrupting transactions.
gen_random_uuid() Alias of uuid. Return a gen_random_uuid() various
random UUID similar to this:
eeccb8c5-9943-b2bb-
bb5e-222f4e14b687.
hash( value) Returns a UBIGINT with the hash(' ') 2595805878642663834
hash of the value
icu_sort_key( string, Surrogate key used to sort icu_sort_key('ö', 'DE') 460145960106
collator) special characters according
to the specific locale.
Collator parameter is
optional. Valid only when
ICU extension is installed.
ifnull( expr, other) A two‑argument version of ifnull(NULL, 'default_ default_string
coalesce string')
md5( string) Return an md5 one‑way hash md5('123') 202c...
of the string.
nextval( 'sequence_ Return the following value of nextval('my_sequence_name') 2
name') the sequence.
nullif( a, b) Return null if a = b, else nullif(1+1, 2) NULL
return a. Equivalent to CASE
WHEN a = b THEN NULL
ELSE a END.
pg_typeof( Returns the lower case name pg_typeof('abc') varchar
expression) of the data type of the result
of the expression. For
PostgreSQL compatibility.


read_blob( source) Returns the content from read_blob('hello.bin') hello\x0A


source (a filename, a list of
filenames, or a glob pattern)
as a BLOB. See the read_
blob guide for more details.
read_text( source) Returns the content from read_text('hello.txt') hello\n
source (a filename, a list of
filenames, or a glob pattern)
as a VARCHAR. The file
content is first validated to
be valid UTF‑8. If read_
text attempts to read a file
with invalid UTF‑8 an error is
thrown suggesting to use
read_blob instead. See
the read_text guide for
more details.
repeat_row( varargs, Returns a table with num_ repeat_row(1, 2, 'foo', 3 rows of 1, 2, 'foo'
num_rows) rows rows, each containing num_rows = 3)
the fields defined in
varargs
sha256( value) Returns a VARCHAR with the sha256(' ') d7a5...
SHA‑256 hash of the value
stats( expression) Returns a string with stats(5) '[Min: 5, Max:
statistics about the 5][Has Null: false]'
expression. Expression can
be a column, constant, or
SQL expression.
txid_current() Returns the current txid_current() various
transaction's ID (a BIGINT).
It will assign a new one if the
current transaction does not
have one already.
typeof( expression) Returns the name of the data typeof('abc') VARCHAR
type of the result of the
expression.
uuid() Return a random UUID uuid() various
similar to this:
eeccb8c5-9943-b2bb-
bb5e-222f4e14b687.
version() Return the currently active version() various
version of DuckDB in this
format

Utility Table Functions

A table function is used in place of a table in a FROM clause.


Function Description Example

glob( search_path) Return filenames found at the location indicated by the glob('*')
search_path in a single column named file. The search_
path may contain glob pattern matching syntax.
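For example, to list the Parquet files in the current working directory (a hypothetical pattern; the result is a single file column):

SELECT * FROM glob('*.parquet');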

Aggregate Functions

Examples

-- produce a single row containing the sum of the "amount" column


SELECT sum(amount) FROM sales;
-- produce one row per unique region, containing the sum of "amount" for each group
SELECT region, sum(amount) FROM sales GROUP BY region;
-- return only the regions that have a sum of "amount" higher than 100
SELECT region FROM sales GROUP BY region HAVING sum(amount) > 100;
-- return the number of unique values in the "region" column
SELECT count(DISTINCT region) FROM sales;
-- return two values: the total sum of "amount" and the sum of "amount" for rows where the region is not 'north'
SELECT sum(amount), sum(amount) FILTER (region != 'north') FROM sales;
-- returns a list of all regions in order of the "amount" column
SELECT list(region ORDER BY amount DESC) FROM sales;
-- returns the amount of the first sale using the first() aggregate function
SELECT first(amount ORDER BY date ASC) FROM sales;

Syntax

Aggregates are functions that combine multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the SELECT and HAVING clauses of a SQL query.

DISTINCT Clause in Aggregate Functions When the DISTINCT clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the count aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.
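A minimal sketch, reusing the hypothetical sales table from the examples above:

-- count each distinct region only once
SELECT count(DISTINCT region) FROM sales;
-- DISTINCT also works with other aggregates, e.g., summing each distinct amount once
SELECT sum(DISTINCT amount) FROM sales;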

ORDER BY Clause in Aggregate Functions An ORDER BY clause can be provided after the last argument of the function call. Note the
lack of the comma separator before the clause.

SELECT aggregate_function ( arg , sep ORDER BY ordering_criteria );

This clause ensures that the values being aggregated are sorted before applying the function. Most aggregate functions are order-insensitive; for these, the clause is still parsed and applied, which is inefficient, but has no effect on the results. However, there are some order-sensitive aggregates that can have non-deterministic results without ordering, e.g., first, last, list and string_agg / group_concat / listagg. These can be made deterministic by ordering the arguments.

For example:

CREATE TABLE tbl AS SELECT s FROM range(1, 4) r(s);


SELECT string_agg(s, ', ' ORDER BY s DESC) AS countdown FROM tbl;

┌───────────┐
│ countdown │
│  varchar  │
├───────────┤
│  3, 2, 1  │
└───────────┘

General Aggregate Functions

The table below shows the available general aggregate functions.

Function Description Example Alias(es)

any_value(arg) Returns the first non‑null value from any_value(A)


arg. This function is affected by
ordering.
arbitrary(arg) Returns the first value (null or non‑null) arbitrary(A) first(A)
from arg. This function is affected by
ordering.
arg_max(arg, val) Finds the row with the maximum val. arg_max(A, B) argMax(arg, val),
Calculates the arg expression at that max_by(arg, val)
row. This function is affected by
ordering.
arg_min(arg, val) Finds the row with the minimum val. arg_min(A, B) argMin(arg, val),
Calculates the arg expression at that min_by(arg, val)
row. This function is affected by
ordering.
avg(arg) Calculates the average value for all avg(A) mean
tuples in arg.
bit_and(arg) Returns the bitwise AND of all bits in a bit_and(A) ‑
given expression.
bit_or(arg) Returns the bitwise OR of all bits in a bit_or(A) ‑
given expression.
bit_xor(arg) Returns the bitwise XOR of all bits in a bit_xor(A) ‑
given expression.
bitstring_agg(arg) Returns a bitstring with bits set for bitstring_agg(A) ‑
each distinct value.
bool_and(arg) Returns true if every input value is bool_and(A) ‑
true, otherwise false.
bool_or(arg) Returns true if any input value is bool_or(A) ‑
true, otherwise false.
count(arg) Calculates the number of tuples in count(A) ‑
arg.
favg(arg) Calculates the average using a more favg(A) ‑
accurate floating point summation
(Kahan Sum).
first(arg) Returns the first value (null or non‑null) first(A) arbitrary(A)
from arg. This function is affected by
ordering.
fsum(arg) Calculates the sum using a more fsum(A) sumKahan, kahan_sum
accurate floating point summation
(Kahan Sum).


geomean(arg) Calculates the geometric mean for all geomean(A) geometric_mean(A)


tuples in arg.
histogram(arg) Returns a MAP of key‑value pairs histogram(A) ‑
representing buckets and counts.
last(arg) Returns the last value of a column. This last(A) ‑
function is affected by ordering.
list(arg) Returns a LIST containing all the list(A) array_agg
values of a column. This function is
affected by ordering.
max(arg) Returns the maximum value present in max(A) ‑
arg.
max_by(arg, val) Finds the row with the maximum val. max_by(A, B) argMax(arg, val),
Calculates the arg expression at that arg_max(arg, val)
row. This function is affected by
ordering.
min(arg) Returns the minimum value present in min(A) ‑
arg.
min_by(arg, val) Finds the row with the minimum val. min_by(A, B) argMin(arg, val),
Calculates the arg expression at that arg_min(arg, val)
row. This function is affected by
ordering.
product(arg) Calculates the product of all tuples in product(A) ‑
arg.
string_agg(arg, sep) Concatenates the column string values string_agg(S, ',') group_concat(arg,
with a separator. This function is sep), listagg(arg,
affected by ordering. sep)
sum(arg) Calculates the sum value for all tuples sum(A) ‑
in arg.
sum_no_overflow(arg) Calculates the sum value for all tuples sum_no_overflow(A) ‑
in arg without overflow checks.
Unlike sum, which works on
floating‑point values, sum_no_
overflow only accepts INTEGER
and DECIMAL values.

Approximate Aggregates

The table below shows the available approximate aggregate functions.

Function Description Example

approx_count_distinct(x) Gives the approximate count of distinct approx_count_distinct(A)


elements using HyperLogLog.
approx_quantile(x, pos) Gives the approximate quantile using approx_quantile(A, 0.5)
T‑Digest.


reservoir_quantile(x, Gives the approximate quantile using reservoir_quantile(A, 0.5,


quantile, sample_size = 8192) reservoir sampling, the sample size is 1024)
optional and uses 8192 as a default size.

Statistical Aggregates

The table below shows the available statistical aggregate functions.

Function Description Formula Alias

corr(y, x) Returns the correlation coefficient for covar_pop(y, x) / ‑


non‑null pairs in a group. (stddev_pop(x) *
stddev_pop(y))
covar_pop(y, x) Returns the population covariance of (sum(x*y) - sum(x) * ‑
input values. sum(y) / count(*)) /
count(*)
covar_samp(y, x) Returns the sample covariance for (sum(x*y) - sum(x) * ‑
non‑null pairs in a group. sum(y) / count(*)) /
(count(*) - 1)
entropy(x) Returns the log‑2 entropy of count ‑ ‑
input‑values.
kurtosis_pop(x) Returns the excess kurtosis (Fisher's ‑ ‑
definition) of all input values. Bias
correction is not applied.
kurtosis(x) Returns the excess kurtosis (Fisher's ‑ ‑
definition) of all input values, with a bias
correction according to the sample size.
mad(x) Returns the median absolute deviation for median(abs(x - ‑
the values within x. NULL values are median(x)))
ignored. Temporal types return a positive
INTERVAL.
median(x) Returns the middle value of the set. NULL quantile_cont(x, 0.5) ‑
values are ignored. For even value counts,
quantitative values are averaged and
ordinal values return the lower value.
mode(x) Returns the most frequent value for the ‑ ‑
values within x. NULL values are ignored.
quantile_cont(x, pos) Returns the interpolated quantile number ‑ ‑
between 0 and 1 . If pos is a LIST of
FLOATs, then the result is a LIST of the
corresponding interpolated quantiles.
quantile_disc(x, pos) Returns the exact quantile number ‑ quantile
between 0 and 1 . If pos is a LIST of
FLOATs, then the result is a LIST of the
corresponding exact quantiles.


regr_avgx(y, x) Returns the average of the independent ‑ ‑


variable for non‑null pairs in a group,
where x is the independent variable and y
is the dependent variable.
regr_avgy(y, x) Returns the average of the dependent ‑ ‑
variable for non‑null pairs in a group,
where x is the independent variable and y
is the dependent variable.
regr_count(y, x) Returns the number of non‑null number (sum(x*y) - sum(x) * ‑
pairs in a group. sum(y) / count(*)) /
count(*)
regr_intercept(y, x) Returns the intercept of the univariate avg(y) - regr_slope(y, ‑
linear regression line for non‑null pairs in x) * avg(x)
a group.
regr_r2(y, x) Returns the coefficient of determination ‑ ‑
for non‑null pairs in a group.
regr_slope(y, x) Returns the slope of the linear regression covar_pop(x, y) / var_ ‑
line for non‑null pairs in a group. pop(x)
regr_sxx(y, x) ‑ regr_count(y, x) * ‑
var_pop(x)
regr_sxy(y, x) Returns the population covariance of regr_count(y, x) * ‑
input values. covar_pop(y, x)
regr_syy(y, x) ‑ regr_count(y, x) * ‑
var_pop(y)
skewness(x) Returns the skewness of all input values. ‑ ‑
stddev_pop(x) Returns the population standard sqrt(var_pop(x)) ‑
deviation.
stddev_samp(x) Returns the sample standard deviation. sqrt(var_samp(x)) stddev(x)
var_pop(x) Returns the population variance. ‑ ‑
var_samp(x) Returns the sample variance of all input (sum(x^2) - sum(x)^2 / variance(arg,
values. count(x)) / (count(x) val)
- 1)

Ordered Set Aggregate Functions

The table below shows the available ”ordered set” aggregate functions. These functions are specified using the WITHIN GROUP (ORDER
BY sort_expression) syntax, and they are converted to an equivalent aggregate function that takes the ordering expression as the
first argument.

Function                                                             Equivalent
mode() WITHIN GROUP (ORDER BY sort_expression)                       mode(sort_expression)
percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression)    quantile_cont(sort_expression, fraction)
percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression)   quantile_cont(sort_expression, fractions)
percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression)    quantile_disc(sort_expression, fraction)
percentile_disc(fractions) WITHIN GROUP (ORDER BY sort_expression)   quantile_disc(sort_expression, fractions)
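For example, the median of a column can be written in either form; a minimal sketch using the hypothetical sales table from the examples above:

SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY amount) FROM sales;
-- equivalent to
SELECT quantile_cont(amount, 0.5) FROM sales;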

Miscellaneous Aggregate Functions

Function     Description                                                         Alias
grouping()   For queries with GROUP BY and either ROLLUP or GROUPING SETS:       grouping_id()
             returns an integer identifying which of the argument expressions
             were used to group on to create the current super-aggregate row.

Constraints

In SQL, constraints can be specified for tables. Constraints enforce certain properties over data that is inserted into a table. Constraints can
be specified along with the schema of the table as part of the CREATE TABLE statement. In certain cases, constraints can also be added
to a table using the ALTER TABLE statement, but this is not currently supported for all constraints.

Note. Warning Constraints have a strong impact on performance: they slow down loading and updates but speed up certain queries.
Please consult the Performance Guide for details.

Syntax

Check Constraint

Check constraints allow you to specify an arbitrary boolean expression. Any columns that do not satisfy this expression violate the constraint. For example, we could enforce that the name column does not contain spaces using the following CHECK constraint.

CREATE TABLE students (name VARCHAR CHECK (NOT contains(name, ' ')));
INSERT INTO students VALUES ('this name contains spaces');
-- Constraint Error: CHECK constraint failed: students

Not Null Constraint

A not‑null constraint specifies that the column cannot contain any NULL values. By default, all columns in tables are nullable. Adding NOT
NULL to a column definition enforces that a column cannot contain NULL values.

CREATE TABLE students (name VARCHAR NOT NULL);


INSERT INTO students VALUES (NULL);
-- Constraint Error: NOT NULL constraint failed: students.name


Primary Key and Unique Constraint

Primary key or unique constraints define a column, or set of columns, that are a unique identifier for a row in the table. The constraint
enforces that the specified columns are unique within a table, i.e., that at most one row contains the given values for the set of columns.

CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);


INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO students VALUES (1, 'Student 2');
-- Constraint Error: Duplicate key "id: 1" violates primary key constraint

In order to enforce this property efficiently, an ART index is automatically created for every primary key or unique constraint that is defined
in the table.

Primary key constraints and unique constraints are identical except for two points:

• A table can only have one primary key constraint defined, but many unique constraints
• A primary key constraint also enforces the keys to not be NULL.
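For example, a UNIQUE constraint rejects duplicate values in the same way, while still allowing NULL values in the column (a minimal sketch):

CREATE TABLE students (id INTEGER UNIQUE, name VARCHAR);
INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO students VALUES (NULL, 'Student 2'); -- allowed: NULL does not violate UNIQUE
INSERT INTO students VALUES (1, 'Student 3');    -- fails with a unique constraint violation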

Note. Warning Indexes have certain limitations that might result in constraints being evaluated too eagerly, see the indexes section
for more details.

Foreign Keys

Foreign keys define a column, or set of columns, that refer to a primary key or unique constraint from another table. The constraint enforces
that the key exists in the other table.

CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);


CREATE TABLE exams (exam_id INTEGER REFERENCES students(id), grade INTEGER);
INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO exams VALUES (1, 10);
INSERT INTO exams VALUES (2, 10);
-- Constraint Error: Violates foreign key constraint because key "id: 2" does not exist in the
referenced table

In order to enforce this property efficiently, an ART index is automatically created for every foreign key constraint that is defined in the
table.
Note. Warning Indexes have certain limitations that might result in constraints being evaluated too eagerly, see the indexes section
for more details.

Indexes

Index Types

DuckDB currently uses two index types:

• A min‑max index (also known as zonemap and block range index) is automatically created for columns of all general‑purpose data
types.
• An Adaptive Radix Tree (ART) is mainly used to ensure primary key constraints and to speed up point and very highly selective (i.e., <
0.1%) queries. Such an index is automatically created for columns with a UNIQUE or PRIMARY KEY constraint and can be defined
using CREATE INDEX.

Note. Warning ART indexes must currently be able to fit in‑memory. Avoid creating ART indexes if the index does not fit in memory.

Persistence

Both min‑max indexes and ART indexes are persisted on disk.


CREATE INDEX and DROP INDEX

To create an index, use the CREATE INDEX statement. To drop an index, use the DROP INDEX statement.
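A minimal sketch, using a hypothetical students table:

CREATE INDEX students_name_idx ON students (name);
-- ... run point queries that benefit from the index ...
DROP INDEX students_name_idx;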

Index Limitations

ART indexes create a secondary copy of the data in a second location ‑ this complicates processing, particularly when combined with transactions. Certain limitations apply when it comes to modifying data that is also stored in secondary indexes.

Note. As expected, indexes have a strong effect on performance, slowing down loading and updates, but speeding up certain
queries. Please consult the Performance Guide for details.

Updates Become Deletes and Inserts When an update statement is executed on a column that is present in an index ‑ the statement is
transformed into a delete of the original row followed by an insert. This has certain performance implications, particularly for wide tables,
as entire rows are rewritten instead of only the affected columns.

Over‑Eager Unique Constraint Checking Due to the presence of transactions, data can only be removed from the index after (1) the
transaction that performed the delete is committed, and (2) no further transactions exist that refer to the old entry still present in the index.
As a result of this ‑ transactions that perform deletions followed by insertions may trigger unexpected unique constraint violations, as the
deleted tuple has not actually been removed from the index yet. For example:

CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);


INSERT INTO students VALUES (1, 'Student 1');
BEGIN;
DELETE FROM students WHERE id = 1;
INSERT INTO students VALUES (1, 'Student 2');
-- Constraint Error: Duplicate key "id: 1" violates primary key constraint

This, combined with the fact that updates are turned into deletions and insertions within the same transaction, means that updating rows
in the presence of unique or primary key constraints can often lead to unexpected unique constraint violations.

CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);


INSERT INTO students VALUES (1, 'Student 1');
UPDATE students SET name = 'Student 2', id = 1 WHERE id = 1;
-- Constraint Error: Duplicate key "id: 1" violates primary key constraint

Currently, this is an expected limitation of the system ‑ although we aim to resolve this in the future.

Information Schema

The views in the information_schema are SQL‑standard views that describe the catalog entries of the database. These views can be
filtered to obtain information about a specific column or table.

Database, Catalog and Schema

The top level catalog view is information_schema.schemata. It lists the catalogs and the schemas present in the database and has
the following layout:

Column Description Type Example

catalog_name Name of the database that the schema is VARCHAR NULL


contained in.


schema_name Name of the schema. VARCHAR 'main'


schema_owner Name of the owner of the schema. Not yet VARCHAR NULL
implemented.
default_character_set_ Applies to a feature not available in DuckDB. VARCHAR NULL
catalog
default_character_set_ Applies to a feature not available in DuckDB. VARCHAR NULL
schema
default_character_set_ Applies to a feature not available in DuckDB. VARCHAR NULL
name
sql_path The file system location of the database. VARCHAR NULL
Currently unimplemented.

Tables and Views

The view that describes the catalog information for tables and views is information_schema.tables. It lists the tables present in
the database and has the following layout:

Column Description Type Example

table_catalog The catalog the table or view belongs to. VARCHAR NULL
table_schema The schema the table or view belongs to. VARCHAR 'main'
table_name The name of the table or view. VARCHAR 'widgets'
table_type The type of table. One of: BASE TABLE, VARCHAR 'BASE TABLE'
LOCAL TEMPORARY, VIEW.
self_referencing_column_ Applies to a feature not available in DuckDB. VARCHAR NULL
name
reference_generation Applies to a feature not available in DuckDB. VARCHAR NULL
user_defined_type_ If the table is a typed table, the name of the VARCHAR NULL
catalog database that contains the underlying data
type (always the current database), else null.
Currently unimplemented.
user_defined_type_schema If the table is a typed table, the name of the VARCHAR NULL
schema that contains the underlying data
type, else null. Currently unimplemented.
user_defined_type_name If the table is a typed table, the name of the VARCHAR NULL
underlying data type, else null. Currently
unimplemented.
is_insertable_into YES if the table is insertable into, NO if not VARCHAR 'YES'
(Base tables are always insertable into, views
not necessarily.)
is_typed YES if the table is a typed table, NO if not. VARCHAR 'NO'
commit_action Not yet implemented. VARCHAR 'NO'
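For example, to list all tables and views visible in the current database:

SELECT table_schema, table_name, table_type
FROM information_schema.tables;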


Columns

The view that describes the catalog information for columns is information_schema.columns. It lists the column present in the
database and has the following layout:

Column Description Type Example

table_catalog Name of the database containing the table. VARCHAR NULL


table_schema Name of the schema containing the table. VARCHAR 'main'
table_name Name of the table. VARCHAR 'widgets'
column_name Name of the column. VARCHAR 'price'
ordinal_position Ordinal position of the column within the INTEGER 5
table (count starts at 1).
column_default Default expression of the column. VARCHAR 1.99
is_nullable YES if the column is possibly nullable, NO if it VARCHAR 'YES'
is known not nullable.
data_type Data type of the column. VARCHAR 'DECIMAL(18, 2)'
character_maximum_length If data_type identifies a character or bit INTEGER 255
string type, the declared maximum length;
null for all other data types or if no maximum
length was declared.
character_octet_length If data_type identifies a character type, the INTEGER 1073741824
maximum possible length in octets (bytes) of
a datum; null for all other data types. The
maximum octet length depends on the
declared character maximum length (see
above) and the character encoding.
numeric_precision If data_type identifies a numeric type, this INTEGER 18
column contains the (declared or implicit)
precision of the type for this column. The
precision indicates the number of significant
digits. For all other data types, this column is
null.
numeric_scale If data_type identifies a numeric type, this INTEGER 2
column contains the (declared or implicit)
scale of the type for this column. The precision
indicates the number of significant digits. For
all other data types, this column is null.
datetime_precision If data_type identifies a date, time, timestamp, INTEGER 0
or interval type, this column contains the
(declared or implicit) fractional seconds
precision of the type for this column, that is,
the number of decimal digits maintained
following the decimal point in the seconds
value. No fractional seconds are currently
supported in DuckDB. For all other data types,
this column is null.


Catalog Functions

Several functions are also provided to see details about the catalogs and schemas that are configured in the database.

Function Description Example Result

current_catalog() Return the name of the currently active current_catalog() 'memory'


catalog. Default is memory.
current_schema() Return the name of the currently active current_schema() 'main'
schema. Default is main.
current_ Return list of schemas. Pass a current_ ['temp', 'main',
schemas(boolean) parameter of true to include implicit schemas(true) 'pg_catalog']
schemas.

DuckDB_% Metadata Functions

DuckDB offers a collection of table functions that provide metadata about the current database. These functions reside in the main schema
and their names are prefixed with duckdb_.

The resultset returned by a duckdb_ table function may be used just like an ordinary table or view. For example, you can use a duckdb_ function call in the FROM clause of a SELECT statement, and you may refer to the columns of its returned resultset elsewhere in the statement, for example in the WHERE clause.

Table functions are still functions, and you should write parenthesis after the function name to call it to obtain its returned resultset:

SELECT * FROM duckdb_settings();

Alternatively, you may execute table functions also using the CALL‑syntax:

CALL duckdb_settings();

In this case too, the parentheses are mandatory.

Note. For some of the duckdb_% functions, there is also an identically named view available, which also resides in the main
schema. Typically, these views do a SELECT on the duckdb_ table function with the same name, while filtering out those objects
that are marked as internal. We mention it here, because if you accidentally omit the parentheses in your duckdb_ table function
call, you might still get a result, but from the identically named view.

Example:

-- duckdb_views table function: returns all views, including those marked internal
SELECT * FROM duckdb_views();
-- duckdb_views view: returns views that are not marked as internal
SELECT * FROM duckdb_views;

duckdb_columns

The duckdb_columns() function provides metadata about the columns available in the DuckDB instance.

Column Description Type

database_name The name of the database that contains the column object. VARCHAR
database_oid Internal identifier of the database that contains the column BIGINT
object.


schema_name The SQL name of the schema that contains the table object that VARCHAR
defines this column.
schema_oid Internal identifier of the schema object that contains the table of BIGINT
the column.
table_name The SQL name of the table that defines the column. VARCHAR
table_oid Internal identifier (name) of the table object that defines the BIGINT
column.
column_name The SQL name of the column. VARCHAR
column_index The unique position of the column within its table. INTEGER
internal true if this column built‑in, false if it is user‑defined. BOOLEAN
column_default The default value of the column (expressed in SQL) VARCHAR
is_nullable true if the column can hold NULL values; false if the column BOOLEAN
cannot hold NULL‑values.
data_type The name of the column datatype. VARCHAR
data_type_id The internal identifier of the column data type BIGINT
character_maximum_ Always NULL. DuckDB text types do not enforce a value length INTEGER
length restriction based on a length type parameter.
numeric_precision The number of units (in the base indicated by numeric_ INTEGER
precision_radix) used for storing column values. For
integral and approximate numeric types, this is the number of
bits. For decimal types, this is the number of digits positions.
numeric_precision_ The number‑base of the units in the numeric_precision INTEGER
radix column. For integral and approximate numeric types, this is 2,
indicating the precision is expressed as a number of bits. For the
decimal type this is 10, indicating the precision is expressed as
a number of decimal positions.
numeric_scale Applicable to decimal type. Indicates the maximum number of INTEGER
fractional digits (i.e., the number of digits that may appear after
the decimal separator).

The information_schema.columns system view provides a more standardized way to obtain metadata about database columns, but
the duckdb_columns function also returns metadata about DuckDB internal objects. (In fact, information_schema.columns is
implemented as a query on top of duckdb_columns())
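For example, to list the user-defined columns together with their data types (a minimal sketch using the columns documented above):

SELECT table_name, column_name, data_type
FROM duckdb_columns()
WHERE NOT internal;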

duckdb_constraints

The duckdb_constraints() function provides metadata about the constraints available in the DuckDB instance.

Column Description Type

database_name The name of the database that contains the constraint. VARCHAR
database_oid Internal identifier of the database that contains the constraint. BIGINT
schema_name The SQL name of the schema that contains the table on which the VARCHAR
constraint is defined.


schema_oid Internal identifier of the schema object that contains the table on BIGINT
which the constraint is defined.
table_name The SQL name of the table on which the constraint is defined. VARCHAR
table_oid Internal identifier (name) of the table object on which the BIGINT
constraint is defined.
constraint_index Indicates the position of the constraint as it appears in its table BIGINT
definition.
constraint_type Indicates the type of constraint. Applicable values are CHECK, VARCHAR
FOREIGN KEY, PRIMARY KEY, NOT NULL, UNIQUE.
constraint_text The definition of the constraint expressed as a SQL‑phrase. (Not VARCHAR
necessarily a complete or syntactically valid DDL‑statement.)
expression If constraint is a check constraint, the definition of the condition VARCHAR
being checked, otherwise NULL.
constraint_column_ An array of table column indexes referring to the columns that BIGINT[]
indexes appear in the constraint definition
constraint_column_ An array of table column names appearing in the constraint VARCHAR[]
names definition
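For example, to see which constraints are defined on each table:

SELECT table_name, constraint_type, constraint_text
FROM duckdb_constraints();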

duckdb_databases

The duckdb_databases() function lists the databases that are accessible from within the current DuckDB process. Apart from the
database associated at startup, the list also includes databases that were attached later on to the DuckDB process.

Column Description Type

database_name The name of the database, or the alias if the database was VARCHAR
attached using an ALIAS‑clause.
database_oid The internal identifier of the database. VARCHAR
path The file path associated with the database. VARCHAR
internal true indicates a system or built‑in database. False indicates a BOOLEAN
user‑defined database.
type The type indicates the type of RDBMS implemented by the VARCHAR
attached database. For DuckDB databases, that value is duckdb.

duckdb_dependencies

The duckdb_dependencies() function provides metadata about the dependencies available in the DuckDB instance.

Column Description Type

classid Always 0 BIGINT


objid The internal id of the object. BIGINT
objsubid Always 0 INTEGER
refclassid Always 0 BIGINT


refobjid The internal id of the dependent object. BIGINT


refobjsubid Always 0 INTEGER
deptype The type of dependency. Either regular (n) or automatic (a). VARCHAR

duckdb_extensions

The duckdb_extensions() function provides metadata about the extensions available in the DuckDB instance.

Column Description Type

extension_name The name of the extension. VARCHAR


loaded true if the extension is loaded, false if it's not loaded. BOOLEAN
installed true if the extension is installed, false if it's not installed. BOOLEAN
install_path (BUILT-IN) if the extension is built‑in, otherwise, the filesystem path VARCHAR
where binary that implements the extension resides.
description Human readable text that describes the extension's functionality. VARCHAR
aliases List of alternative names for this extension. VARCHAR[]

duckdb_functions

The duckdb_functions() function provides metadata about the functions (including macros) available in the DuckDB instance.

Column Description Type

database_name The name of the database that contains this function. VARCHAR
schema_name The SQL name of the schema where the function resides. VARCHAR
function_name The SQL name of the function. VARCHAR
function_type The function kind. Value is one of: VARCHAR
table,scalar,aggregate,pragma,macro
description Description of this function (always NULL) VARCHAR
return_type The logical data type name of the returned value. Applicable for VARCHAR
scalar and aggregate functions.
parameters If the function has parameters, the list of parameter names. VARCHAR[]
parameter_types If the function has parameters, a list of logical data type names VARCHAR[]
corresponding to the parameter list.
varargs The name of the data type in case the function has a variable VARCHAR
number of arguments, or NULL if the function does not have a
variable number of arguments.
macro_definition If this is a macro, the SQL expression that defines it. VARCHAR
has_side_effects false if this is a pure function. true if this function changes the BOOLEAN
database state (like sequence functions nextval() and
currval()).
function_oid The internal identifier for this function BIGINT


duckdb_indexes

The duckdb_indexes() function provides metadata about secondary indexes available in the DuckDB instance.

Column Description Type

database_name The name of the database that contains this index. VARCHAR
database_oid Internal identifier of the database containing the index. BIGINT
schema_name The SQL name of the schema that contains the table with the VARCHAR
secondary index.
schema_oid Internal identifier of the schema object. BIGINT
index_name The SQL name of this secondary index VARCHAR
index_oid The object identifier of this index. BIGINT
table_name The name of the table with the index VARCHAR
table_oid Internal identifier (name) of the table object. BIGINT
is_unique true if the index was created with the UNIQUE modifier, false BOOLEAN
if it was not.
is_primary Always false BOOLEAN
expressions Always NULL VARCHAR
sql The definition of the index, expressed as a CREATE INDEX SQL VARCHAR
statement.

Note that duckdb_indexes only provides metadata about secondary indexes ‑ i.e., those indexes created by explicit CREATE INDEX statements. Primary keys, foreign keys, and UNIQUE constraints are maintained using indexes, but their details are included in the duckdb_constraints() function.

duckdb_keywords

The duckdb_keywords() function provides metadata about DuckDB's keywords and reserved words.

Column Description Type

keyword_name The keyword. VARCHAR


keyword_category Indicates the category of the keyword. Values are column_ VARCHAR
name, reserved, type_function and unreserved.

duckdb_memory

The duckdb_memory() function provides metadata about DuckDB's buffer manager.

Column Description Type

tag The memory tag. It has one of the following values: BASE_ VARCHAR
TABLE, HASH_TABLE, PARQUET_READER, CSV_READER,
ORDER_BY, ART_INDEX, COLUMN_DATA, METADATA,
OVERFLOW_STRINGS, IN_MEMORY_TABLE, ALLOCATOR,
EXTENSION.


memory_usage_bytes The memory used (in bytes). BIGINT


temporary_storage_ The disk storage used (in bytes). BIGINT
bytes

duckdb_optimizers

The duckdb_optimizers() function provides metadata about the optimization rules (e.g., expression_rewriter, filter_
pushdown) available in the DuckDB instance. These can be selectively turned off using PRAGMA disabled_optimizers.

Column Description Type

name The name of the optimization rule. VARCHAR
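The names returned by this function can be passed to the disabled_optimizers setting mentioned above; a minimal sketch (disabling optimizers is mainly useful for debugging):

SELECT name FROM duckdb_optimizers();
PRAGMA disabled_optimizers = 'filter_pushdown';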

duckdb_schemas

The duckdb_schemas() function provides metadata about the schemas available in the DuckDB instance.

Column Description Type

oid Internal identifier of the schema object. BIGINT


database_name The name of the database that contains this schema. VARCHAR
database_oid Internal identifier of the database containing the schema. BIGINT
schema_name The SQL name of the schema. VARCHAR
internal true if this is an internal (built‑in) schema, false if this is a BOOLEAN
user‑defined schema.
sql Always NULL VARCHAR

The information_schema.schemata system view provides a more standardized way to obtain metadata about database schemas.

duckdb_secrets

The duckdb_secrets() function provides metadata about the secrets available in the DuckDB instance.

Column Description Type

name The name of the secret. VARCHAR


type The type of the secret, e.g., S3, GCS, R2, AZURE. VARCHAR
provider The provider of the secret. VARCHAR
persistent Denotes whether the secret is persistent. BOOLEAN
storage The backend for storing the secret. VARCHAR
scope The scope of the secret. VARCHAR[]
secret_string Returns the content of the secret as a string. Sensitive pieces of VARCHAR
information, e.g., they access key, are redacted.
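
For example, after creating a secret, the redacted overview can be inspected with:

SELECT name, type, provider, persistent, scope
FROM duckdb_secrets();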


duckdb_sequences

The duckdb_sequences() function provides metadata about the sequences available in the DuckDB instance.

Column Description Type

database_name The name of the database that contains this sequence VARCHAR
database_oid Internal identifier of the database containing the sequence. BIGINT
schema_name The SQL name of the schema that contains the sequence object. VARCHAR
schema_oid Internal identifier of the schema object that contains the sequence object. BIGINT
sequence_name The SQL name that identifies the sequence within the schema. VARCHAR
sequence_oid The internal identifier of this sequence object. BIGINT
temporary Whether this sequence is temporary. Temporary sequences are transient and only visible within the current connection. BOOLEAN
start_value The initial value of the sequence. This value will be returned when nextval() is called for the very first time on this sequence. BIGINT
min_value The minimum value of the sequence. BIGINT
max_value The maximum value of the sequence. BIGINT
increment_by The value that is added to the current value of the sequence to draw the next value from the sequence. BIGINT
cycle Whether the sequence should start over when drawing the next value would result in a value outside the range. BOOLEAN
last_value null if no value was ever drawn from the sequence using nextval(...), 1 if a value was drawn. BIGINT
sql The definition of this object, expressed as SQL DDL‑statement. VARCHAR

Attributes like temporary, start_value etc. correspond to the various options available in the CREATE SEQUENCE statement and
are documented there in full. Note that the attributes will always be filled out in the duckdb_sequences resultset, even if they were not
explicitly specified in the CREATE SEQUENCE statement.

Note.

1. The column name last_value suggests that it contains the last value that was drawn from the sequence, but that is not the case. It is either null if a value was never drawn from the sequence, or 1 if a value has ever been drawn from the sequence.

2. If the sequence cycles, then the sequence will start over from the boundary of its range, not necessarily from the value specified
as start value.
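
For example, the following sketch creates a sequence (the name and options are arbitrary) and inspects how its options are reflected in the metadata:

CREATE SEQUENCE id_seq START WITH 100 INCREMENT BY 10;
SELECT sequence_name, start_value, increment_by, cycle, last_value
FROM duckdb_sequences();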

duckdb_settings

The duckdb_settings() function provides metadata about the settings available in the DuckDB instance.

Column Description Type

name Name of the setting. VARCHAR
value Current value of the setting. VARCHAR
description A description of the setting. VARCHAR
input_type The logical datatype of the setting's value. VARCHAR

The various settings are described in the configuration page.
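
For example, to look up the current value of a single setting, such as the number of threads:

SELECT name, value, description
FROM duckdb_settings()
WHERE name = 'threads';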

duckdb_tables

The duckdb_tables() function provides metadata about the base tables available in the DuckDB instance.

Column Description Type

database_name The name of the database that contains this table VARCHAR
database_oid Internal identifier of the database containing the table. BIGINT
schema_name The SQL name of the schema that contains the base table. VARCHAR
schema_oid Internal identifier of the schema object that contains the base table. BIGINT
table_name The SQL name of the base table. VARCHAR
table_oid Internal identifier of the base table object. BIGINT
internal false if this is a user-defined table. BOOLEAN
temporary Whether this is a temporary table. Temporary tables are not persisted and only visible within the current connection. BOOLEAN
has_primary_key true if this table object defines a PRIMARY KEY. BOOLEAN
estimated_size The estimated number of rows in the table. BIGINT
column_count The number of columns defined by this object. BIGINT
index_count The number of indexes associated with this table. This number includes all secondary indexes, as well as internal indexes generated to maintain PRIMARY KEY and/or UNIQUE constraints. BIGINT
check_constraint_count The number of check constraints active on columns within the table. BIGINT
sql The definition of this object, expressed as a SQL CREATE TABLE statement. VARCHAR

The information_schema.tables system view provides a more standardized way to obtain metadata about database tables that also includes views. But the resultset returned by duckdb_tables contains a few columns that are not included in information_schema.tables.
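
For example, to list the user-defined tables ordered by their estimated size:

SELECT database_name, schema_name, table_name, estimated_size, column_count
FROM duckdb_tables()
WHERE NOT internal
ORDER BY estimated_size DESC;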

duckdb_types

The duckdb_types() function provides metadata about the data types available in the DuckDB instance.

Column Description Type

database_name The name of the database that contains the data type. VARCHAR
database_oid Internal identifier of the database that contains the data type. BIGINT
schema_name The SQL name of the schema containing the type definition. Always main. VARCHAR
schema_oid Internal identifier of the schema object. BIGINT
type_name The name or alias of this data type. VARCHAR
type_oid The internal identifier of the data type object. If NULL, then this is an alias of the type (as identified by the value in the logical_type column). BIGINT
type_size The number of bytes required to represent a value of this type in memory. BIGINT
logical_type The 'canonical' name of this data type. The same logical_type may be referenced by several types having different type_names. VARCHAR
type_category The category to which this type belongs. Data types within the same category generally expose similar behavior when values of this type are used in expressions. For example, the NUMERIC type_category includes integers, decimals, and floating point numbers. VARCHAR
internal Whether this is an internal (built-in) or a user object. BOOLEAN
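
For example, to find all type names (including aliases) that map to the same canonical type; BIGINT is used here purely as an illustration:

SELECT type_name, type_oid, type_category
FROM duckdb_types()
WHERE logical_type = 'BIGINT';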

duckdb_views

The duckdb_views() function provides metadata about the views available in the DuckDB instance.

Column Description Type

database_name The name of the database that contains this view VARCHAR
database_oid Internal identifier of the database that contains this view. BIGINT
schema_name The SQL name of the schema where the view resides. VARCHAR
schema_oid Internal identifier of the schema object that contains the view. BIGINT
view_name The SQL name of the view object. VARCHAR
view_oid The internal identifier of this view object. BIGINT
internal true if this is an internal (built-in) view, false if this is a user-defined view. BOOLEAN
temporary true if this is a temporary view. Temporary views are not persistent and are only visible within the current connection. BOOLEAN
column_count The number of columns defined by this view object. BIGINT
sql The definition of this object, expressed as SQL DDL‑statement. VARCHAR

The information_schema.tables system view provides a more standardized way to obtain metadata about database views that also includes base tables. However, the resultset returned by duckdb_views also contains definitions of internal view objects as well as a few columns that are not included in information_schema.tables.
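
For example, to show only the user-defined views together with their definitions:

SELECT schema_name, view_name, sql
FROM duckdb_views()
WHERE NOT internal;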


duckdb_temporary_files

The duckdb_temporary_files() function provides metadata about the temporary files DuckDB has written to disk, to offload data
from memory. This function mostly exists for debugging and testing purposes.

Column Description Type

path The name of the temporary file VARCHAR


size The size in bytes of the temporary file INT64
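
For example, to check how much data has been offloaded to disk:

SELECT path, size
FROM duckdb_temporary_files();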

Keywords and Identifiers

Identifiers

Similarly to other SQL dialects and programming languages, identifiers in DuckDB's SQL are subject to several rules.

• Unquoted identifiers need to conform to a number of rules:

– They must not be a reserved keyword (see duckdb_keywords()), e.g., SELECT 123 AS SELECT will fail.
– They must not start with a number or special character, e.g., SELECT 123 AS 1col is invalid.
– They cannot contain whitespace characters (including tabs and newlines).

• Identifiers can be quoted using double‑quote characters ("). Quoted identifiers can use any keyword, whitespace or special character,
e.g., "SELECT" and " § ¶ " are valid identifiers.
• Quotes themselves can be escaped by repeating the quote character, e.g., to create an identifier named IDENTIFIER "X", use
"IDENTIFIER ""X""".

Deduplicating Identifiers In some cases, duplicate identifiers can occur, e.g., column names may conflict when unnesting a nested data
structure. In these cases, DuckDB automatically deduplicates column names by renaming them according to the following rules:

• For a column named name, the first instance is not renamed.
• Subsequent instances are renamed to name_count, where count starts at 1.

For example:

SELECT *
FROM (SELECT UNNEST({'a': 42, 'b': {'a': 88, 'b': 99}}, recursive := true));

┌───────┬───────┬───────┐
│ a │ a_1 │ b │
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│ 42 │ 88 │ 99 │
└───────┴───────┴───────┘

Database Names

Database names are subject to the rules for identifiers.

Additionally, it is best practice to avoid DuckDB's two internal database schema names, system and temp. By default, persistent
databases are named after their filename without the extension. Therefore, the filenames system.db and temp.db (as well as
system.duckdb and temp.duckdb) result in the database names system and temp, respectively. If you need to attach to a database
that has one of these names, use an alias, e.g.:

ATTACH 'temp.db' AS temp2;


USE temp2;


Numeric Literals

DuckDB's SQL dialect allows using the underscore character _ in numeric literals as an optional separator. The rules for using underscores
are as follows:

• Underscores are allowed in integer, decimal, hexadecimal and binary notation.


• Underscores cannot be the first or last character in a literal.
• Underscores must have an integer or numeric part on either side of them, i.e., there cannot be multiple underscores in a row, and underscores cannot appear immediately before or after a decimal point or exponent.

Examples

SELECT 100_000_000; -- 100000000


SELECT '0xFF_FF'::INT; -- 65535
SELECT 1_2.1_2E0_1; -- 121.2
SELECT '0b0_1_0_1'::INT; -- 5

Rules for Case‑Sensitivity

Keywords and Function Names SQL keywords and function names are case‑insensitive in DuckDB.

For example, the following two queries are equivalent:

select COS(Pi()) as CosineOfPi;


SELECT cos(pi()) AS CosineOfPi;

┌────────────┐
│ CosineOfPi │
│ double │
├────────────┤
│ -1.0 │
└────────────┘

Case‑Sensitivity of Identifiers Following the convention of the SQL standard, identifiers in DuckDB are case‑insensitive. However, each
character's case (uppercase/lowercase) is maintained as originally specified by the user even if a query uses different cases when referring
to the identifier. For example:

CREATE TABLE tbl AS SELECT cos(pi()) AS CosineOfPi;


SELECT cosineofpi FROM tbl;

┌────────────┐
│ CosineOfPi │
│ double │
├────────────┤
│ -1.0 │
└────────────┘

To change this behavior, set the preserve_identifier_case configuration option to false.

Handling Conflicts In case of a conflict, when the same identifier is spelt with different cases, one will be selected randomly. For exam‑
ple:

CREATE TABLE t1 (idfield INT, x INT);


CREATE TABLE t2 (IdField INT, y INT);
SELECT * FROM t1 NATURAL JOIN t2;


┌─────────┬───────┬───────┐
│ idfield │ x │ y │
│ int32 │ int32 │ int32 │
├─────────────────────────┤
│ 0 rows │
└─────────────────────────┘

Disabling Preserving Cases With the preserve_identifier_case configuration option set to false, all identifiers are turned
into lowercase:

SET preserve_identifier_case = false;


CREATE TABLE tbl AS SELECT cos(pi()) AS CosineOfPi;
SELECT CosineOfPi FROM tbl;

┌────────────┐
│ cosineofpi │
│ double │
├────────────┤
│ -1.0 │
└────────────┘

Samples

Samples are used to randomly select a subset of a dataset.

Examples
-- select a sample of 5 rows from "tbl" using reservoir sampling
SELECT * FROM tbl USING SAMPLE 5;
-- select a sample of 10% of the table using system sampling (cluster sampling)
SELECT * FROM tbl USING SAMPLE 10%;
-- select a sample of 10% of the table using bernoulli sampling
SELECT * FROM tbl USING SAMPLE 10 PERCENT (bernoulli);
-- select a sample of 50 rows of the table using reservoir sampling with a fixed seed (100)
SELECT * FROM tbl USING SAMPLE reservoir(50 ROWS) REPEATABLE (100);
-- select a sample of 10% of the table using system sampling with a fixed seed (377)
SELECT * FROM tbl USING SAMPLE 10% (system, 377);
-- select a sample of 20% of "tbl" BEFORE the join with tbl2
SELECT * FROM tbl TABLESAMPLE reservoir(20%), tbl2 WHERE tbl.i = tbl2.i;
-- select a sample of 20% of "tbl" AFTER the join with tbl2
SELECT * FROM tbl, tbl2 WHERE tbl.i = tbl2.i USING SAMPLE reservoir(20%);

Syntax Samples allow you to randomly extract a subset of a dataset. Samples are useful for exploring a dataset faster, as often you might
not be interested in the exact answers to queries, but only in rough indications of what the data looks like and what is in the data. Samples
allow you to get approximate answers to queries faster, as they reduce the amount of data that needs to pass through the query engine.

DuckDB supports three different types of sampling methods: reservoir, bernoulli and system. By default, DuckDB uses reser-
voir sampling when an exact number of rows is sampled, and system sampling when a percentage is specified. The sampling methods
are described in detail below.

Samples require a sample size, which is an indication of how many elements will be sampled from the total population. Samples can either
be given as a percentage (10%) or as a fixed number of rows (10 rows). All three sampling methods support sampling over a percentage,
but only reservoir sampling supports sampling a fixed number of rows.

Samples are probabilistic, that is to say, samples can differ between runs unless a seed is explicitly specified. Specifying the seed only guarantees that the sample is the same if multi-threading is not enabled (i.e., SET threads = 1). In the case of multiple threads running over a sample, samples are not necessarily consistent even with a fixed seed.


reservoir Reservoir sampling is a stream sampling technique that selects a random sample by keeping a reservoir of size equal to
the sample size, and randomly replacing elements as more elements come in. Reservoir sampling allows us to specify exactly how many
elements we want in the resulting sample (by selecting the size of the reservoir). As a result, reservoir sampling always outputs the same
amount of elements, unlike system and bernoulli sampling.

Reservoir sampling is only recommended for small sample sizes, and is not recommended for use with percentages. That is because reser‑
voir sampling needs to materialize the entire sample and randomly replace tuples within the materialized sample. The larger the sample
size, the higher the performance hit incurred by this process.

Reservoir sampling also incurs an additional performance penalty when multiple threads are used, since the reservoir needs to be shared amongst the different threads to ensure unbiased sampling. This is not a big problem when the reservoir is very small, but becomes costly when the sample is large.

Note. Best practice: Avoid using reservoir sampling with large sample sizes if possible. Reservoir sampling requires the entire sample to be materialized in memory.

bernoulli Bernoulli sampling can only be used when a sampling percentage is specified. It is rather straightforward: every tuple in the underlying table is included with a chance equal to the specified percentage. As a result, bernoulli sampling can return a different number of tuples even if the same percentage is specified. The number of rows will generally be more or less equal to the specified percentage of the table, but there will be some variance.

Because bernoulli sampling is completely independent (there is no shared state), there is no penalty for using bernoulli sampling together
with multiple threads.

system System sampling is a variant of bernoulli sampling with one crucial difference: every vector is included with a chance equal
to the sampling percentage. This is a form of cluster sampling. System sampling is more efficient than bernoulli sampling, as no per‑
tuple selections have to be performed. There is almost no extra overhead for using system sampling, whereas bernoulli sampling can add
additional cost as it has to perform random number generation for every single tuple.

System sampling is not suitable for smaller data sets as the granularity of the sampling is on the order of ~1000 tuples. That means that if
system sampling is used for small data sets (e.g., 100 rows) either all the data will be filtered out, or all the data will be included.

Table Samples

The TABLESAMPLE and USING SAMPLE clauses are identical in terms of syntax and effect, with one important difference: tablesamples
sample directly from the table for which they are specified, whereas the sample clause samples after the entire from clause has been
resolved. This is relevant when there are joins present in the query plan.

The TABLESAMPLE clause is essentially equivalent to creating a subquery with the USING SAMPLE clause, i.e., the following two queries
are identical:

-- sample 20% of tbl BEFORE the join


SELECT * FROM tbl TABLESAMPLE reservoir(20%), tbl2 WHERE tbl.i = tbl2.i;

-- sample 20% of tbl BEFORE the join


SELECT *
FROM (SELECT * FROM tbl USING SAMPLE reservoir(20%)) tbl, tbl2
WHERE tbl.i = tbl2.i;

-- sample 20% AFTER the join (i.e., sample 20% of the join result)
SELECT * FROM tbl, tbl2 WHERE tbl.i = tbl2.i USING SAMPLE reservoir(20%);


Window Functions

Examples

-- generate a "row_number" column with containing incremental identifiers for each row
SELECT row_number() OVER () FROM sales;
-- generate a "row_number" column, by order of time
SELECT row_number() OVER (ORDER BY time) FROM sales;
-- generate a "row_number" column, by order of time partitioned by region
SELECT row_number() OVER (PARTITION BY region ORDER BY time) FROM sales;
-- compute the difference between the current amount, and the previous amount,
-- by order of time
SELECT amount - lag(amount) OVER (ORDER BY time) FROM sales;
-- compute the percentage of the total amount of sales per region for each row
SELECT amount / sum(amount) OVER (PARTITION BY region) FROM sales;

Syntax

Window functions can only be used in the SELECT clause. To share OVER specifications between functions, use the statement's WINDOW
clause and use the OVER window-name syntax.

General‑Purpose Window Functions

The table below shows the available general window functions.

Function Return Type Description Example

cume_dist() DOUBLE The cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows. cume_dist()
dense_rank() BIGINT The rank of the current row without gaps; this function counts peer groups. dense_rank()
first_value(expr any) same type as expr Returns expr evaluated at the row that is the first row of the window frame. first_value(column)
first(expr any) same type as expr Alias for first_value. first(column)
lag(expr any [, offset integer [, default any ]]) same type as expr Returns expr evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as expr). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null. lag(column, 3, 0)
last_value(expr any) same type as expr Returns expr evaluated at the row that is the last row of the window frame. last_value(column)
last(expr any) same type as expr Alias for last_value. last(column)
lead(expr any [, offset integer [, default any ]]) same type as expr Returns expr evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as expr). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null. lead(column, 3, 0)
nth_value(expr any, nth integer) same type as expr Returns expr evaluated at the nth row of the window frame (counting from 1); null if no such row. nth_value(column, 2)
ntile(num_buckets integer) BIGINT An integer ranging from 1 to the argument value, dividing the partition as equally as possible. ntile(4)
percent_rank() DOUBLE The relative rank of the current row: (rank() - 1) / (total partition rows - 1). percent_rank()
rank_dense() BIGINT Alias for dense_rank. rank_dense()
rank() BIGINT The rank of the current row with gaps; same as row_number of its first peer. rank()
row_number() BIGINT The number of the current row within the partition, counting from 1. row_number()

Aggregate Window Functions

All aggregate functions can be used in a windowing context.

Ignoring NULLs

The following functions support the IGNORE NULLS specification:

Function Description Example

lag(expr any [, offset integer [, default any ]]) Skips NULL values when counting. lag(column, 3 IGNORE NULLS)
lead(expr any [, offset integer [, default any ]]) Skips NULL values when counting. lead(column, 3 IGNORE NULLS)
first_value(expr any) Skips leading NULLs. first_value(column IGNORE NULLS)
last_value(expr any) Skips trailing NULLs. last_value(column IGNORE NULLS)
nth_value(expr any, nth integer) Skips NULL values when counting. nth_value(column, 2 IGNORE NULLS)

Note that there is no comma separating the arguments from the IGNORE NULLS specification.


The inverse of IGNORE NULLS is RESPECT NULLS, which is the default for all functions.

Evaluation

Windowing works by breaking a relation up into independent partitions, ordering those partitions, and then computing a new column for
each row as a function of the nearby values. Some window functions depend only on the partition boundary and the ordering, but a few
(including all the aggregates) also use a frame. Frames are specified as a number of rows on either side (preceding or following) of the current
row. The distance can either be specified as a number of rows or a range of values using the partition's ordering value and a distance.

The full syntax is shown in the diagram at the top of the page, and this diagram visually illustrates the computation environment.

Partition and Ordering Partitioning breaks the relation up into independent, unrelated pieces. Partitioning is optional, and if none is
specified then the entire relation is treated as a single partition. Window functions cannot access values outside of the partition containing
the row they are being evaluated at.

Ordering is also optional, but without it the results are not well‑defined. Each partition is ordered using the same ordering clause.

Here is a table of power generation data, available as a CSV file (power-plant-generation-history.csv). To load the data, run:

CREATE TABLE "Generation History" AS


FROM 'power-plant-generation-history.csv';

After partitioning by plant and ordering by date, it will have this layout:

Plant Date MWh

Boston 2019‑01‑02 564337


Boston 2019‑01‑03 507405
Boston 2019‑01‑04 528523
Boston 2019‑01‑05 469538
Boston 2019‑01‑06 474163
Boston 2019‑01‑07 507213
Boston 2019‑01‑08 613040
Boston 2019‑01‑09 582588
Boston 2019‑01‑10 499506
Boston 2019‑01‑11 482014
Boston 2019‑01‑12 486134
Boston 2019‑01‑13 531518
Worcester 2019‑01‑02 118860
Worcester 2019‑01‑03 101977
Worcester 2019‑01‑04 106054
Worcester 2019‑01‑05 92182
Worcester 2019‑01‑06 94492
Worcester 2019‑01‑07 99932
Worcester 2019‑01‑08 118854
Worcester 2019‑01‑09 113506
Worcester 2019‑01‑10 96644
Worcester 2019‑01‑11 93806


Worcester 2019‑01‑12 98963


Worcester 2019‑01‑13 107170

In what follows, we shall use this table (or small sections of it) to illustrate various pieces of window function evaluation.

The simplest window function is row_number(). This function just computes the 1‑based row number within the partition using the
query:

SELECT
"Plant",
"Date",
row_number() OVER (PARTITION BY "Plant" ORDER BY "Date") AS "Row"
FROM "Generation History"
ORDER BY 1, 2;

The result will be the following:

Plant Date Row

Boston 2019‑01‑02 1
Boston 2019‑01‑03 2
Boston 2019‑01‑04 3
... ... ...
Worcester 2019‑01‑02 1
Worcester 2019‑01‑03 2
Worcester 2019‑01‑04 3
... ... ...

Note that even though the function is computed with an ORDER BY clause, the result does not have to be sorted, so the SELECT also
needs to be explicitly sorted if that is desired.

Framing Framing specifies a set of rows relative to each row where the function is evaluated. The distance from the current row is given
as an expression either PRECEDING or FOLLOWING the current row. This distance can either be specified as an integral number of ROWS
or as a RANGE delta expression from the value of the ordering expression. For a RANGE specification, there must be only one ordering ex‑
pression, and it has to support addition and subtraction (i.e., numbers or INTERVALs). The default values for frames are from UNBOUNDED
PRECEDING to CURRENT ROW. It is invalid for a frame to start after it ends. Using the EXCLUDE clause, rows around the current row can
be excluded from the frame.

ROW Framing Here is a simple ROW frame query, using an aggregate function:

SELECT points,
sum(points) OVER (
ROWS BETWEEN 1 PRECEDING
AND 1 FOLLOWING) we
FROM results;

This query computes the sum of each point and the points on either side of it.

Notice that at the edge of the partition, there are only two values added together. This is because frames are cropped to the edge of the
partition.


RANGE Framing Returning to the power data, suppose the data is noisy. We might want to compute a 7 day moving average for each
plant to smooth out the noise. To do this, we can use this window query:

SELECT "Plant", "Date",


avg("MWh") OVER (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
AS "MWh 7-day Moving Average"
FROM "Generation History"
ORDER BY 1, 2;

This query partitions the data by Plant (to keep the different power plants' data separate), orders each plant's partition by Date (to put
the energy measurements next to each other), and uses a RANGE frame of three days on either side of each day for the avg (to handle any
missing days). This is the result:

Plant Date MWh 7‑day Moving Average

Boston 2019‑01‑02 517450.75


Boston 2019‑01‑03 508793.20
Boston 2019‑01‑04 508529.83
... ... ...
Boston 2019‑01‑13 499793.00
Worcester 2019‑01‑02 104768.25
Worcester 2019‑01‑03 102713.00
Worcester 2019‑01‑04 102249.50
... ... ...

EXCLUDE Clause The EXCLUDE clause allows rows around the current row to be excluded from the frame. It has the following options:

• EXCLUDE NO OTHERS: exclude nothing (default)


• EXCLUDE CURRENT ROW: exclude the current row from the window frame
• EXCLUDE GROUP: exclude the current row and all its peers (according to the columns specified by ORDER BY) from the window
frame
• EXCLUDE TIES: exclude only the current row's peers from the window frame
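
For example, a minimal sketch of the EXCLUDE CURRENT ROW option, computing the sum of the surrounding rows while leaving out the row itself (reusing the results table from the ROW framing example above):

SELECT points,
    sum(points) OVER (
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        EXCLUDE CURRENT ROW) AS neighbor_sum
FROM results;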

WINDOW Clauses Multiple different OVER clauses can be specified in the same SELECT, and each will be computed separately. Often,
however, we want to use the same layout for multiple window functions. The WINDOW clause can be used to define a named window that
can be shared between multiple window functions:

SELECT "Plant", "Date",


min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
avg("MWh") OVER seven AS "MWh 7-day Moving Average",
max("MWh") OVER seven AS "MWh 7-day Moving Maximum"
FROM "Generation History"
WINDOW seven AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;


The three window functions will also share the data layout, which will improve performance.

Multiple windows can be defined in the same WINDOW clause by comma‑separating them:

SELECT "Plant", "Date",


min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
avg("MWh") OVER seven AS "MWh 7-day Moving Average",
max("MWh") OVER seven AS "MWh 7-day Moving Maximum",
min("MWh") OVER three AS "MWh 3-day Moving Minimum",
avg("MWh") OVER three AS "MWh 3-day Moving Average",
max("MWh") OVER three AS "MWh 3-day Moving Maximum"
FROM "Generation History"
WINDOW
seven AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING),
three AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 1 DAYS PRECEDING
AND INTERVAL 1 DAYS FOLLOWING)
ORDER BY 1, 2;

The queries above do not use a number of clauses commonly found in select statements, like WHERE, GROUP BY, etc. For more complex
queries you can find where WINDOW clauses fall in the canonical order of the SELECT statement.

Filtering the Results of Window Functions Using QUALIFY Window functions are executed after the WHERE and HAVING clauses have already been evaluated, so it is not possible to use these clauses to filter the results of window functions. The QUALIFY clause avoids the need for a subquery or WITH clause to perform this filtering.
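
For example, a sketch that keeps only the first measurement per plant from the power generation table introduced above:

SELECT "Plant", "Date", "MWh"
FROM "Generation History"
QUALIFY row_number() OVER (PARTITION BY "Plant" ORDER BY "Date") = 1;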

Box and Whisker Queries All aggregates can be used as windowing functions, including the complex statistical functions. These function
implementations have been optimised for windowing, and we can use the window syntax to write queries that generate the data for moving
box‑and‑whisker plots:

SELECT "Plant", "Date",


min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
quantile_cont("MWh", [0.25, 0.5, 0.75]) OVER seven
AS "MWh 7-day Moving IQR",
max("MWh") OVER seven AS "MWh 7-day Moving Maximum",
FROM "Generation History"
WINDOW seven AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;

Extensions

Extensions

Overview

DuckDB has a flexible extension mechanism that allows for dynamically loading extensions. These may extend DuckDB's functionality by providing support for additional file formats, introducing new types, and adding domain-specific functionality.

Note. Extensions are loadable on all clients (e.g., Python and R). Extensions distributed via the official repository are built and
tested on macOS (AMD64 and ARM64), Windows (AMD64) and Linux (AMD64 and ARM64).

We maintain a list of official extensions.

Using Extensions

Listing Extensions To get a list of extensions, run:

FROM duckdb_extensions();

extension_ install_
name loaded installed path description aliases

arrow false false A zero‑copy data integration between Apache Arrow and DuckDB []
autocomplete false false Adds support for autocomplete in the shell []
... ... ... ... ... ...

Extension Types

DuckDB has three types of extensions.

Built‑In Extensions Built‑in extensions are loaded at startup and are immediately available for use.

SELECT *
FROM 'test.json';

This will use the json extension to read the JSON file.

Note. To make the DuckDB distribution lightweight, it only contains a few fundamental built‑in extensions (e.g., autocomplete,
json, parquet), which are loaded automatically.

Autoloadable Extensions Autoloadable extensions are loaded on first use.

SELECT *
FROM 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv';

To access files via the HTTPS protocol, DuckDB will automatically load the httpfs extension. Similarly, other autoloadable extensions
(aws, fts) will be loaded on‑demand. If an extension is not already available locally, it will be installed from the official extension repository
(extensions.duckdb.org).


Explicitly Loadable Extensions Some extensions make several changes to the running DuckDB instance, hence, autoloading them may
not be possible. These extensions have to be installed and loaded using the following SQL statements:

INSTALL spatial;
LOAD spatial;

Extension Handling through the Python API If you are using the Python API client, you can install and load them with the install_
extension(name: str) and load_extension(name: str) methods.

Note. Autoloadable extensions can also be installed explicitly.

Ensuring the Integrity of Extensions

Extensions are signed with a cryptographic key, which also simplifies distribution (this is why they are served over HTTP and not HTTPS). By
default, DuckDB uses its built‑in public keys to verify the integrity of extension before loading them. All extensions provided by the DuckDB
core team are signed.

If you wish to load your own extensions or extensions from third‑parties you will need to enable the allow_unsigned_extensions
flag. To load unsigned extensions using the CLI client, pass the -unsigned flag to it on startup. For the Python client, see the Loading
and Installing Extensions section in the Python API documentation.

Sharing Extensions between Clients The shared installation location allows extensions to be shared between the client APIs of the same
DuckDB version, as long as they share the same platform or ABI. For example, if an extension is installed with version 0.10.0 of the CLI
client on macOS, it is available from the Python, R, etc. client libraries provided that they have access to the user's home directory and use
DuckDB version 0.10.0.

See the Working with Extensions page for details on available platforms.

Installation Location

Extensions are by default installed under the user's home directory:

~/.duckdb/extensions/v{duckdb_version}/{platform_name}/

For example, the extensions for DuckDB version 0.10.0 on macOS ARM64 (Apple Silicon) are installed to ~/.duckdb/extensions/v0.10.0/osx_
arm64/.

Note. For development builds, the directory of the extensions corresponds to the Git hash of the build, e.g., ~/.duckdb/extensions/fc2e4b
amd64_gcc4.

Changing the Extension Directory To specify a different extension directory, use the extension_directory configuration option:

SET extension_directory = '/path/to/your/extension/directory';

Developing Extensions

The same API that the official extensions use is available for developing extensions. This allows users to extend the functionality of DuckDB
such that it suits their domain the best. A template for creating extensions is available in the extension-template repository.

Working with Extensions

For advanced installation instructions and more details on extensions, see the Working with Extensions page.


Official Extensions

List of Official Extensions

Extension name Description Aliases

arrow GitHub A zero‑copy data integration between Apache Arrow and DuckDB
autocomplete Adds support for autocomplete in the shell
aws GitHub Provides features that depend on the AWS SDK
azure GitHub Adds a filesystem abstraction for Azure blob storage to DuckDB
excel Adds support for Excel‑like format strings
fts Adds support for Full‑Text Search Indexes
httpfs Adds support for reading and writing files over an HTTP(S) or S3 http, https, s3
connection
iceberg GitHub Adds support for Apache Iceberg
icu Adds support for time zones and collations using the ICU library
inet Adds support for IP‑related data types and functions
jemalloc Overwrites system allocator with jemalloc
json Adds support for JSON operations
mysql GitHub Adds support for reading from and writing to a MySQL database
parquet Adds support for reading and writing Parquet files
postgres GitHub Adds support for reading from and writing to a Postgres database postgres_scanner
spatial GitHub Geospatial extension that adds support for working with spatial
data and functions
sqlite GitHub Adds support for reading from and writing to SQLite database sqlite_scanner, sqlite3
files
substrait GitHub Adds support for the Substrait integration
tpcds Adds TPC‑DS data generation and query support
tpch Adds TPC‑H data generation and query support

Default Extensions

Different DuckDB clients ship a different set of extensions. We summarize the main distributions in the table below.

Extension name CLI (duckdb.org) CLI (Homebrew) Python R Java Node.js

autocomplete yes yes


excel yes
fts yes yes
httpfs yes
icu yes yes yes yes yes
json yes yes yes yes yes
parquet yes yes yes yes yes yes
tpcds yes


tpch yes yes

The jemalloc extension's availability is based on the operating system. It is a built‑in extension on Linux versions and optionally available
on macOS (via compiling from source). On Windows, it is not available.

Working with Extensions

Downloading Extensions Directly from S3

Downloading an extension directly could be helpful when building a lambda service or container that uses DuckDB. DuckDB extensions are
stored in public S3 buckets, but the directory structure of those buckets is not searchable. As a result, a direct URL to the file must be used.
To directly download an extension file, use the following format:

http://extensions.duckdb.org/v{duckdb_version}/{platform_name}/{extension_name}.duckdb_extension.gz

For example:

http://extensions.duckdb.org/v{{ site.currentduckdbversion }}/windows_amd64/json.duckdb_extension.gz

Platforms

Extension binaries must be built for each platform. We distribute pre‑built binaries for several platforms (see below). For platforms where
packages for certain extensions are not available, users can build them from source and install the resulting binaries manually.

All official extensions are distributed for the following platforms.

Platform name Description

linux_amd64 Linux AMD64 (Node.js packages, etc.)


linux_amd64_gcc4 Linux AMD64 (Python packages, CLI, etc.)
linux_arm64 Linux ARM64 (e.g., AWS Graviton)
osx_amd64 macOS (Intel CPUs)
osx_arm64 macOS (Apple Silicon: M1, M2, M3 CPUs)
windows_amd64 Windows on Intel and AMD CPUs (x86_64)

Note. For some Linux ARM distributions (e.g., Python), two different binaries are distributed. These target either the linux_
arm64 or linux_arm64_gcc4 platforms. Note that extension binaries are distributed for the first, but not the second. Effectively
that means that on these platforms your glibc version needs to be 2.28 or higher to use the distributed extension binaries.

Some extensions are distributed for the following platforms:

• windows_amd64_rtools
• wasm_eh and wasm_mvp (see DuckDB‑Wasm's extensions)

For platforms outside the ones listed above, we do not officially distribute extensions (e.g., linux_arm64_gcc4, windows_amd64_
mingw).

Using a Custom Extension Repository

To load extensions from a custom extension repository, set the following configuration option.


Local Files

SET custom_extension_repository = 'path/to/folder';

This assumes the pointed folder has a structure similar to:

folder
└── 0fd6fb9198
└── osx_arm64
├── autocomplete.duckdb_extension
├── httpfs.duckdb_extension
├── icu.duckdb_extension.gz
├── inet.duckdb_extension
├── json.duckdb_extension
├── parquet.duckdb_extension
├── tpcds.duckdb_extension
├── tpcds.duckdb_extension.gz
└── tpch.duckdb_extension.gz

Here, the first level is the DuckDB version, the second level is the DuckDB platform, and extensions are stored either as name.duckdb_extension or as gzip-compressed files name.duckdb_extension.gz.

INSTALL icu;

The execution of this statement will first look for icu.duckdb_extension.gz, then for icu.duckdb_extension in the folder's file structure. If it finds either of the extension binaries, it will install the extension to the location specified by the extension_directory option (which defaults to ~/.duckdb/extensions). If the file is compressed, it will be decompressed at this step.

Remote File over http

SET custom_extension_repository = 'http://nightly-extensions.duckdb.org';

They work the same as local ones, and expect a similar folder structure.

Remote Files over https or s3 Protocol

SET custom_extension_repository = 's3://bucket/your-repository-name/';

Remote extension repositories act similarly to local ones: the file structure should be the same, and both gzipped and non-gzipped files are supported.

The only special case is that the httpfs extension must be available locally. You can get it, for example, by running:

RESET custom_extension_repository;
INSTALL httpfs;

That will install the official httpfs extension locally.

This is because the httpfs extension is needed to actually access the remote files.

INSTALL x FROM y You can also use the INSTALL command's FROM clause to specify the path of the custom extension repository.
For example:

FORCE INSTALL azure FROM 'http://nightly-extensions.duckdb.org';

This will force install the azure extension from the specified URL.


Loading and Installing an Extension from Explicit Paths

Installing Extensions from an Explicit Path INSTALL can be used with the path to either a .duckdb_extension file or a .duckdb_
extension.gz file. For example, if the file is available in the same directory as where DuckDB is being executed, you can install it as
follows:

-- uncompressed file
INSTALL 'path/to/httpfs.duckdb_extension';
-- gzip-compressed file
INSTALL 'path/to/httpfs.duckdb_extension.gz';

These will have the same results.

It is also possible to specify remote paths.

Force Installing Extensions

When DuckDB installs an extension, it is copied to a local directory to be cached, avoiding any network traffic. Any subsequent calls to IN-
STALL extension_name will use the local version instead of downloading the extension again. To force re‑downloading the extension,
run:

FORCE INSTALL extension_name;

For more details, see the Versioning of Extensions page.

Loading Extension from a Path

LOAD can be used with the path to a .duckdb_extension. For example, if the file was available at the (relative) path path/to/httpfs.duckdb_
extension, you can load it as follows:

-- uncompressed file
LOAD 'path/to/httpfs.duckdb_extension';

This will skip any currently installed file in the specified path.

Using remote paths for compressed files is currently not possible.

Building and Installing Extensions

For building and installing extensions from source, see the building guide.

Statically Linking Extensions

To statically link extensions, follow the developer documentation's ”Using extension config files” section.

Versioning of Extensions

Extension Versions

Note. An extension can have in‑version upgrades. You can run FORCE INSTALL extension to ensure you're on the latest
version of the extension.


DuckDB extensions currently don't have an internal version. This means that, in general, when staying on a DuckDB version, the version of
the extension will be fixed. When a new version of DuckDB is released, this also marks a new release for all extension versions. However,
there are two important sidenotes to make here.

Firstly, some DuckDB extensions may be updated within a DuckDB release in case of bugs. These are considered ”hotfixes” and should not introduce compatibility breaking changes. This means that when running into issues with an extension, it makes sense to double-check that you are on the latest version of an extension by running FORCE INSTALL extension_name.

Secondly, in the (near) future DuckDB aims to untie extension versions from DuckDB versions by adding version tags to extensions, the ability to inspect which versions are installed, and installing specific extension versions. So keep in mind that this is likely to change and in the future extensions may introduce compatibility-breaking updates within a DuckDB release.

Target DuckDB Version

Currently, when extensions are compiled, they are tied to a specific version of DuckDB. What this means is that an extension binary com‑
piled for v0.9.2 does not work for v0.10.0 for example. In most cases, this will not cause any issues and is fully transparent; DuckDB will
automatically ensure it installs the correct binary for its version. For extension developers, this means that they must ensure that new bi‑
naries are created whenever a new version of DuckDB is released. However, note that DuckDB provides an extension template that makes
this fairly simple.

Arrow Extension

The arrow extension implements features for using Apache Arrow, a cross‑language development platform for in‑memory analytics.

Installing and Loading

The arrow extension will be transparently autoloaded on first use from the official extension repository. If you would like to install and
load it manually, run:

INSTALL arrow;
LOAD arrow;

Functions

Function Type Description

to_arrow_ipc Table in-out-function Serializes a table into a stream of blobs containing Arrow IPC buffers
scan_arrow_ipc Table function Scan a list of pointers pointing to Arrow IPC buffers

AutoComplete Extension

The autocomplete extension adds support for autocomplete in the CLI client. The extension is shipped by default with the CLI client.

Behavior

For the behavior of the autocomplete extension, see the documentation of the CLI client.


Functions

Function Description

sql_auto_complete(query_string) Attempts autocompletion on the given query_string.

Example

SELECT *
FROM sql_auto_complete('SEL');

Returns:

suggestion suggestion_start

SELECT 0
DELETE 0
INSERT 0
CALL 0
LOAD 0
CALL 0
ALTER 0
BEGIN 0
EXPORT 0
CREATE 0
PREPARE 0
EXECUTE 0
EXPLAIN 0
ROLLBACK 0
DESCRIBE 0
SUMMARIZE 0
CHECKPOINT 0
DEALLOCATE 0
UPDATE 0
DROP 0

GitHub

The autocomplete extension is part of the main DuckDB repository.

AWS Extension

The aws extension adds functionality (e.g., authentication) on top of the httpfs extension's S3 capabilities, using the AWS SDK.


Installing and Loading

To install and load the aws extension, run:

INSTALL aws;
LOAD aws;

Features

Function Type Description

load_aws_credentials PRAGMA function Automatically loads the AWS credentials through the AWS Default Credentials Provider Chain

Usage

Load AWS Credentials To load the AWS credentials, run:

CALL load_aws_credentials();

┌──────────────────────┬──────────────────────────┬──────────────────────┬───────────────┐
│ loaded_access_key_id │ loaded_secret_access_key │ loaded_session_token │ loaded_region │
│ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼──────────────────────────┼──────────────────────┼───────────────┤
│ AKIAIOSFODNN7EXAMPLE │ <redacted> │ │ eu-west-1 │
└──────────────────────┴──────────────────────────┴──────────────────────┴───────────────┘

The function takes a string parameter to specify a specific profile:

CALL load_aws_credentials('minio-testing-2');

┌──────────────────────┬──────────────────────────┬──────────────────────┬───────────────┐
│ loaded_access_key_id │ loaded_secret_access_key │ loaded_session_token │ loaded_region │
│ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼──────────────────────────┼──────────────────────┼───────────────┤
│ minio_duckdb_user_2 │ <redacted> │ │ eu-west-2 │
└──────────────────────┴──────────────────────────┴──────────────────────┴───────────────┘

There are several parameters to tweak the behavior of the call:

CALL load_aws_credentials('minio-testing-2', set_region = false, redact_secret = false);

┌──────────────────────┬──────────────────────────────┬──────────────────────┬───────────────┐
│ loaded_access_key_id │ loaded_secret_access_key │ loaded_session_token │ loaded_region │
│ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼──────────────────────────────┼──────────────────────┼───────────────┤
│ minio_duckdb_user_2 │ minio_duckdb_user_password_2 │ │ │
└──────────────────────┴──────────────────────────────┴──────────────────────┴───────────────┘

Related Extensions

aws depends on the httpfs extension's capabilities, and both will be autoloaded on the first call to load_aws_credentials. If autoinstall or autoload are disabled, you can always explicitly install and load them as follows:

INSTALL aws;
INSTALL httpfs;
LOAD aws;
LOAD httpfs;


Usage

See the httpfs extension's S3 capabilities for instructions.

Azure Extension

The azure extension is a loadable extension that adds a filesystem abstraction for the Azure Blob storage to DuckDB.

Installing and Loading

To install and load the azure extension, run:

INSTALL azure;
LOAD azure;

Usage

Once the authentication is set up, the Azure Blob Storage can be queried as follows:

SELECT count(*)
FROM 'az://my_container/my_file.parquet_or_csv';

Or with a fully qualified path syntax:

SELECT count(*)
FROM 'az://my_storage_account.blob.core.windows.net/my_container/my_file.parquet_or_csv';

Globs are also supported:

SELECT *
FROM 'az://my_container/*.csv';

Alternatively:

SELECT *
FROM 'az://my_storage_account.blob.core.windows.net/my_container/*.csv';

Configuration

Use the following configuration options to control how the extension reads remote files:

Name Description Type Default

azure_http_stats Include http info from Azure Storage in the EXPLAIN ANALYZE statement. BOOLEAN false
azure_read_transfer_concurrency Maximum number of threads the Azure client can use for a single parallel read. If azure_read_transfer_chunk_size is less than azure_read_buffer_size then setting this > 1 will allow the Azure client to do concurrent requests to fill the buffer. BIGINT 5
azure_read_transfer_chunk_size Maximum size in bytes that the Azure client will read in a single request. It is recommended that this is a factor of azure_read_buffer_size. BIGINT 1024*1024
azure_read_buffer_size Size of the read buffer. It is recommended that this is evenly divisible by azure_read_transfer_chunk_size. UBIGINT 1024*1024
azure_transport_option_type Underlying adapter to use in the Azure SDK. Valid values are: default or curl. VARCHAR default
azure_context_caching Enable/disable the caching of the underlying Azure SDK HTTP connection in the DuckDB connection context when performing queries. If you suspect that this is causing some side effect, you can try to disable it by setting it to false (not recommended). BOOLEAN true

Note. Setting azure_transport_option_type explicitly to curl will have the following effect:

• On Linux, this may solve certificate issues (Error: Invalid Error: Fail to get a new connection for: https://storage_account_name.blob.core.windows.net/. Problem with the SSL CA cert (path? access rights?)), because when curl is specified, the extension will try to find the bundle certificate in various paths (that is not done by curl by default and might be wrong due to static linking).
• On Windows, this replaces the default adapter (WinHTTP) allowing you to use all curl capabilities (for example using a socks
proxies).
• On all operating systems, it will honor the following environment variables:
– CURL_CA_INFO: Path to a PEM encoded file containing the certificate authorities sent to libcurl. Note that this option is
known to only work on Linux and might throw if set on other platforms.
– CURL_CA_PATH: Path to a directory which holds PEM encoded file, containing the certificate authorities sent to libcurl.

Example:

SET azure_http_stats = false;


SET azure_read_transfer_concurrency = 5;
SET azure_read_transfer_chunk_size = 1048576;
SET azure_read_buffer_size = 1048576;


Authentication

The Azure extension has two ways to configure the authentication. The preferred way is to use Secrets.

Authentication with Secret Multiple Secret Providers are available for the Azure extension:

Note.

• If you need to define different secrets for different storage accounts you can use the SCOPE configuration.
• If you use fully qualified path then the ACCOUNT_NAME attribute is optional.

CONFIG Provider The default provider, CONFIG (i.e., user‑configured), allows access to the storage account using a connection string
or anonymously. For example:

CREATE SECRET secret1 (


TYPE AZURE,
CONNECTION_STRING 'value'
);

If you do not use authentication, you still need to specify the storage account name. For example:

-- Note that PROVIDER CONFIG is optional as it is the default one


CREATE SECRET secret2 (
TYPE AZURE,
PROVIDER CONFIG,
ACCOUNT_NAME 'storage_account_name'
);

CREDENTIAL_CHAIN Provider The CREDENTIAL_CHAIN provider allows connecting using credentials automatically fetched by the Azure SDK via the Azure credential chain. By default, the DefaultAzureCredential chain is used, which tries credentials according to the order specified by the Azure documentation. For example:

CREATE SECRET secret3 (


TYPE AZURE,
PROVIDER CREDENTIAL_CHAIN,
ACCOUNT_NAME 'storage_account_name'
);

DuckDB also allows specifying a specific chain using the CHAIN keyword. This takes a ; separated list of providers that will be tried in order.
For example:

CREATE SECRET secret4 (


TYPE AZURE,
PROVIDER CREDENTIAL_CHAIN,
CHAIN 'cli;env',
ACCOUNT_NAME 'storage_account_name'
);

The possible values are the following: cli; managed_identity; env; default;

If no explicit CHAIN is provided, the default one will be default.

SERVICE_PRINCIPAL Provider The SERVICE_PRINCIPAL provider allows connecting using an Azure Service Principal (SPN).

Either with a secret:


CREATE SECRET azure_spn (


TYPE AZURE,
PROVIDER SERVICE_PRINCIPAL,
TENANT_ID 'tenant_id',
CLIENT_ID 'client_id',
CLIENT_SECRET 'client_secret',
ACCOUNT_NAME 'storage_account_name'
);

Or with a certificate:

CREATE SECRET azure_spn_cert (


TYPE AZURE,
PROVIDER SERVICE_PRINCIPAL,
TENANT_ID 'tenant_id',
CLIENT_ID 'client_id',
CLIENT_CERTIFICATE_PATH 'client_cert_path',
ACCOUNT_NAME 'storage_account_name'
);

Configuring a Proxy To configure proxy information when using secrets, you can add HTTP_PROXY, PROXY_USER_NAME, and PROXY_
PASSWORD in the secret definition. For example:

CREATE SECRET secret5 (


TYPE AZURE,
CONNECTION_STRING 'value',
HTTP_PROXY 'http://localhost:3128',
PROXY_USER_NAME 'john',
PROXY_PASSWORD 'doe'
);

Note.

• When using secrets, the HTTP_PROXY environment variable will still be honored, unless you provide an explicit value for it.
• When using secrets, the SET variables of the Authentication with Variables session will be ignored.
• With the Azure CREDENTIAL_CHAIN provider, the actual token is fetched at query time, not at the time of creating the secret.

Authentication with Variables (Deprecated)


SET variable_name = variable_value;

Where variable_name can be one of the following:

Name Description Type Default

azure_storage_connection_string Azure connection string, used for authenticating and configuring azure requests. STRING -
azure_account_name Azure account name; when set, the extension will attempt to automatically detect credentials (not used if you pass the connection string). STRING -
azure_endpoint Override the azure endpoint for when the Azure credential providers are used. STRING blob.core.windows.net
azure_credential_chain Ordered list of Azure credential providers, in string format separated by ;. For example: 'cli;managed_identity;env'. See the list of possible values in the CREDENTIAL_CHAIN provider section. Not used if you pass the connection string. STRING -
azure_http_proxy Proxy to use when login & performing request to azure. STRING HTTP_PROXY environment variable (if set).
azure_proxy_user_name Http proxy username if needed. STRING -
azure_proxy_password Http proxy password if needed. STRING -

Excel Extension

This extension, contrary to its name, does not provide support for reading Excel files. It instead provides a function that wraps the number
formatting functionality of the i18npool library, which formats numbers per Excel's formatting rules.

Excel files can be handled through the spatial extension: see the Excel Import and Excel Export pages for instructions.

Installing and Loading

The excel extension will be transparently autoloaded on first use from the official extension repository. If you would like to install and
load it manually, run:

INSTALL excel;
LOAD excel;

Usage

SELECT excel_text(1234567.897, 'h:mm AM/PM') AS timestamp;

┌───────────┐
│ timestamp │
│ varchar │
├───────────┤
│ 9:31 PM │
└───────────┘

Functions

Function    Description    Example    Result

excel_text(number, format_string)    Alias for text.    excel_text(1234567.897, 'h:mm AM/PM')    9:31 PM
text(number, format_string)    Format the given number per the rules given in the format_string.    text(1234567.897, 'h AM/PM')    9 PM
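For instance, the underlying text function can be called directly, using the example values from the table above:

SELECT text(1234567.897, 'h AM/PM') AS result;
-- 9 PM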

GitHub

The excel extension is part of the main DuckDB repository.

Full‑Text Search Extension

Full-Text Search is an extension to DuckDB that allows for searching through strings, similar to SQLite's FTS5 extension.

Installing and Loading

The fts extension will be transparently autoloaded on first use from the official extension repository. If you would like to install and load
it manually, run:

INSTALL fts;
LOAD fts;

Usage

The extension adds two PRAGMA statements to DuckDB: one to create, and one to drop an index. Additionally, a scalar macro stem is
added, which is used internally by the extension.

PRAGMA create_fts_index
create_fts_index(input_table, input_id, *input_values, stemmer = 'porter',
stopwords = 'english', ignore = '(\\.|[^a-z])+',
strip_accents = 1, lower = 1, overwrite = 0)

PRAGMA that creates a FTS index for the specified table.

Name    Type    Description

input_table    VARCHAR    Qualified name of specified table, e.g., 'table_name' or 'main.table_name'
input_id    VARCHAR    Column name of document identifier, e.g., 'document_identifier'
input_values…    VARCHAR    Column names of the text fields to be indexed (vararg), e.g., 'text_field_1', 'text_field_2', ..., 'text_field_N', or '\*' for all columns in input_table of type VARCHAR
stemmer    VARCHAR    The type of stemmer to be used. One of 'arabic', 'basque', 'catalan', 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'greek', 'hindi', 'hungarian', 'indonesian', 'irish', 'italian', 'lithuanian', 'nepali', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'serbian', 'spanish', 'swedish', 'tamil', 'turkish', or 'none' if no stemming is to be used. Defaults to 'porter'
stopwords    VARCHAR    Qualified name of table containing a single VARCHAR column containing the desired stopwords, or 'none' if no stopwords are to be used. Defaults to 'english' for a pre-defined list of 571 English stopwords
ignore    VARCHAR    Regular expression of patterns to be ignored. Defaults to '(\\.|[^a-z])+', ignoring all escaped and non-alphabetic lowercase characters
strip_accents    BOOLEAN    Whether to remove accents (e.g., convert á to a). Defaults to 1
lower    BOOLEAN    Whether to convert all text to lowercase. Defaults to 1
overwrite    BOOLEAN    Whether to overwrite an existing index on a table. Defaults to 0

This PRAGMA builds the index under a newly created schema. The schema will be named after the input table: if an index is created on
table 'main.table_name', then the schema will be named 'fts_main_table_name'.

PRAGMA drop_fts_index
drop_fts_index(input_table)

Drops an FTS index for the specified table.

Name Type Description

input_table VARCHAR Qualified name of input table, e.g., 'table_name' or 'main.table_name'
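For example, to drop the index built on the documents table used in the example usage below, run:

PRAGMA drop_fts_index('documents');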

match_bm25 Function
match_bm25(input_id, query_string, fields := NULL, k := 1.2, b := 0.75, conjunctive := 0)

When an index is built, this retrieval macro is created that can be used to search the index.

Name Type Description

input_id    VARCHAR    Column name of document identifier, e.g., 'document_identifier'
query_string    VARCHAR    The string to search the index for
fields    VARCHAR    Comma-separated list of fields to search in, e.g., 'text_field_2, text_field_N'. Defaults to NULL to search all indexed fields
k    DOUBLE    Parameter k1 in the Okapi BM25 retrieval model. Defaults to 1.2
b    DOUBLE    Parameter b in the Okapi BM25 retrieval model. Defaults to 0.75
conjunctive    BOOLEAN    Whether to make the query conjunctive, i.e., all terms in the query string must be present in order for a document to be retrieved

stem Function
stem(input_string, stemmer)

Reduces words to their base. Used internally by the extension.

Name Type Description

input_string    VARCHAR    The column or constant to be stemmed
stemmer    VARCHAR    The type of stemmer to be used. One of 'arabic', 'basque', 'catalan', 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'greek', 'hindi', 'hungarian', 'indonesian', 'irish', 'italian', 'lithuanian', 'nepali', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'serbian', 'spanish', 'swedish', 'tamil', 'turkish', or 'none' if no stemming is to be used.
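As a quick illustration, the macro can be called directly. A sketch using the Porter stemmer (the exact output depends on the chosen stemmer):

SELECT stem('searching', 'porter') AS stemmed;
-- search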

Example Usage

Create a table and fill it with text data:

CREATE TABLE documents (
    document_identifier VARCHAR,
    text_content VARCHAR,
    author VARCHAR,
    doc_version INTEGER
);
INSERT INTO documents
    VALUES ('doc1',
            'The mallard is a dabbling duck that breeds throughout the temperate.',
            'Hannes Mühleisen',
            3),
           ('doc2',
            'The cat is a domestic species of small carnivorous mammal.',
            'Laurens Kuiper',
            2);

Build the index, and make both the text_content and author columns searchable.

PRAGMA create_fts_index(
'documents', 'document_identifier', 'text_content', 'author'
);

Search the author field index for documents that are authored by "Muhleisen". This retrieves "doc1":

SELECT document_identifier, text_content, score
FROM (
    SELECT *, fts_main_documents.match_bm25(
        document_identifier,
        'Muhleisen',
        fields := 'author'
    ) AS score
    FROM documents
) sq
WHERE score IS NOT NULL
  AND doc_version > 2
ORDER BY score DESC;

┌─────────────────────┬──────────────────────────────────────────────────────────────────────┬────────┐
│ document_identifier │ text_content │ score │
│ varchar │ varchar │ double │
├─────────────────────┼──────────────────────────────────────────────────────────────────────┼────────┤
│ doc1 │ The mallard is a dabbling duck that breeds throughout the temperate. │ 0.0 │
└─────────────────────┴──────────────────────────────────────────────────────────────────────┴────────┘

Search for documents about "small cats". This retrieves "doc2":


SELECT document_identifier, text_content, score
FROM (
    SELECT *, fts_main_documents.match_bm25(
        document_identifier,
        'small cats'
    ) AS score
    FROM documents
) sq
WHERE score IS NOT NULL
ORDER BY score DESC;

┌─────────────────────┬────────────────────────────────────────────────────────────┬────────┐
│ document_identifier │ text_content │ score │
│ varchar │ varchar │ double │
├─────────────────────┼────────────────────────────────────────────────────────────┼────────┤
│ doc2 │ The cat is a domestic species of small carnivorous mammal. │ 0.0 │
└─────────────────────┴────────────────────────────────────────────────────────────┴────────┘

Note. Warning The FTS index does not update automatically when the input table changes. A workaround for this limitation is to recreate the index after the table has been modified.
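For example, after the documents table from the example above has changed, the index can be rebuilt in place by passing the overwrite parameter documented earlier (a sketch):

PRAGMA create_fts_index(
    'documents', 'document_identifier', 'text_content', 'author',
    overwrite = 1
);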

GitHub

The fts extension is part of the main DuckDB repository.

httpfs (HTTP and S3)

httpfs Extension for HTTP and S3 Support

The httpfs extension is an autoloadable extension implementing a file system that allows reading and writing remote files. For plain HTTP(S), only file reading is supported. For object storage using the S3 API, the httpfs extension supports reading/writing/globbing files.

Installation and Loading

The httpfs extension will be, by default, autoloaded on first use of any functionality exposed by this extension.

To manually install and load the httpfs extension, run:

INSTALL httpfs;
LOAD httpfs;

HTTP(S)

The httpfs extension supports connecting to HTTP(S) endpoints.

S3 API

The httpfs extension supports connecting to S3 API endpoints.

GitHub

The httpfs extension is part of the main DuckDB repository.


HTTP(S) Support

With the httpfs extension, it is possible to directly query files over the HTTP(S) protocol. This works for all files supported by DuckDB or
its various extensions, and provides read‑only access.

SELECT *
FROM 'https://domain.tld/file.extension';

For CSV files, files will be downloaded entirely in most cases, due to the row‑based nature of the format. For Parquet files, DuckDB can use
a combination of the Parquet metadata and HTTP range requests to only download the parts of the file that are actually required by the
query. For example, the following query will only read the Parquet metadata and the data for the column_a column:

SELECT column_a
FROM 'https://domain.tld/file.parquet';

In some cases, no actual data needs to be read at all, because the query only requires reading the metadata:

SELECT count(*)
FROM 'https://domain.tld/file.parquet';

Scanning multiple files over HTTP(S) is also supported:

SELECT *
FROM read_parquet([
'https://domain.tld/file1.parquet',
'https://domain.tld/file2.parquet'
]);

S3 API Support

The httpfs extension supports reading/writing/globbing files on object storage servers using the S3 API. S3 offers a standard API to read and write to remote files (while regular HTTP servers, predating S3, do not offer a common write API). DuckDB conforms to the S3 API, which is now common among industry storage providers.

Platforms

The httpfs filesystem is tested with AWS S3, Minio, Google Cloud, and lakeFS. Other services that implement the S3 API (such as Cloudflare
R2) should also work, but not all features may be supported.

The following table shows which parts of the S3 API are required for each httpfs feature.

Feature    Required S3 API features

Public file reads    HTTP Range requests
Private file reads    Secret key or session token authentication
File glob    ListObjectsV2
File writes    Multipart upload

Configuration and Authentication

The preferred way to configure and authenticate to S3 endpoints is to use secrets. Multiple secret providers are available.

Note. Deprecated Prior to version 0.10.0, DuckDB did not have a Secrets manager. Hence, the configuration of and authentication
to S3 endpoints was handled via variables. See the legacy authentication scheme for the S3 API.


CONFIG Provider The default provider, CONFIG (i.e., user‑configured), allows access to the S3 bucket by manually providing a key. For
example:

CREATE SECRET secret1 (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    REGION 'us-east-1'
);

Note. Tip If you get an IO Error (Connection error for HTTP HEAD), configure the endpoint explicitly via ENDPOINT 's3.your-region.amazonaws.com'.

Now, to query using the above secret, simply query any s3:// prefixed file:

SELECT *
FROM 's3://my-bucket/file.parquet';

CREDENTIAL_CHAIN Provider The CREDENTIAL_CHAIN provider allows automatically fetching credentials using mechanisms provided by the AWS SDK. For example, to use the AWS SDK default provider:

CREATE SECRET secret2 (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);

Again, to query a file using the above secret, simply query any s3:// prefixed file.

DuckDB also allows specifying a specific chain using the CHAIN keyword. This takes a ; separated list of providers that will be tried in order.
For example:

CREATE SECRET secret3 (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'env;config'
);

The possible values for CHAIN are the following:

• config
• sts
• sso
• env
• instance
• process
• task_role

The CREDENTIAL_CHAIN provider also allows overriding the automatically fetched config. For example, to automatically load credentials, and then override the region, run:

CREATE SECRET secret4 (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'config',
    REGION 'eu-west-1'
);


Overview of S3 Secret Parameters Below is a complete list of the supported parameters that can be used for both the CONFIG and
CREDENTIAL_CHAIN providers:

Name    Description    Secret    Type    Default

KEY_ID    The ID of the key to use    S3, GCS, R2    STRING    -
SECRET    The secret of the key to use    S3, GCS, R2    STRING    -
REGION    The region for which to authenticate (should match the region of the bucket to query)    S3, GCS, R2    STRING    us-east-1
SESSION_TOKEN    Optionally, a session token can be passed to use temporary credentials    S3, GCS, R2    STRING    -
ENDPOINT    Specify a custom S3 endpoint    S3, GCS, R2    STRING    s3.amazonaws.com for S3
URL_STYLE    Either vhost or path    S3, GCS, R2    STRING    vhost for S3, path for R2 and GCS
USE_SSL    Whether to use HTTPS or HTTP    S3, GCS, R2    BOOLEAN    true
URL_COMPATIBILITY_MODE    Can help when URLs contain problematic characters    S3, GCS, R2    BOOLEAN    true
ACCOUNT_ID    The R2 account ID to use for generating the endpoint URL    R2    STRING    -
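For example, combining several of these parameters, a secret for an S3-compatible service running on a custom endpoint could look as follows; the key, secret, and endpoint below are placeholders, not a real deployment:

CREATE SECRET custom_endpoint_secret (
    TYPE S3,
    KEY_ID 'my_key',
    SECRET 'my_secret',
    ENDPOINT 'localhost:9000',
    URL_STYLE 'path',
    USE_SSL false
);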

Platform‑Specific Secret Types

R2 Secrets While Cloudflare R2 uses the regular S3 API, DuckDB has a special Secret type, R2, to make configuring it a bit simpler:

CREATE SECRET secret5 (
    TYPE R2,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    ACCOUNT_ID 'my_account_id'
);

Note the addition of the ACCOUNT_ID, which is used to generate the correct endpoint URL for you. Also note that R2 secrets can use both the CONFIG and CREDENTIAL_CHAIN providers. Finally, R2 secrets are only available when using URLs starting with r2://, for example:

SELECT *
FROM read_parquet('r2://some/file/that/uses/r2/secret/file.parquet');

GCS Secrets While Google Cloud Storage is accessed by DuckDB using the S3 API, DuckDB has a special Secret type, GCS, to make configuring it a bit simpler:

CREATE SECRET secret6 (
    TYPE GCS,
    KEY_ID 'my_key',
    SECRET 'my_secret'
);

Note that the above secret will automatically have the correct Google Cloud Storage endpoint configured. Also note that GCS secrets can use both the CONFIG and CREDENTIAL_CHAIN providers. Finally, GCS secrets are only available when using URLs starting with gcs:// or gs://, for example:

SELECT *
FROM read_parquet('gcs://some/file/that/uses/gcs/secret/file.parquet');


Reading

Reading files from S3 is now as simple as:

SELECT *
FROM 's3://bucket/file.extension';

Multiple files are also possible, for example:

SELECT *
FROM read_parquet([
's3://bucket/file1.parquet',
's3://bucket/file2.parquet'
]);

Glob File globbing is implemented using the ListObjectsV2 API call and allows the use of filesystem-like glob patterns to match multiple files, for example:

SELECT *
FROM read_parquet('s3://bucket/*.parquet');

This query matches all files in the root of the bucket with the Parquet extension.

Several features for matching are supported, such as * to match any number of any character, ? for any single character or [0-9] for a
single character in a range of characters:

SELECT count(*) FROM read_parquet('s3://bucket/folder*/100?/t[0-9].parquet');

A useful feature when using globs is the filename option, which adds a column named filename that encodes the file that a particular
row originated from:

SELECT *
FROM read_parquet('s3://bucket/*.parquet', filename = true);

could for example result in:

column_a column_b filename

1 examplevalue1 s3://bucket/file1.parquet
2 examplevalue1 s3://bucket/file2.parquet

Hive Partitioning DuckDB also offers support for the Hive partitioning scheme, which is available when using HTTP(S) and S3 endpoints.
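As a sketch, assuming data laid out under key=value directories such as year=.../month=... (a hypothetical layout), the partition columns can be exposed with the hive_partitioning flag:

SELECT *
FROM read_parquet('s3://bucket/data/year=*/month=*/*.parquet', hive_partitioning = true);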

Writing

Writing to S3 uses the multipart upload API. This allows DuckDB to robustly upload files at high speed. Writing to S3 works for both CSV
and Parquet:

COPY table_name TO 's3://bucket/file.extension';

Partitioned copy to S3 also works:

COPY table TO 's3://my-bucket/partitioned' (
    FORMAT PARQUET,
    PARTITION_BY (part_col_a, part_col_b)
);


An automatic check is performed for existing files/directories, which is currently quite conservative (and on S3 will add a bit of latency). To
disable this check and force writing, an OVERWRITE_OR_IGNORE flag is added:

COPY table TO 's3://my-bucket/partitioned' (
    FORMAT PARQUET,
    PARTITION_BY (part_col_a, part_col_b),
    OVERWRITE_OR_IGNORE true
);

The naming scheme of the written files looks like this:

s3://my-bucket/partitioned/part_col_a= val /part_col_b= val /data_ thread_number .parquet

Configuration Some additional configuration options exist for the S3 upload, though the default values should suffice for most use
cases.

Name    Description

s3_uploader_max_parts_per_file    Used for part size calculation, see AWS docs
s3_uploader_max_filesize    Used for part size calculation, see AWS docs
s3_uploader_thread_limit    Maximum number of uploader threads

Legacy Authentication Scheme for S3 API

Prior to version 0.10.0, DuckDB did not have a Secrets manager. Hence, the configuration of and authentication to S3 endpoints was handled via variables. This page documents the legacy authentication scheme for the S3 API.

Note. The recommended way to configure and authenticate to S3 endpoints is to use secrets.

Legacy Authentication Scheme

To be able to read or write from S3, the correct region should be set:

SET s3_region = 'us-east-1';

Optionally, the endpoint can be configured in case a non‑AWS object storage server is used:

SET s3_endpoint = ' domain . tld : port ';

If the endpoint is not SSL‑enabled then run:

SET s3_use_ssl = false;

Switching between path‑style and vhost‑style URLs is possible using:

SET s3_url_style = 'path';

However, note that this may also require updating the endpoint. For example for AWS S3 it is required to change the endpoint to
s3. region .amazonaws.com.

After configuring the correct endpoint and region, public files can be read. To also read private files, authentication credentials can be
added:

SET s3_access_key_id = ' AWS access key id ';
SET s3_secret_access_key = ' AWS secret access key ';

Alternatively, session tokens are also supported and can be used instead:

SET s3_session_token = ' AWS session token ';

The aws extension allows for loading AWS credentials.


Per‑Request Configuration

Aside from the global S3 configuration described above, specific configuration values can be used on a per‑request basis. This allows for
use of multiple sets of credentials, regions, etc. These are used by including them on the S3 URI as query parameters. All the individual
configuration values listed above can be set as query parameters. For instance:

SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey&s3_secret_access_key=secretKey';

Multiple configurations per query are also allowed:

SELECT *
FROM 's3://bucket/file.parquet?s3_region=region&s3_session_token=session_token' t1
INNER JOIN 's3://bucket/file.csv?s3_access_key_id=accessKey&s3_secret_access_key=secretKey' t2;

Configuration

Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.

Additionally, most of the configuration options can be set via environment variables:

DuckDB setting Environment variable Note

s3_region    AWS_REGION    Takes priority over AWS_DEFAULT_REGION
s3_region    AWS_DEFAULT_REGION
s3_access_key_id    AWS_ACCESS_KEY_ID
s3_secret_access_key    AWS_SECRET_ACCESS_KEY
s3_session_token    AWS_SESSION_TOKEN
s3_endpoint    DUCKDB_S3_ENDPOINT
s3_use_ssl    DUCKDB_S3_USE_SSL

Iceberg Extension

The iceberg extension is a loadable extension that implements support for the Apache Iceberg format.

Installing and Loading

To install and load the iceberg extension, run:

INSTALL iceberg;
LOAD iceberg;

Usage

To test the examples, download the iceberg_data.zip file and unzip it.

Querying Individual Tables


SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);

51793


Note. The allow_moved_paths option ensures that some path resolution is performed, which allows scanning Iceberg tables
that are moved.

Access Iceberg Metadata


SELECT *
FROM iceberg_metadata('data/iceberg/lineitem_iceberg', allow_moved_paths = true);

manifest_path    manifest_sequence_number    manifest_content    status    content    file_path    file_format    record_count

lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b2…    2    DATA    ADDED    EXISTING    lineitem_iceberg/data/00041-414-f3c73457-bbd6-4b92-9c15-17b241171b16-00001.parquet    PARQUET    51793
lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b2…    2    DATA    DELETED    EXISTING    lineitem_iceberg/data/00000-411-0792dcfe-4e25-4ca3-8ada-175286069a47-00001.parquet    PARQUET    60175

Visualizing Snapshots
SELECT *
FROM iceberg_snapshots('data/iceberg/lineitem_iceberg');

sequence_number    snapshot_id    timestamp_ms    manifest_list

1    3776207205136740581    2023-02-15 15:07:54.504    lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro
2    7635660646343998149    2023-02-15 15:08:14.73    lineitem_iceberg/metadata/snap-7635660646343998149-1-10eaca8a-1e1c-421e-ad6d-b232e5ee23d3.avro

Limitations

Writing (i.e., exporting to) Iceberg files is currently not supported.

ICU Extension

The icu extension contains an easy‑to‑use version of the collation/timezone part of the ICU library.

Installing and Loading

To install and load the icu extension, run:

INSTALL icu;
LOAD icu;


Features

The icu extension introduces the following features:

• region‑dependent collations
• time zones, used for timestamp data types and timestamp functions
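For example, a brief sketch of both features; the collation name (de) and time zone used here are assumptions and depend on what your ICU build provides:

-- region-dependent collation
SELECT name
FROM (VALUES ('Gabel'), ('Göbel'), ('Goethe')) t(name)
ORDER BY name COLLATE de;

-- time zone support for TIMESTAMP WITH TIME ZONE
SET TimeZone = 'America/New_York';
SELECT '2024-03-01 12:00:00+00'::TIMESTAMPTZ AS ts;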

GitHub

The icu extension is part of the main DuckDB repository.

inet Extension

The inet extension defines the INET data type for storing IPv4 network addresses. It supports the CIDR notation for subnet masks (e.g.,
198.51.100.0/22).

Installing and Loading

To install and load the inet extension, run:

INSTALL inet;
LOAD inet;

Examples

SELECT '127.0.0.1'::INET AS addr;

┌───────────┐
│ addr │
│ inet │
├───────────┤
│ 127.0.0.1 │
└───────────┘

CREATE TABLE tbl (id INTEGER, ip INET);
INSERT INTO tbl VALUES (1, '192.168.0.0/16'), (2, '127.0.0.1'), (2, '8.8.8.8');
SELECT * FROM tbl;

┌───────┬────────────────┐
│ id │ ip │
│ int32 │ inet │
├───────┼────────────────┤
│ 1 │ 192.168.0.0/16 │
│ 2 │ 127.0.0.1 │
│ 2 │ 8.8.8.8 │
└───────┴────────────────┘

GitHub

The inet extension is part of the main DuckDB repository.

jemalloc Extension

The jemalloc extension replaces the system's memory allocator with jemalloc. Unlike other DuckDB extensions, the jemalloc extension is statically linked and cannot be installed or loaded during runtime.


Operating System Support

The availability of the jemalloc extension depends on the operating system.

Linux The Linux version of DuckDB ships with the jemalloc extension by default.

To disable the jemalloc extension, build DuckDB from source and set the SKIP_EXTENSIONS flag as follows:

GEN=ninja SKIP_EXTENSIONS="jemalloc" make

macOS The macOS version of DuckDB does not ship with the jemalloc extension but can be built from source to include it:

GEN=ninja BUILD_JEMALLOC=1 make

Windows On Windows, this extension is not available.

GitHub

The jemalloc extension is part of the main DuckDB repository.

JSON Extension

The json extension is a loadable extension that implements SQL functions that are useful for reading values from existing JSON, and
creating new JSON data.

Installing and Loading

The json extension is shipped by default in DuckDB builds, otherwise it will be transparently autoloaded on first use. If you would like to
install and load it manually, run:

INSTALL json;
LOAD json;

Example Uses

-- read a JSON file from disk, auto-infer options
SELECT * FROM 'todos.json';

-- read_json with custom options
SELECT *
FROM read_json('todos.json',
               format = 'array',
               columns = {userId: 'UBIGINT',
                          id: 'UBIGINT',
                          title: 'VARCHAR',
                          completed: 'BOOLEAN'});

-- write the result of a query to a JSON file
COPY (SELECT * FROM todos) TO 'todos.json';

See more examples on the JSON data page.


JSON Type

The JSON extension makes use of the JSON logical type. The JSON logical type is interpreted as JSON, i.e., parsed, in JSON functions rather
than interpreted as VARCHAR, i.e., a regular string. All JSON creation functions return values of this type.

We also allow any of our types to be cast to JSON, and JSON to be cast back to any of our types, for example:

-- Cast JSON to our STRUCT type
SELECT '{"duck": 42}'::JSON::STRUCT(duck INTEGER);
-- {'duck': 42}

-- And back:
SELECT {duck: 42}::JSON;
-- {"duck":42}

This works for our nested types as shown in the example, but also for non‑nested types:

SELECT '2023-05-12'::DATE::JSON;
-- "2023-05-12"

The only exception to this behavior is the cast from VARCHAR to JSON, which does not alter the data, but instead parses and validates the
contents of the VARCHAR as JSON.

JSON Table Functions

The following table functions are used to read JSON:

Function Description

read_json_objects(filename)    Read a JSON object from filename, where filename can also be a list of files or a glob pattern
read_ndjson_objects(filename)    Alias for read_json_objects with parameter format set to 'newline_delimited'
read_json_objects_auto(filename)    Alias for read_json_objects with parameter format set to 'auto'

These functions have the following parameters:

Name    Description    Type    Default

compression    The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.json.gz will use gzip, t.json will use none). Options are 'none', 'gzip', 'zstd', and 'auto'.    VARCHAR    'auto'
filename    Whether or not an extra filename column should be included in the result.    BOOL    false
format    Can be one of ['auto', 'unstructured', 'newline_delimited', 'array'].    VARCHAR    'array'
hive_partitioning    Whether or not to interpret the path as a Hive partitioned path.    BOOL    false
ignore_errors    Whether to ignore parse errors (only possible when format is 'newline_delimited').    BOOL    false
maximum_sample_files    The maximum number of JSON files sampled for auto-detection.    BIGINT    32
maximum_object_size    The maximum size of a JSON object (in bytes).    UINTEGER    16777216

The format parameter specifies how to read the JSON from a file. With 'unstructured', the top‑level JSON is read, e.g.:

{
"duck": 42
}
{
"goose": [1, 2, 3]
}

will result in two objects being read.

With 'newline_delimited', NDJSON is read, where each JSON is separated by a newline (\n), e.g.:

{"duck": 42}
{"goose": [1, 2, 3]}

will also result in two objects being read.

With 'array', each array element is read, e.g.:

[
{
"duck": 42
},
{
"goose": [1, 2, 3]
}
]

Again, this will result in two objects being read.

Example usage:

SELECT * FROM read_json_objects('my_file1.json');
-- {"duck":42,"goose":[1,2,3]}

SELECT * FROM read_json_objects(['my_file1.json', 'my_file2.json']);
-- {"duck":42,"goose":[1,2,3]}
-- {"duck":43,"goose":[4,5,6],"swan":3.3}

SELECT * FROM read_ndjson_objects('*.json.gz');
-- {"duck":42,"goose":[1,2,3]}
-- {"duck":43,"goose":[4,5,6],"swan":3.3}

DuckDB also supports reading JSON as a table, using the following functions:

Function    Description

read_json(filename)    Read JSON from filename, where filename can also be a list of files, or a glob pattern
read_json_auto(filename)    Alias for read_json with all auto-detection enabled
read_ndjson(filename)    Alias for read_json with parameter format set to 'newline_delimited'
read_ndjson_auto(filename)    Alias for read_json_auto with parameter format set to 'newline_delimited'

Besides the maximum_object_size, format, ignore_errors and compression, these functions have additional parameters:

Name    Description    Type    Default

auto_detect    Whether to auto-detect the names of the keys and data types of the values automatically    BOOL    false
columns    A struct that specifies the key names and value types contained within the JSON file (e.g., {key1: 'INTEGER', key2: 'VARCHAR'}). If auto_detect is enabled these will be inferred    STRUCT    (empty)
dateformat    Specifies the date format to use when parsing dates. See Date Format    VARCHAR    'iso'
maximum_depth    Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types    BIGINT    -1
records    Can be one of ['auto', 'true', 'false']    VARCHAR    'auto'
sample_size    Option to define number of sample objects for automatic JSON type detection. Set to -1 to scan the entire input file    UBIGINT    20480
timestampformat    Specifies the date format to use when parsing timestamps. See Date Format    VARCHAR    'iso'
union_by_name    Whether the schemas of multiple JSON files should be unified    BOOL    false

Example usage:

SELECT * FROM read_json('my_file1.json', columns = {duck: 'INTEGER'});

duck

42

DuckDB can convert JSON arrays directly to its internal LIST type, and missing keys become NULL.

SELECT *
FROM read_json(['my_file1.json', 'my_file2.json'],
columns = {duck: 'INTEGER', goose: 'INTEGER[]', swan: 'DOUBLE'});

duck goose swan

42 [1, 2, 3] NULL
43 [4, 5, 6] 3.3

DuckDB can automatically detect the types like so:


SELECT goose, duck FROM read_json_auto('*.json.gz');
SELECT goose, duck FROM '*.json.gz'; -- equivalent

goose duck

[1, 2, 3] 42
[4, 5, 6] 43

DuckDB can read (and auto‑detect) a variety of formats, specified with the format parameter. Querying a JSON file that contains an
'array', e.g.:

[
{
"duck": 42,
"goose": 4.2
},
{
"duck": 43,
"goose": 4.3
}
]

Can be queried exactly the same as a JSON file that contains 'unstructured' JSON, e.g.:

{
"duck": 42,
"goose": 4.2
}
{
"duck": 43,
"goose": 4.3
}

Both can be read as the table:

duck goose

42 4.2
43 4.3

If your JSON file does not contain 'records', i.e., it contains any type of JSON other than objects, DuckDB can still read it. This is specified with the records parameter. The records parameter specifies whether the JSON contains records that should be unpacked into individual columns, i.e., reading the following file with records:

{"duck": 42, "goose": [1, 2, 3]}
{"duck": 43, "goose": [4, 5, 6]}

Results in two columns:

duck    goose

42    [1,2,3]
43    [4,5,6]

You can read the same file with records set to 'false', to get a single column, which is a STRUCT containing the data:
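A sketch of such a call (the filename here is hypothetical), producing the result shown below:

SELECT * FROM read_json_auto('birds.json', records = 'false');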


json

{'duck': 42, 'goose': [1,2,3]}
{'duck': 43, 'goose': [4,5,6]}

For additional examples reading more complex data, please see the Shredding Deeply Nested JSON, One Vector at a Time blog post.

JSON Import/Export

When the JSON extension is installed, FORMAT JSON is supported for COPY FROM, COPY TO, EXPORT DATABASE and IMPORT
DATABASE. See Copy and Import/Export.

By default, COPY expects newline‑delimited JSON. If you prefer copying data to/from a JSON array, you can specify ARRAY true, e.g.,

COPY (SELECT * FROM range(5)) TO 'my.json' (ARRAY true);

will create the following file:

[
{"range":0},
{"range":1},
{"range":2},
{"range":3},
{"range":4}
]

This can be read like so:

CREATE TABLE test (range BIGINT);
COPY test FROM 'my.json' (ARRAY true);

The format can also be detected automatically, like so:

COPY test FROM 'my.json' (AUTO_DETECT true);

JSON Scalar Functions

The following scalar JSON functions can be used to gain information about the stored JSON values. With the exception of json_valid(
json), all JSON functions produce an error when invalid JSON is supplied.

We support two kinds of notations to describe locations within JSON: JSON Pointer and JSONPath.

Function Description

json_array_length( json[,path]) Return the number of elements in the JSON array json, or 0 if it is not a
JSON array. If path is specified, return the number of elements in the
JSON array at the given path. If path is a LIST, the result will be LIST
of array lengths
json_contains( json_haystack, json_needle) Returns true if json_needle is contained in json_haystack. Both
parameters are of JSON type, but json_needle can also be a numeric
value or a string, however the string must be wrapped in double quotes
json_keys( json[,path]) Returns the keys of json as a LIST of VARCHAR, if json is a JSON
object. If path is specified, return the keys of the JSON object at the given
path. If path is a LIST, the result will be LIST of LIST of VARCHAR

564
DuckDB Documentation

Function Description

json_structure( json) Return the structure of json. Defaults to JSON the structure is
inconsistent (e.g., incompatible types in an array)
json_type( json[,path]) Return the type of the supplied json, which is one of OBJECT, ARRAY,
BIGINT, UBIGINT, VARCHAR, BOOLEAN, NULL. If path is specified,
return the type of the element at the given path. If path is a LIST, the
result will be LIST of types
json_valid( json) Return whether json is valid JSON
json( json) Parse and minify json

The JSONPointer syntax separates each field with a /. For example, to extract the first element of the array with key "duck", you can do:

SELECT json_extract('{"duck": [1, 2, 3]}', '/duck/0');
-- 1

The JSONPath syntax separates fields with a ., and accesses array elements with [i], and always starts with $. Using the same example,
we can do the following:

SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[0]');
-- 1

Note that DuckDB's JSON data type uses 0‑based indexing.

JSONPath is more expressive, and can also access from the back of lists:

SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[#-1]');
-- 3

JSONPath also allows escaping syntax tokens, using double quotes:

SELECT json_extract('{"duck.goose": [1, 2, 3]}', '$."duck.goose"[1]');
-- 2

Examples using the anatidae biological family:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');

SELECT json(j) FROM example;
-- {"family":"anatidae","species":["duck","goose","swan",null]}
SELECT j.family FROM example;
-- "anatidae"
SELECT j.species[0] FROM example;
-- "duck"
SELECT json_valid(j) FROM example;
-- true
SELECT json_valid('{');
-- false
SELECT json_array_length('["duck", "goose", "swan", null]');
-- 4
SELECT json_array_length(j, 'species') FROM example;
-- 4
SELECT json_array_length(j, '/species') FROM example;
-- 4
SELECT json_array_length(j, '$.species') FROM example;
-- 4
SELECT json_array_length(j, ['$.species']) FROM example;


-- [4]
SELECT json_type(j) FROM example;
-- OBJECT
SELECT json_keys(j) FROM example;
-- [family, species]
SELECT json_structure(j) FROM example;
-- {"family":"VARCHAR","species":["VARCHAR"]}
SELECT json_structure('["duck", {"family": "anatidae"}]');
-- ["JSON"]
SELECT json_contains('{"key": "value"}', '"value"');
-- true
SELECT json_contains('{"key": 1}', '1');
-- true
SELECT json_contains('{"top_key": {"key": "value"}}', '{"key": "value"}');
-- true

JSON Extraction Functions

There are two extraction functions, which have their respective operators. The operators can only be used if the string is stored as the JSON logical type. These functions support the same two location notations as the previous functions.

Function    Alias    Operator

json_extract(json, path)    json_extract_path    ->
json_extract_string(json, path)    json_extract_path_text    ->>

Note that the equality comparison operator (=) has a higher precedence than the -> JSON extract operator. Therefore, surround the uses
of the -> operator with parentheses when making equality comparisons. For example:

SELECT ((JSON '{"field": 42}')->'field') = 42;

Note. Warning DuckDB's JSON data type uses 0‑based indexing.

Examples:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');

SELECT json_extract(j, '$.family') FROM example;
-- "anatidae"
SELECT j->'$.family' FROM example;
-- "anatidae"
SELECT j->'$.species[0]' FROM example;
-- "duck"
SELECT j->'$.species[*]' FROM example;
-- ["duck", "goose", "swan", null]
SELECT j->>'$.species[*]' FROM example;
-- [duck, goose, swan, null]
SELECT j->'$.species'->0 FROM example;
-- "duck"
SELECT j->'species'->['0','1'] FROM example;
-- ["duck", "goose"]
SELECT json_extract_string(j, '$.family') FROM example;
-- anatidae
SELECT j->>'$.family' FROM example;


-- anatidae
SELECT j->>'$.species[0]' FROM example;
-- duck
SELECT j->'species'->>0 FROM example;
-- duck
SELECT j->'species'->>['0','1'] FROM example;
-- [duck, goose]

Note that DuckDB's JSON data type uses 0‑based indexing.

If multiple values need to be extracted from the same JSON, it is more efficient to extract a list of paths:

-- The following will cause the JSON to be parsed twice,
-- resulting in a slower query that uses more memory
SELECT json_extract(j, 'family') AS family,
       json_extract(j, 'species') AS species
FROM example;

-- The following is faster and more memory efficient
WITH extracted AS (
    SELECT json_extract(j, ['family', 'species']) extracted_list
    FROM example
)
SELECT extracted_list[1] AS family,
       extracted_list[2] AS species
FROM extracted;

JSON Creation Functions

The following functions are used to create JSON.

Function Description

to_json( any) Create JSON from a value of any type. Our LIST is converted to a JSON array, and
our STRUCT and MAP are converted to a JSON object
json_quote( any) Alias for to_json
array_to_json( list) Alias for to_json that only accepts LIST
row_to_json( list) Alias for to_json that only accepts STRUCT
json_array([any, ...]) Create a JSON array from any number of values
json_object([key,value, ...]) Create a JSON object from any number of key, value pairs
json_merge_patch( json,json) Merge two JSON documents together

Examples:

SELECT to_json('duck');
-- "duck"
SELECT to_json([1, 2, 3]);
-- [1,2,3]
SELECT to_json({duck : 42});
-- {"duck":42}
SELECT to_json(map(['duck'],[42]));
-- {"duck":42}
SELECT json_array(42, 'duck', NULL);
-- [42,"duck",null]
SELECT json_object('duck', 42);


-- {"duck":42}
SELECT json_merge_patch('{"duck": 42}', '{"goose": 123}');
-- {"goose":123,"duck":42}

JSON Aggregate Functions

There are three JSON aggregate functions.

Function Description

json_group_array( any) Return a JSON array with all values of any in the aggregation
json_group_object( key, value) Return a JSON object with all key, value pairs in the aggregation
json_group_structure( json) Return the combined json_structure of all json in the aggregation

Examples:

CREATE TABLE example1 (k VARCHAR, v INTEGER);
INSERT INTO example1 VALUES ('duck', 42), ('goose', 7);

SELECT json_group_array(v) FROM example1;
-- [42, 7]
SELECT json_group_object(k, v) FROM example1;
-- {"duck":42,"goose":7}

CREATE TABLE example2 (j JSON);
INSERT INTO example2 VALUES
    ('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
    ('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');

SELECT json_group_structure(j) FROM example2;
-- {"family":"VARCHAR","species":["VARCHAR"],"coolness":"DOUBLE","hair":"BOOLEAN"}

Transforming JSON

In many cases, it is inefficient to extract values from JSON one-by-one. Instead, we can "extract" all values at once, transforming JSON to the nested types LIST and STRUCT.

Function    Description

json_transform(json, structure)    Transform json according to the specified structure
from_json(json, structure)    Alias for json_transform
json_transform_strict(json, structure)    Same as json_transform, but throws an error when type casting fails
from_json_strict(json, structure)    Alias for json_transform_strict

The structure argument is JSON of the same form as returned by json_structure. The structure argument can be modified to
transform the JSON into the desired structure and types. It is possible to extract fewer key/value pairs than are present in the JSON, and it
is also possible to extract more: missing keys become NULL.

Examples:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
    ('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');

SELECT json_transform(j, '{"family": "VARCHAR", "coolness": "DOUBLE"}') FROM example;
-- {'family': anatidae, 'coolness': 42.420000}
-- {'family': canidae, 'coolness': NULL}

SELECT json_transform(j, '{"family": "TINYINT", "coolness": "DECIMAL(4, 2)"}') FROM example;
-- {'family': NULL, 'coolness': 42.42}
-- {'family': NULL, 'coolness': NULL}

SELECT json_transform_strict(j, '{"family": "TINYINT", "coolness": "DOUBLE"}') FROM example;
-- Invalid Input Error: Failed to cast value: "anatidae"

Serializing and Deserializing SQL to JSON and Vice Versa

The JSON extension also provides functions to serialize and deserialize SELECT statements between SQL and JSON, as well as executing
JSON serialized statements.

Function    Type    Description

json_deserialize_sql(json)    Scalar    Deserialize one or many json serialized statements back to an equivalent SQL string
json_execute_serialized_sql(varchar)    Table    Execute json serialized statements and return the resulting rows. Only one statement at a time is supported for now.
json_serialize_sql(varchar, skip_empty := boolean, skip_null := boolean, format := boolean)    Scalar    Serialize a set of ;-separated SELECT statements to an equivalent list of json serialized statements
PRAGMA json_execute_serialized_sql(varchar)    Pragma    Pragma version of the json_execute_serialized_sql function.

The json_serialize_sql(varchar) function takes three optional parameters, skip_empty, skip_null, and format that can
be used to control the output of the serialized statements.

If you run the json_execute_serialized_sql(varchar) table function inside of a transaction, the serialized statements will not be able to see any transaction-local changes. This is because the statements are executed in a separate query context. You can use the PRAGMA json_execute_serialized_sql(varchar) pragma version to execute the statements in the same query context as the pragma, although with the limitation that the serialized JSON must be provided as a constant string, i.e., you cannot do PRAGMA json_execute_serialized_sql(json_serialize_sql(...)).

Note that these functions do not preserve syntactic sugar such as FROM * SELECT ..., so a statement round‑tripped through json_
deserialize_sql(json_serialize_sql(...)) may not be identical to the original statement, but should always be semanti‑
cally equivalent and produce the same output.

Examples:

-- Simple example
SELECT json_serialize_sql('SELECT 2');
-- '{"error":false,"statements":[{"node":{"type":"SELECT_NODE","modifiers":[],"cte_
map":{"map":[]},"select_list":[{"class":"CONSTANT","type":"VALUE_
CONSTANT","alias":"","value":{"type":{"id":"INTEGER","type_info":null},"is_
null":false,"value":2}}],"from_table":{"type":"EMPTY","alias":"","sample":null},"where_
clause":null,"group_expressions":[],"group_sets":[],"aggregate_handling":"STANDARD_
HANDLING","having":null,"sample":null,"qualify":null}}]}'

-- Example with multiple statements and skip options
SELECT json_serialize_sql('SELECT 1 + 2; SELECT a + b FROM tbl1', skip_empty := true, skip_null := true);


-- '{"error":false,"statements":[{"node":{"type":"SELECT_NODE","select_
list":[{"class":"FUNCTION","type":"FUNCTION","function_
name":"+","children":[{"class":"CONSTANT","type":"VALUE_CONSTANT","value":{"type":{"id":"INTEGER"},"is_
null":false,"value":1}},{"class":"CONSTANT","type":"VALUE_
CONSTANT","value":{"type":{"id":"INTEGER"},"is_null":false,"value":2}}],"order_bys":{"type":"ORDER_
MODIFIER"},"distinct":false,"is_operator":true,"export_state":false}],"from_
table":{"type":"EMPTY"},"aggregate_handling":"STANDARD_HANDLING"}},{"node":{"type":"SELECT_
NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","function_
name":"+","children":[{"class":"COLUMN_REF","type":"COLUMN_REF","column_names":["a"]},{"class":"COLUMN_
REF","type":"COLUMN_REF","column_names":["b"]}],"order_bys":{"type":"ORDER_
MODIFIER"},"distinct":false,"is_operator":true,"export_state":false}],"from_table":{"type":"BASE_
TABLE","table_name":"tbl1"},"aggregate_handling":"STANDARD_HANDLING"}}]}'

-- Example with a syntax error
SELECT json_serialize_sql('TOTALLY NOT VALID SQL');
-- '{"error":true,"error_type":"parser","error_message":"syntax error at or near \"TOTALLY\"\nLINE 1:
TOTALLY NOT VALID SQL\n ^"}'

-- Example with deserialize
SELECT json_deserialize_sql(json_serialize_sql('SELECT 1 + 2'));
-- 'SELECT (1 + 2)'

-- Example with deserialize and syntax sugar
SELECT json_deserialize_sql(json_serialize_sql('FROM x SELECT 1 + 2'));
-- 'SELECT (1 + 2) FROM x'

-- Example with execute
SELECT * FROM json_execute_serialized_sql(json_serialize_sql('SELECT 1 + 2'));
-- 3

-- Example with error
SELECT * FROM json_execute_serialized_sql(json_serialize_sql('TOTALLY NOT VALID SQL'));
-- Error: Parser Error: Error parsing json: parser: syntax error at or near "TOTALLY"

Indexing

Note. Warning Following PostgreSQL's conventions, DuckDB uses 1‑based indexing for arrays and lists but 0‑based indexing for the
JSON data type.

GitHub

The json extension is part of the main DuckDB repository.

MySQL Extension

The mysql extension allows DuckDB to directly read and write data from/to a running MySQL instance. The data can be queried directly
from the underlying MySQL database. Data can be loaded from MySQL tables into DuckDB tables, or vice versa.

Installing and Loading

To install the mysql extension, run:

INSTALL mysql;

The extension is loaded automatically upon first use. If you prefer to load it manually, run:

LOAD mysql;


Reading Data from MySQL

To make a MySQL database accessible to DuckDB use the ATTACH command:

ATTACH 'host=localhost user=root port=0 database=mysql' AS mysqldb (TYPE mysql);
USE mysqldb;

The connection string determines the parameters for how to connect to MySQL as a set of key=value pairs. Any options not provided are
replaced by their default values, as per the table below.

Setting Default

database NULL
host localhost
password
port 0
socket NULL
user current user

The tables in the MySQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from MySQL
at query time.

SHOW TABLES;

┌───────────────────────────────────────┐
│ name │
│ varchar │
├───────────────────────────────────────┤
│ signed_integers │
└───────────────────────────────────────┘

SELECT * FROM signed_integers;

┌──────┬────────┬──────────┬─────────────┬──────────────────────┐
│ t │ s │ m │ i │ b │
│ int8 │ int16 │ int32 │ int32 │ int64 │
├──────┼────────┼──────────┼─────────────┼──────────────────────┤
│ -128 │ -32768 │ -8388608 │ -2147483648 │ -9223372036854775808 │
│ 127 │ 32767 │ 8388607 │ 2147483647 │ 9223372036854775807 │
│ NULL │ NULL │ NULL │ NULL │ NULL │
└──────┴────────┴──────────┴─────────────┴──────────────────────┘

It might be desirable to create a copy of the MySQL databases in DuckDB to prevent the system from re‑reading the tables from MySQL
continuously, particularly for large tables.

Data can be copied over from MySQL to DuckDB using standard SQL, for example:

CREATE TABLE duckdb_table AS FROM mysqldb.mysql_table;

Writing Data to MySQL

In addition to reading data from MySQL, the extension allows you to create tables, ingest data into MySQL, and make other modifications to a MySQL database using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a MySQL database to Parquet, or read data from a Parquet file
into MySQL.

Below is a brief example of how to create a new table in MySQL and load data into it.


ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql_scanner);
CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');

Many operations on MySQL tables are supported. All these operations directly modify the MySQL database, and the result of subsequent
operations can then be read using MySQL. Note that if modifications are not desired, ATTACH can be run with the READ_ONLY property
which prevents making modifications to the underlying database. For example:

ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql_scanner, READ_ONLY);

Supported Operations

Below is a list of supported operations.

CREATE TABLE

CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);

INSERT INTO

INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');

SELECT

SELECT * FROM mysql_db.tbl;

┌───────┬─────────┐
│ id │ name │
│ int64 │ varchar │
├───────┼─────────┤
│ 42 │ DuckDB │
└───────┴─────────┘

COPY

COPY mysql_db.tbl TO 'data.parquet';
COPY mysql_db.tbl FROM 'data.parquet';

UPDATE

UPDATE mysql_db.tbl
SET name = 'Woohoo'
WHERE id = 42;

DELETE

DELETE FROM mysql_db.tbl
WHERE id = 42;

ALTER TABLE

ALTER TABLE mysql_db.tbl
ADD COLUMN k INTEGER;


DROP TABLE
DROP TABLE mysql_db.tbl;

CREATE VIEW
CREATE VIEW mysql_db.v1 AS SELECT 42;

CREATE SCHEMA and DROP SCHEMA


CREATE SCHEMA mysql_db.s1;
CREATE TABLE mysql_db.s1.integers (i INT);
INSERT INTO mysql_db.s1.integers VALUES (42);
SELECT * FROM mysql_db.s1.integers;

┌───────┐
│ i │
│ int32 │
├───────┤
│ 42 │
└───────┘

DROP SCHEMA mysql_db.s1;

Transactions
CREATE TABLE mysql_db.tmp (i INTEGER);
BEGIN;
INSERT INTO mysql_db.tmp VALUES (42);
SELECT * FROM mysql_db.tmp;

┌───────┐
│ i │
│ int64 │
├───────┤
│ 42 │
└───────┘

ROLLBACK;
SELECT * FROM mysql_db.tmp;

┌────────┐
│ i │
│ int64 │
├────────┤
│ 0 rows │
└────────┘

Note. The DDL statements are not transactional in MySQL.

Settings

Name    Description    Default

mysql_bit1_as_boolean    Whether or not to convert BIT(1) columns to BOOLEAN    true
mysql_debug_show_queries    DEBUG SETTING: print all queries sent to MySQL to stdout    false
mysql_experimental_filter_pushdown    Whether or not to use filter pushdown (currently experimental)    false
mysql_tinyint1_as_boolean    Whether or not to convert TINYINT(1) columns to BOOLEAN    true

Schema Cache

To avoid having to continuously fetch schema data from MySQL, DuckDB keeps schema information ‑ such as the names of tables, their
columns, etc ‑ cached. If changes are made to the schema through a different connection to the MySQL instance, such as new columns being
added to a table, the cached schema information might be outdated. In this case, the function mysql_clear_cache can be executed
to clear the internal caches.

CALL mysql_clear_cache();

PostgreSQL Extension

The postgres extension allows DuckDB to directly read and write data from a running Postgres database instance. The data can be queried directly from the underlying Postgres database. Data can be loaded from Postgres tables into DuckDB tables, or vice versa. See the official announcement for implementation details and background.

Installing and Loading

To install the postgres extension, run:

INSTALL postgres;

The extension is loaded automatically upon first use. If you prefer to load it manually, run:

LOAD postgres;

Connecting

To make a PostgreSQL database accessible to DuckDB, use the ATTACH command:

-- connect to the "public" schema of the postgres instance running on localhost
ATTACH '' AS postgres_db (TYPE postgres);
-- connect to the Postgres instance with the given parameters
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE postgres);

The ATTACH command takes as input either a libpq connection string or a PostgreSQL URI.

Below are some example connection strings and commonly used parameters. A full list of available parameters can be found in the Postgres
documentation.

dbname=postgresscanner
host=localhost port=5432 dbname=mydb connect_timeout=10

Name    Description    Default

dbname    Database name    [user]
host    Name of host to connect to    localhost
hostaddr    Host IP address    localhost
passfile    Name of file passwords are stored in    ~/.pgpass
password    Postgres password    (empty)
port    Port number    5432
user    Postgres user name    current user

An example URI is postgresql://username@hostname/dbname.

Postgres connection information can also be specified with environment variables. This can be useful in a production environment where
the connection information is managed externally and passed in to the environment.

export PGPASSWORD="secret"
export PGHOST=localhost
export PGUSER=owner
export PGDATABASE=mydatabase

To connect, start the duckdb process and run:

ATTACH '' AS p (TYPE postgres);

Usage

The tables in the PostgreSQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from
Postgres at query time.

SHOW TABLES;

┌───────────────────────────────────────┐
│ name │
│ varchar │
├───────────────────────────────────────┤
│ uuids │
└───────────────────────────────────────┘

SELECT * FROM uuids;

┌──────────────────────────────────────┐
│ u │
│ uuid │
├──────────────────────────────────────┤
│ 6d3d2541-710b-4bde-b3af-4711738636bf │
│ NULL │
│ 00000000-0000-0000-0000-000000000001 │
│ ffffffff-ffff-ffff-ffff-ffffffffffff │
└──────────────────────────────────────┘

It might be desirable to create a copy of the Postgres databases in DuckDB to prevent the system from re‑reading the tables from Postgres
continuously, particularly for large tables.

Data can be copied over from Postgres to DuckDB using standard SQL, for example:

CREATE TABLE duckdb_table AS FROM postgres_db.postgres_tbl;


Writing Data to Postgres

In addition to reading data from Postgres, the extension allows you to create tables, ingest data into Postgres and make other modifications
to a Postgres database using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a Postgres database to Parquet, or read data from a Parquet file
into Postgres.

Below is a brief example of how to create a new table in Postgres and load data into it.

ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres);
CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');

Many operations on Postgres tables are supported. All these operations directly modify the Postgres database, and the result of subsequent
operations can then be read using Postgres. Note that if modifications are not desired, ATTACH can be run with the READ_ONLY property
which prevents making modifications to the underlying database. For example:

ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres, READ_ONLY);

Below is a list of supported operations.

CREATE TABLE

CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);

INSERT INTO

INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');

SELECT

SELECT * FROM postgres_db.tbl;

┌───────┬─────────┐
│ id │ name │
│ int64 │ varchar │
├───────┼─────────┤
│ 42 │ DuckDB │
└───────┴─────────┘

COPY

COPY postgres_db.tbl TO 'data.parquet';
COPY postgres_db.tbl FROM 'data.parquet';

UPDATE

UPDATE postgres_db.tbl
SET name = 'Woohoo'
WHERE id = 42;

DELETE

DELETE FROM postgres_db.tbl
WHERE id = 42;


ALTER TABLE
ALTER TABLE postgres_db.tbl
ADD COLUMN k INTEGER;

DROP TABLE
DROP TABLE postgres_db.tbl;

CREATE VIEW
CREATE VIEW postgres_db.v1 AS SELECT 42;

CREATE SCHEMA / DROP SCHEMA


CREATE SCHEMA postgres_db.s1;
CREATE TABLE postgres_db.s1.integers (i INT);
INSERT INTO postgres_db.s1.integers VALUES (42);
SELECT * FROM postgres_db.s1.integers;

┌───────┐
│ i │
│ int32 │
├───────┤
│ 42 │
└───────┘

DROP SCHEMA postgres_db.s1;

Transactions
CREATE TABLE postgres_db.tmp (i INTEGER);
BEGIN;
INSERT INTO postgres_db.tmp VALUES (42);
SELECT * FROM postgres_db.tmp;

┌───────┐
│ i │
│ int64 │
├───────┤
│ 42 │
└───────┘

ROLLBACK;
SELECT * FROM postgres_db.tmp;

┌────────┐
│ i │
│ int64 │
├────────┤
│ 0 rows │
└────────┘

Running SQL Queries in Postgres with postgres_query

The postgres_query function allows you to run arbitrary SQL within an attached database. postgres_query takes the name of the
attached Postgres database to execute the query in, as well as the SQL query to execute. The result of the query is returned. Single‑quote
strings are escaped by repeating the single quote twice.

postgres_query(attached_database::VARCHAR, query::VARCHAR)


Example

ATTACH 'dbname=postgresscanner' AS s (TYPE POSTGRES);
SELECT * FROM postgres_query('s', 'SELECT * FROM cars LIMIT 3');

┌──────────────┬───────────┬─────────┐
│ brand │ model │ color │
│ varchar │ varchar │ varchar │
├──────────────┼───────────┼─────────┤
│ ferari │ testarosa │ red │
│ aston martin │ db2 │ blue │
│ bentley │ mulsanne │ gray │
└──────────────┴───────────┴─────────┘
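
Since single quotes inside the query string are escaped by doubling them, a filter on a string literal can be passed as in the following sketch, which reuses the cars table from the example above:

SELECT * FROM postgres_query('s', 'SELECT * FROM cars WHERE brand = ''ferari''');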

Settings

The extension exposes the following configuration parameters.

Name Description Default

pg_debug_show_queries DEBUG SETTING: print all queries sent to Postgres to stdout false
pg_connection_cache Whether or not to use the connection cache true
pg_experimental_filter_pushdown Whether or not to use filter pushdown (currently experimental) false
pg_array_as_varchar Read Postgres arrays as varchar ‑ enables reading mixed dimensional arrays false
pg_connection_limit The maximum amount of concurrent Postgres connections 64
pg_pages_per_task The amount of pages per task 1000
pg_use_binary_copy Whether or not to use BINARY copy to read data true
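
These parameters can be changed like other DuckDB configuration options; as a minimal sketch (assuming the extension is loaded), the connection limit could be raised with:

SET pg_connection_limit = 32;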

Schema Cache

To avoid having to continuously fetch schema data from Postgres, DuckDB keeps schema information ‑ such as the names of tables, their
columns, etc. ‑ cached. If changes are made to the schema through a different connection to the Postgres instance, such as new columns
being added to a table, the cached schema information might be outdated. In this case, the function pg_clear_cache can be executed
to clear the internal caches.

CALL pg_clear_cache();

Note. Deprecated The old postgres_attach function is deprecated. It is recommended to switch over to the new ATTACH
syntax.

Spatial Extension

The spatial extension provides support for geospatial data processing in DuckDB. For an overview of the extension, see our blog post.

Installing and Loading

To install and load the spatial extension, run:

INSTALL spatial;
LOAD spatial;


GEOMETRY Type

The core of the spatial extension is the GEOMETRY type. If you're unfamiliar with geospatial data and GIS tooling, this type probably works
very differently from what you'd expect.

In short, while the GEOMETRY type is a binary representation of ”geometry” data made up of sets of vertices (pairs of X and Y double
precision floats), it actually stores one of several geometry subtypes. These are POINT, LINESTRING, POLYGON, as well as their
”collection” equivalents, MULTIPOINT, MULTILINESTRING and MULTIPOLYGON. Lastly, there is GEOMETRYCOLLECTION, which can
contain any of the other subtypes, as well as other GEOMETRYCOLLECTIONs recursively.

This may seem strange at first, since DuckDB already has types like LIST, STRUCT and UNION which could be used in a similar way, but
the design and behaviour of the GEOMETRY type is actually based on the Simple Features geometry model, which is a standard used by
many other databases and GIS software.

That said, the spatial extension also includes a couple of experimental non‑standard explicit geometry types, such as POINT_2D,
LINESTRING_2D, POLYGON_2D and BOX_2D, that are based on DuckDB's native nested types, such as structs and lists. In theory it
should be possible to optimize many operations for these types much better than for the GEOMETRY type (which is just a binary blob),
but only a couple of functions are implemented so far.

All of these are implicitly castable to GEOMETRY but with a conversion cost, so the GEOMETRY type is still the recommended type to use
for now if you are planning to work with a lot of different spatial functions.

GEOMETRY is not currently capable of storing additional geometry types, Z/M coordinates, or SRID information. These features may be
added in the future.
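
As a small illustration, the following sketch constructs a point geometry and converts it back to its text representation using functions described in the next section:

SELECT ST_AsText(ST_Point(1.0, 2.0)) AS geom_text;
-- expected to return the WKT representation, e.g., POINT (1 2)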

Spatial Scalar Functions

The spatial extension implements a large number of scalar functions and overloads. Most of these are implemented using the GEOS
library, but we'd like to implement more of them natively in this extension to better utilize DuckDB's vectorized execution and memory
management. The following symbols are used to indicate which implementation is used:

‑ GEOS ‑ functions that are implemented using the GEOS library

‑ DuckDB ‑ functions that are implemented natively in this extension that are capable of operating directly on the DuckDB types

‑ CAST(GEOMETRY) ‑ functions that are supported by implicitly casting to GEOMETRY and then using the GEOMETRY implementa‑
tion

The currently implemented spatial functions can roughly be categorized into the following groups:

Geometry Conversion Convert between geometries and other formats.

VARCHAR ST_AsText(GEOMETRY)
WKB_BLOB ST_AsWKB(GEOMETRY)
VARCHAR ST_AsHEXWKB(GEOMETRY)
VARCHAR ST_AsGeoJSON(GEOMETRY)
GEOMETRY ST_GeomFromText(VARCHAR)
GEOMETRY ST_GeomFromWKB(BLOB)
GEOMETRY ST_GeomFromHEXWKB(VARCHAR)
GEOMETRY ST_GeomFromGeoJSON(VARCHAR)

Geometry Construction Construct new geometries from other geometries or other data.

GEOMETRY ST_Point(DOUBLE, DOUBLE)
GEOMETRY ST_ConvexHull(GEOMETRY)
GEOMETRY ST_Boundary(GEOMETRY)
GEOMETRY ST_Buffer(GEOMETRY)
GEOMETRY ST_Centroid(GEOMETRY)
GEOMETRY ST_Collect(GEOMETRY[])
GEOMETRY ST_Normalize(GEOMETRY)
GEOMETRY ST_SimplifyPreserveTopology(GEOMETRY, DOUBLE)
GEOMETRY ST_Simplify(GEOMETRY, DOUBLE)
GEOMETRY ST_Union(GEOMETRY, GEOMETRY)
GEOMETRY ST_Intersection(GEOMETRY, GEOMETRY)
GEOMETRY ST_MakeLine(GEOMETRY[])
GEOMETRY ST_Envelope(GEOMETRY)
GEOMETRY ST_FlipCoordinates(GEOMETRY)
GEOMETRY ST_Transform(GEOMETRY, VARCHAR, VARCHAR)
BOX_2D ST_Extent(GEOMETRY)
GEOMETRY ST_PointN(GEOMETRY, INTEGER)
GEOMETRY ST_StartPoint(GEOMETRY)
GEOMETRY ST_EndPoint(GEOMETRY)
GEOMETRY ST_ExteriorRing(GEOMETRY)
GEOMETRY ST_Reverse(GEOMETRY)
GEOMETRY ST_RemoveRepeatedPoints(GEOMETRY)
GEOMETRY ST_RemoveRepeatedPoints(GEOMETRY, DOUBLE)
GEOMETRY ST_ReducePrecision(GEOMETRY, DOUBLE)
GEOMETRY ST_PointOnSurface(GEOMETRY)
GEOMETRY ST_CollectionExtract(GEOMETRY)
GEOMETRY ST_CollectionExtract(GEOMETRY, INTEGER)

Spatial Properties Calculate and access spatial properties of geometries.

DOUBLE ST_Area(GEOMETRY)
BOOLEAN ST_IsClosed(GEOMETRY)
BOOLEAN ST_IsEmpty(GEOMETRY)
BOOLEAN ST_IsRing(GEOMETRY)
BOOLEAN ST_IsSimple(GEOMETRY)
BOOLEAN ST_IsValid(GEOMETRY)
DOUBLE ST_X(GEOMETRY)
DOUBLE ST_Y(GEOMETRY)
DOUBLE ST_XMax(GEOMETRY)
DOUBLE ST_YMax(GEOMETRY)
DOUBLE ST_XMin(GEOMETRY)
DOUBLE ST_YMin(GEOMETRY)
GeometryType ST_GeometryType(GEOMETRY)
DOUBLE ST_Length(GEOMETRY)
INTEGER ST_NGeometries(GEOMETRY)
INTEGER ST_NPoints(GEOMETRY)
INTEGER ST_NInteriorRings(GEOMETRY)

Spatial Relationships Compute relationships and spatial predicates between geometries.

BOOLEAN ST_Within(GEOMETRY, GEOMETRY)
BOOLEAN ST_Touches(GEOMETRY, GEOMETRY)
BOOLEAN ST_Overlaps(GEOMETRY, GEOMETRY)
BOOLEAN ST_Contains(GEOMETRY, GEOMETRY)
BOOLEAN ST_CoveredBy(GEOMETRY, GEOMETRY)
BOOLEAN ST_Covers(GEOMETRY, GEOMETRY)
BOOLEAN ST_Crosses(GEOMETRY, GEOMETRY)
BOOLEAN ST_Difference(GEOMETRY, GEOMETRY)
BOOLEAN ST_Disjoint(GEOMETRY, GEOMETRY)
BOOLEAN ST_Intersects(GEOMETRY, GEOMETRY)
BOOLEAN ST_Equals(GEOMETRY, GEOMETRY)
DOUBLE ST_Distance(GEOMETRY, GEOMETRY)
BOOLEAN ST_DWithin(GEOMETRY, GEOMETRY, DOUBLE)
BOOLEAN ST_Intersects_Extent(GEOMETRY, GEOMETRY)
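
For example, a predicate and a distance computation from this group can be combined in an ordinary query; the following is a sketch using literal geometries:

SELECT
    ST_Within(ST_Point(1, 1), ST_GeomFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))')) AS inside,
    ST_Distance(ST_Point(0, 0), ST_Point(3, 4)) AS dist;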

Spatial Aggregate Functions

Aggregate functions Implemented with

GEOMETRY ST_Envelope_Agg(GEOMETRY)
GEOMETRY ST_Union_Agg(GEOMETRY)
GEOMETRY ST_Intersection_Agg(GEOMETRY)
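
A usage sketch, assuming a hypothetical table boundaries with a GEOMETRY column geom, would compute the combined bounding box of all rows as follows:

SELECT ST_Envelope_Agg(geom) AS bbox
FROM boundaries;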

Spatial Table Functions

ST_Read() ‑ Read Spatial Data from Files The spatial extension provides an ST_Read table function based on the GDAL translator
library to read spatial data from a variety of geospatial vector file formats as if they were DuckDB tables. For example, to create a new table
from a GeoJSON file, you can use the following query:

CREATE TABLE new_tbl AS SELECT * FROM ST_Read('some/file/path/filename.json');

ST_Read can take a number of optional arguments, the full signature is:

ST_Read(
    VARCHAR,
    sequential_layer_scan : BOOLEAN,
    spatial_filter : WKB_BLOB,
    open_options : VARCHAR[],
    layer : VARCHAR,
    allowed_drivers : VARCHAR[],
    sibling_files : VARCHAR[],
    spatial_filter_box : BOX_2D,
    keep_wkb : BOOLEAN
)

• sequential_layer_scan (default: false): If set to true, the table function will scan through all layers sequentially and return
the first layer that matches the given layer name. This is required for some drivers to work properly, e.g., the OSM driver.
• spatial_filter (default: NULL): If set to a WKB blob, the table function will only return rows that intersect with the given WKB
geometry. Some drivers may support efficient spatial filtering natively, in which case it will be pushed down. Otherwise the filtering
is done by GDAL which may be much slower.
• open_options (default: []): A list of key‑value pairs that are passed to the GDAL driver to control the opening of the file. E.g., the
GeoJSON driver supports a FLATTEN_NESTED_ATTRIBUTES=YES option to flatten nested attributes.
• layer (default: NULL): The name of the layer to read from the file. If NULL, the first layer is returned. Can also be a layer index
(starting at 0).
• allowed_drivers (default: []): A list of GDAL driver names that are allowed to be used to open the file. If empty, all drivers are
allowed.
• sibling_files (default: []): A list of sibling files that are required to open the file. E.g., the ESRI Shapefile driver requires
a .shx file to be present. Although most of the time these can be discovered automatically.
• spatial_filter_box (default: NULL): If set to a BOX_2D, the table function will only return rows that intersect with the given
bounding box. Similar to spatial_filter.
• keep_wkb (default: false): If set, the table function will return geometries in a wkb_geometry column with the type WKB_BLOB
(which can be cast to BLOB) instead of GEOMETRY. This is useful if you want to use DuckDB with more exotic geometry subtypes that
DuckDB spatial doesn't support representing in the GEOMETRY type yet.
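
For instance, a call that combines several of these options might look like the following sketch (the file name, layer name and driver are placeholders):

SELECT *
FROM ST_Read(
    'some/file/path/filename.gpkg',
    layer = 'my_layer',
    allowed_drivers = ['GPKG'],
    keep_wkb = false
);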

Note that GDAL is single‑threaded, so this table function will not be able to make full use of parallelism. We're planning to implement
support for the most common vector formats natively in this extension with additional table functions in the future.

We currently support over 50 different formats. You can generate the following table of supported GDAL drivers yourself by executing
SELECT * FROM ST_Drivers().

short_name | long_name | can_create | can_copy | can_open | help_url

ESRI Shapefile | ESRI Shapefile | true | false | true | https://gdal.org/drivers/vector/shapefile.html
MapInfo File | MapInfo File | true | false | true | https://gdal.org/drivers/vector/mitab.html
UK .NTF | UK .NTF | false | false | true | https://gdal.org/drivers/vector/ntf.html
LVBAG | Kadaster LV BAG Extract 2.0 | false | false | true | https://gdal.org/drivers/vector/lvbag.html
S57 | IHO S‑57 (ENC) | true | false | true | https://gdal.org/drivers/vector/s57.html
DGN | Microstation DGN | true | false | true | https://gdal.org/drivers/vector/dgn.html
OGR_VRT | VRT ‑ Virtual Datasource | false | false | true | https://gdal.org/drivers/vector/vrt.html
Memory | Memory | true | false | true |
CSV | Comma Separated Value (.csv) | true | false | true | https://gdal.org/drivers/vector/csv.html
GML | Geography Markup Language (GML) | true | false | true | https://gdal.org/drivers/vector/gml.html
GPX | GPX | true | false | true | https://gdal.org/drivers/vector/gpx.html
KML | Keyhole Markup Language (KML) | true | false | true | https://gdal.org/drivers/vector/kml.html
GeoJSON | GeoJSON | true | false | true | https://gdal.org/drivers/vector/geojson.html
GeoJSONSeq | GeoJSON Sequence | true | false | true | https://gdal.org/drivers/vector/geojsonseq.html
ESRIJSON | ESRIJSON | false | false | true | https://gdal.org/drivers/vector/esrijson.html
TopoJSON | TopoJSON | false | false | true | https://gdal.org/drivers/vector/topojson.html
OGR_GMT | GMT ASCII Vectors (.gmt) | true | false | true | https://gdal.org/drivers/vector/gmt.html
GPKG | GeoPackage | true | true | true | https://gdal.org/drivers/vector/gpkg.html
SQLite | SQLite / Spatialite | true | false | true | https://gdal.org/drivers/vector/sqlite.html
WAsP | WAsP .map format | true | false | true | https://gdal.org/drivers/vector/wasp.html
OpenFileGDB | ESRI FileGDB | true | false | true | https://gdal.org/drivers/vector/openfilegdb.html
DXF | AutoCAD DXF | true | false | true | https://gdal.org/drivers/vector/dxf.html
CAD | AutoCAD Driver | false | false | true | https://gdal.org/drivers/vector/cad.html
FlatGeobuf | FlatGeobuf | true | false | true | https://gdal.org/drivers/vector/flatgeobuf.html
Geoconcept | Geoconcept | true | false | true |
GeoRSS | GeoRSS | true | false | true | https://gdal.org/drivers/vector/georss.html
VFK | Czech Cadastral Exchange Data Format | false | false | true | https://gdal.org/drivers/vector/vfk.html
PGDUMP | PostgreSQL SQL dump | true | false | false | https://gdal.org/drivers/vector/pgdump.html
OSM | OpenStreetMap XML and PBF | false | false | true | https://gdal.org/drivers/vector/osm.html
GPSBabel | GPSBabel | true | false | true | https://gdal.org/drivers/vector/gpsbabel.html
WFS | OGC WFS (Web Feature Service) | false | false | true | https://gdal.org/drivers/vector/wfs.html
OAPIF | OGC API ‑ Features | false | false | true | https://gdal.org/drivers/vector/oapif.html
EDIGEO | French EDIGEO exchange format | false | false | true | https://gdal.org/drivers/vector/edigeo.html
SVG | Scalable Vector Graphics | false | false | true | https://gdal.org/drivers/vector/svg.html
ODS | Open Document / LibreOffice / OpenOffice Spreadsheet | true | false | true | https://gdal.org/drivers/vector/ods.html
XLSX | MS Office Open XML spreadsheet | true | false | true | https://gdal.org/drivers/vector/xlsx.html
Elasticsearch | Elastic Search | true | false | true | https://gdal.org/drivers/vector/elasticsearch.html
Carto | Carto | true | false | true | https://gdal.org/drivers/vector/carto.html
AmigoCloud | AmigoCloud | true | false | true | https://gdal.org/drivers/vector/amigocloud.html
SXF | Storage and eXchange Format | false | false | true | https://gdal.org/drivers/vector/sxf.html
Selafin | Selafin | true | false | true | https://gdal.org/drivers/vector/selafin.html
JML | OpenJUMP JML | true | false | true | https://gdal.org/drivers/vector/jml.html
PLSCENES | Planet Labs Scenes API | false | false | true | https://gdal.org/drivers/vector/plscenes.html
CSW | OGC CSW (Catalog Service for the Web) | false | false | true | https://gdal.org/drivers/vector/csw.html
VDV | VDV‑451/VDV‑452/INTREST Data Format | true | false | true | https://gdal.org/drivers/vector/vdv.html
MVT | Mapbox Vector Tiles | true | false | true | https://gdal.org/drivers/vector/mvt.html
NGW | NextGIS Web | true | true | true | https://gdal.org/drivers/vector/ngw.html
MapML | MapML | true | false | true | https://gdal.org/drivers/vector/mapml.html
TIGER | U.S. Census TIGER/Line | false | false | true | https://gdal.org/drivers/vector/tiger.html
AVCBin | Arc/Info Binary Coverage | false | false | true | https://gdal.org/drivers/vector/avcbin.html
AVCE00 | Arc/Info E00 (ASCII) Coverage | false | false | true | https://gdal.org/drivers/vector/avce00.html

Note that not all of these drivers have been tested thoroughly, and some may require additional options to be passed to work as expected.
If you run into any issues, please first consult the GDAL docs.


ST_ReadOsm() ‑ Read Compressed OSM Data The spatial extension also provides an experimental ST_ReadOsm() table function to
read compressed OSM data directly from a .osm.pbf file.

This will use multithreading and zero‑copy protobuf parsing, which makes it a lot faster than using the st_read() OSM driver, but it only
outputs the raw OSM data (nodes, ways, relations), without constructing any geometries. For node entities you can trivially construct
POINT geometries, but it is also possible to construct LINESTRING and POLYGON geometries by manually joining refs and nodes together in SQL.

Example usage:

SELECT *
FROM st_readosm('tmp/data/germany.osm.pbf')
WHERE tags['highway'] != []
LIMIT 5;

┌──────────────────────┬────────┬──────────────────────┬─────────┬────────────────────┬────────────┬───────────┬───────────────────────┐
│         kind         │   id   │         tags         │  refs   │        lat         │    lon     │ ref_roles │       ref_types       │
│ enum('node', 'way'…  │ int64  │ map(varchar, varch…  │ int64[] │       double       │   double   │ varchar[] │ enum('node', 'way', … │
├──────────────────────┼────────┼──────────────────────┼─────────┼────────────────────┼────────────┼───────────┼───────────────────────┤
│ node                 │ 122351 │ {bicycle=yes, butt…  │         │ 53.5492951         │ 9.977553   │           │                       │
│ node                 │ 122397 │ {crossing=no, high…  │         │ 53.520990100000006 │ 10.0156924 │           │                       │
│ node                 │ 122493 │ {TMC:cid_58:tabcd_…  │         │ 53.129614600000004 │ 8.1970173  │           │                       │
│ node                 │ 123566 │ {highway=traffic_s…  │         │ 54.617268200000005 │ 8.9718171  │           │                       │
│ node                 │ 125801 │ {TMC:cid_58:tabcd_…  │         │ 53.070685000000005 │ 8.7819939  │           │                       │
└──────────────────────┴────────┴──────────────────────┴─────────┴────────────────────┴────────────┴───────────┴───────────────────────┘
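
Since the raw node rows expose lat and lon columns, POINT geometries can be built directly from them; the following is a sketch based on the same file:

SELECT id, ST_Point(lon, lat) AS geom
FROM st_readosm('tmp/data/germany.osm.pbf')
WHERE kind = 'node'
LIMIT 5;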

Spatial Replacement Scans

The spatial extension also provides ”replacement scans” for common geospatial file formats, allowing you to query files of these formats
as if they were tables.

SELECT * FROM './path/to/some/shapefile/dataset.shp';

In practice this is just syntax‑sugar for calling ST_Read, so there is no difference in performance. If you want to pass additional options,
you should use the ST_Read table function directly.

The following formats are currently recognized by their file extension:

• ESRI ShapeFile, .shp
• GeoPackage, .gpkg
• FlatGeoBuf, .fgb

Similarly there is a .osm.pbf replacement scan for ST_ReadOsm.

Spatial Copy Functions

Much like the ST_Read table function, the spatial extension provides a GDAL based COPY function to export DuckDB tables to different
geospatial vector formats. For example, to export a table to a GeoJSON file with generated bounding boxes, you can use the following
query:

COPY tbl TO 'some/file/path/filename.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

Available options:


• FORMAT: the only required option, must be set to GDAL to use the GDAL based copy function.
• DRIVER: the GDAL driver to use for the export. See the table above for a list of available drivers.
• LAYER_CREATION_OPTIONS: list of options to pass to the GDAL driver. See the GDAL docs for the driver you are using for a list of
available options.
• SRS: Set a spatial reference system as metadata to use for the export. This can be a WKT string, an EPSG code or a proj‑string, basically
anything you would normally be able to pass to GDAL/OGR. This will not perform any reprojection of the input geometry, though; it
just sets the metadata if the target driver supports it.
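
For example, to additionally record an EPSG code as metadata when exporting, the SRS option can be combined with the other options; the following is a sketch reusing the tbl table from above:

COPY tbl TO 'some/file/path/filename.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON', SRS 'EPSG:4326');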

Limitations

Raster types are not supported and there is currently no plan to add them to the extension.

SQLite Extension

The SQLite extension allows DuckDB to directly read and write data from a SQLite database file. The data can be queried directly from the
underlying SQLite tables. Data can be loaded from SQLite tables into DuckDB tables, or vice versa.

Installing and Loading

To install the sqlite extension, run:

INSTALL sqlite;

The extension is loaded automatically upon first use. If you prefer to load it manually, run:

LOAD sqlite;

Usage

To make a SQLite file accessible to DuckDB, use the ATTACH statement, which supports read & write.

For example with the sakila.db file:

ATTACH 'sakila.db' (TYPE SQLITE);


USE sakila;

The tables in the file can be read as if they were normal DuckDB tables, but the underlying data is read directly from the SQLite tables in
the file at query time.

SHOW TABLES;

┌────────────────────────┐
│ name │
├────────────────────────┤
│ actor │
│ address │
│ category │
│ city │
│ country │
│ customer │
│ customer_list │
│ film │
│ film_actor │
│ film_category │
│ film_list │
│ film_text │


│ inventory │
│ language │
│ payment │
│ rental │
│ sales_by_film_category │
│ sales_by_store │
│ staff │
│ staff_list │
│ store │
└────────────────────────┘

You can query the tables using SQL, e.g., using the example queries from sakila-examples.sql:

SELECT
cat.name AS category_name,
sum(ifnull(pay.amount, 0)) AS revenue
FROM category cat
LEFT JOIN film_category flm_cat
ON cat.category_id = flm_cat.category_id
LEFT JOIN film fil
ON flm_cat.film_id = fil.film_id
LEFT JOIN inventory inv
ON fil.film_id = inv.film_id
LEFT JOIN rental ren
ON inv.inventory_id = ren.inventory_id
LEFT JOIN payment pay
ON ren.rental_id = pay.rental_id
GROUP BY cat.name
ORDER BY revenue DESC
LIMIT 5;

Data Types

SQLite is a weakly typed database system. As such, when storing data in a SQLite table, types are not enforced. The following is valid SQL
in SQLite:

CREATE TABLE numbers (i INTEGER);


INSERT INTO numbers VALUES ('hello');

DuckDB is a strongly typed database system, as such, it requires all columns to have defined types and the system rigorously checks data
for correctness.

When querying SQLite, DuckDB must deduce a specific column type mapping. DuckDB follows SQLite's type affinity rules with a few exten‑
sions.

1. If the declared type contains the string INT then it is translated into the type BIGINT
2. If the declared type of the column contains any of the strings CHAR, CLOB, or TEXT then it is translated into VARCHAR.
3. If the declared type for a column contains the string BLOB or if no type is specified then it is translated into BLOB.
4. If the declared type for a column contains any of the strings REAL, FLOA, DOUB, DEC or NUM then it is translated into DOUBLE.
5. If the declared type is DATE, then it is translated into DATE.
6. If the declared type contains the string TIME, then it is translated into TIMESTAMP.
7. If none of the above apply, then it is translated into VARCHAR.

As DuckDB enforces the corresponding columns to contain only correctly typed values, we cannot load the string ”hello” into a column of
type BIGINT. As such, an error is thrown when reading from the ”numbers” table above:

Error: Mismatch Type Error: Invalid type in column "i": column was declared as integer, found "hello" of
type "text" instead.

This error can be avoided by setting the sqlite_all_varchar option:


SET GLOBAL sqlite_all_varchar = true;

When set, this option overrides the type conversion rules described above, and instead always converts the SQLite columns into a VARCHAR
column. Note that this setting must be set before sqlite_attach is called.

Opening SQLite Databases Directly

SQLite databases can also be opened directly and can be used transparently instead of a DuckDB database file. In any client, when con‑
necting, a path to a SQLite database file can be provided and the SQLite database will be opened instead.

For example, with the shell:

duckdb data/db/sakila.db

SHOW tables;

┌────────────┐
│ name │
│ varchar │
├────────────┤
│ actor │
│ address │
│ category │
│ · │
│ staff_list │
│ store │
├────────────┤
│ 21 rows │
│ (5 shown) │
└────────────┘

Writing Data to SQLite

In addition to reading data from SQLite, the extension also allows you to create new SQLite database files, create tables, ingest data into
SQLite and make other modifications to SQLite database files using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a SQLite database to Parquet, or read data from a Parquet file
into SQLite.

Below is a brief example of how to create a new SQLite database and load data into it.

ATTACH 'new_sqlite_database.db' AS sqlite_db (TYPE SQLITE);


CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');

The resulting SQLite database can then be read using SQLite.

sqlite3 new_sqlite_database.db

SQLite version 3.39.5 2022-10-14 20:58:05


sqlite> SELECT * FROM tbl;

id name
-- ------
42 DuckDB

Many operations on SQLite tables are supported. All these operations directly modify the SQLite database, and the result of subsequent
operations can then be read using SQLite.

Below is a list of supported operations.


CREATE TABLE
CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);

INSERT INTO
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');

SELECT
SELECT * FROM sqlite_db.tbl;

┌───────┬─────────┐
│ id │ name │
│ int64 │ varchar │
├───────┼─────────┤
│ 42 │ DuckDB │
└───────┴─────────┘

COPY
COPY sqlite_db.tbl TO 'data.parquet';
COPY sqlite_db.tbl FROM 'data.parquet';

UPDATE
UPDATE sqlite_db.tbl SET name = 'Woohoo' WHERE id = 42;

DELETE
DELETE FROM sqlite_db.tbl WHERE id = 42;

ALTER TABLE
ALTER TABLE sqlite_db.tbl ADD COLUMN k INTEGER;

DROP TABLE
DROP TABLE sqlite_db.tbl;

CREATE VIEW
CREATE VIEW sqlite_db.v1 AS SELECT 42;

Transactions
CREATE TABLE sqlite_db.tmp (i INTEGER);
BEGIN;
INSERT INTO sqlite_db.tmp VALUES (42);
SELECT * FROM sqlite_db.tmp;

┌───────┐
│ i │
│ int64 │
├───────┤
│ 42 │
└───────┘


ROLLBACK;
SELECT * FROM sqlite_db.tmp;

┌────────┐
│ i │
│ int64 │
├────────┤
│ 0 rows │
└────────┘

Note. Deprecated The old sqlite_attach function is deprecated. It is recommended to switch over to the new ATTACH syntax.

Substrait Extension

The main goal of the substrait extension is to support both production and consumption of Substrait query plans in DuckDB.

This extension is mainly exposed via 3 different APIs ‑ the SQL API, the Python API, and the R API. Here we depict how to consume and
produce Substrait query plans in each API.

Note. The Substrait integration is currently experimental. Support is currently only available on request. If you have not asked for
permission to ask for support, contact us prior to opening an issue. If you open an issue without doing so, we will close it without
further review.

Installing and Loading

The Substrait extension is an autoloadable extension, meaning that it will be loaded at runtime whenever one of the substrait functions
is called. To explicitly install and load the released version of the Substrait extension, you can also use the following SQL commands.

INSTALL substrait;
LOAD substrait;

SQL

In the SQL API, users can generate Substrait plans (into a BLOB or a JSON) and consume Substrait plans.

BLOB Generation To generate a Substrait BLOB the get_substrait(sql) function must be called with a valid SQL select query.

CREATE TABLE crossfit (exercise TEXT, difficulty_level INT);


INSERT INTO crossfit VALUES ('Push Ups', 3), ('Pull Ups', 5) , ('Push Jerk', 7), ('Bar Muscle Up', 10);

.mode line
CALL get_substrait('SELECT count(exercise) AS exercise FROM crossfit WHERE difficulty_level <= 5');

Plan BLOB = \x12\x09\x1A\x07\x10\x01\x1A\x03lte\x12\x11\x1A\x0F\x10\x02\x1A\x0Bis_not_


null\x12\x09\x1A\x07\x10\x03\x1A\x03and\x12\x0B\x1A\x09\x10\x04\x1A\x05count\x1A\xC8\x01\x12\xC5\x01\x0A\xB8\x01:
level\x12\x11\x0A\x07\xB2\x01\x04\x08\x0D\x18\x01\x0A\x04*\x02\x10\x01\x18\x02\x1AJ\x1AH\x08\x03\x1A\x04\x0A\x02\
\x1A\x1E\x08\x01\x1A\x04*\x02\x10\x01\x22\x0C\x1A\x0A\x12\x08\x0A\x04\x12\x02\x08\x01\x22\x00\x22\x06\x1A\x04\x0A

JSON Generation To generate a JSON representing the Substrait plan the get_substrait_json(sql) function must be called with
a valid SQL select query.

CALL get_substrait_json('SELECT count(exercise) AS exercise FROM crossfit WHERE difficulty_level <= 5');


Json =
{"extensions":[{"extensionFunction":{"functionAnchor":1,"name":"lte"}},{"extensionFunction":{"functionAnchor":2,
not_
null"}},{"extensionFunction":{"functionAnchor":3,"name":"and"}},{"extensionFunction":{"functionAnchor":4,"name":
level"],"struct":{"types":[{"varchar":{"length":13,"nullability":"NULLABILITY_
NULLABLE"}},{"i32":{"nullability":"NULLABILITY_NULLABLE"}}],"nullability":"NULLABILITY_
REQUIRED"}},"filter":{"scalarFunction":{"functionReference":3,"outputType":{"bool":{"nullability":"NULLABILITY_
NULLABLE"}},"arguments":[{"value":{"scalarFunction":{"functionReference":1,"outputType":{"i32":{"nullability":"N
NULLABLE"}},"arguments":[{"value":{"selection":{"directReference":{"structField":{"field":1}},"rootReference":{}
NULLABLE"}},"arguments":[{"value":{"selection":{"directReference":{"structField":{"field":1}},"rootReference":{}
NULLABLE"}}}}]}},"expressions":[{"selection":{"directReference":{"structField":{}},"rootReference":{}}}]}},"name

BLOB Consumption To consume a Substrait BLOB the from_substrait(blob) function must be called with a valid Substrait BLOB
plan.

CALL from_substrait('\x12\x09\x1A\x07\x10\x01\x1A\x03lte\x12\x11\x1A\x0F\x10\x02\x1A\x0Bis_not_
null\x12\x09\x1A\x07\x10\x03\x1A\x03and\x12\x0B\x1A\x09\x10\x04\x1A\x05count\x1A\xC8\x01\x12\xC5\x01\x0A\xB8\x01:
level\x12\x11\x0A\x07\xB2\x01\x04\x08\x0D\x18\x01\x0A\x04*\x02\x10\x01\x18\x02\x1AJ\x1AH\x08\x03\x1A\x04\x0A\x02\
\x1A\x1E\x08\x01\x1A\x04*\x02\x10\x01\x22\x0C\x1A\x0A\x12\x08\x0A\x04\x12\x02\x08\x01\x22\x00\x22\x06\x1A\x04\x0A

exercise = 2

Python

The Substrait extension is autoloadable, but if you prefer to load it explicitly, you can use the relevant Python syntax within a connection:

import duckdb

con = duckdb.connect()
con.install_extension("substrait")
con.load_extension("substrait")

BLOB Generation To generate a Substrait BLOB the get_substrait(sql) function must be called, from a connection, with a valid
SQL select query.

con.execute(query = "CREATE TABLE crossfit (exercise TEXT, difficulty_level INT)")


con.execute(query = "INSERT INTO crossfit VALUES ('Push Ups', 3), ('Pull Ups', 5) , ('Push Jerk', 7),
('Bar Muscle Up', 10)")

proto_bytes = con.get_substrait(query="SELECT count(exercise) AS exercise FROM crossfit WHERE


difficulty_level <= 5").fetchone()[0]

JSON Generation To generate a JSON representing the Substrait plan the get_substrait_json(sql) function, from a connection,
must be called with a valid SQL select query.

json = con.get_substrait_json("SELECT count(exercise) AS exercise FROM crossfit WHERE difficulty_level


<= 5").fetchone()[0]

BLOB Consumption To consume a Substrait BLOB the from_substrait(blob) function must be called, from the connection, with
a valid Substrait BLOB plan.

query_result = con.from_substrait(proto=proto_bytes)

R

By default the extension will be autoloaded on first use. To explicitly install and load this extension in R, use the following commands:

library("duckdb")
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "INSTALL substrait")
dbExecute(con, "LOAD substrait")

BLOB Generation To generate a Substrait BLOB the duckdb_get_substrait(con, sql) function must be called, with a connec‑
tion and a valid SQL select query.

dbExecute(con, "CREATE TABLE crossfit (exercise TEXT, difficulty_level INT)")


dbExecute(con, "INSERT INTO crossfit VALUES ('Push Ups', 3), ('Pull Ups', 5) , ('Push Jerk', 7), ('Bar
Muscle Up', 10)")

proto_bytes <- duckdb::duckdb_get_substrait(con, "SELECT * FROM crossfit LIMIT 5")

JSON Generation To generate a JSON representing the Substrait plan, the duckdb_get_substrait_json(con, sql) function must be called with
a connection and a valid SQL select query.

json <- duckdb::duckdb_get_substrait_json(con, "SELECT count(exercise) AS exercise FROM crossfit WHERE


difficulty_level <= 5")

BLOB Consumption To consume a Substrait BLOB the duckdb_prepare_substrait(con, blob) function must be called, with
a connection and a valid Substrait BLOB plan.

result <- duckdb::duckdb_prepare_substrait(con, proto_bytes)


df <- dbFetch(result)

TPC‑DS Extension

The tpcds extension implements the data generator and queries for the TPC‑DS benchmark.

Installing and Loading

The tpcds extension will be transparently autoloaded on first use from the official extension repository. If you would like to install and
load it manually, run:

INSTALL tpcds;
LOAD tpcds;

Usage

To generate data for scale factor 1, use:

CALL dsdgen(sf = 1);

To run a query, e.g., query 8, use:

PRAGMA tpcds(8);


┌──────────────┬────────────────────┐
│ s_store_name │ sum(ss_net_profit) │
│ varchar │ decimal(38,2) │
├──────────────┼────────────────────┤
│ able │ -10354620.18 │
│ ation │ -10576395.52 │
│ bar │ -10625236.01 │
│ ese │ -10076698.16 │
│ ought │ -10994052.78 │
└──────────────┴────────────────────┘

Limitations

The tpcds({query_id}) function runs a fixed TPC‑DS query with pre‑defined bind parameters (a.k.a. substitution parameters). It is
not possible to change the query parameters using the tpcds extension.

GitHub

The tpcds extension is part of the main DuckDB repository.

TPC‑H Extension

The tpch extension implements the data generator and queries for the TPC‑H benchmark.

Installing and Loading

The tpch extension is shipped by default in some DuckDB builds, otherwise it will be transparently autoloaded on first use. If you would
like to install and load it manually, run:

INSTALL tpch;
LOAD tpch;

Usage

Generating Data To generate data for scale factor 1, use:

CALL dbgen(sf = 1);

Calling dbgen does not clean up existing TPC‑H tables. To clean up existing tables, use DROP TABLE before running dbgen:

DROP TABLE IF EXISTS customer;


DROP TABLE IF EXISTS lineitem;
DROP TABLE IF EXISTS nation;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS part;
DROP TABLE IF EXISTS partsupp;
DROP TABLE IF EXISTS region;
DROP TABLE IF EXISTS supplier;


Running a Query To run a query, e.g., query 4, use:

PRAGMA tpch(4);

┌─────────────────┬─────────────┐
│ o_orderpriority │ order_count │
│ varchar │ int64 │
├─────────────────┼─────────────┤
│ 1-URGENT │ 21188 │
│ 2-HIGH │ 20952 │
│ 3-MEDIUM │ 20820 │
│ 4-NOT SPECIFIED │ 21112 │
│ 5-LOW │ 20974 │
└─────────────────┴─────────────┘

Listing Queries To list all 22 queries, run:

FROM tpch_queries();

This function returns a table with columns query_nr and query.

Listing Expected Answers To produce the expected results for all queries on scale factors 0.01, 0.1, and 1, run:

FROM tpch_answers();

This function returns a table with columns query_nr, scale_factor, and answer.
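
For example, the expected answer for a single query at a given scale factor can be retrieved with an ordinary filter:

SELECT answer
FROM tpch_answers()
WHERE query_nr = 4 AND scale_factor = 1;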

Data Generator Parameters

The data generator function dbgen has the following parameters:

Name Type Description

catalog VARCHAR Target catalog
children UINTEGER Number of partitions
overwrite BOOLEAN (Not used)
sf DOUBLE Scale factor
step UINTEGER Defines the partition to be generated, indexed from 0 to children ‑ 1. Must be defined when the children argument is defined
suffix VARCHAR Append the suffix to table names

Generating Larger Than Memory Data Sets

To generate data sets for large scale factors, which yield larger than memory data sets, run the dbgen function in steps. For example, you
may generate SF300 in 10 steps:

CALL dbgen(sf = 300, children = 10, step = 0);


CALL dbgen(sf = 300, children = 10, step = 1);
...
CALL dbgen(sf = 300, children = 10, step = 9);


Limitations

• The data generator function dbgen is single‑threaded and does not support concurrency. Running multiple steps to parallelize over
different partitions is also not supported at the moment.
• The tpch({query_id}) function runs a fixed TPC‑H query with pre‑defined bind parameters (a.k.a. substitution parameters). It
is not possible to change the query parameters using the tpch extension.

GitHub

The tpch extension is part of the main DuckDB repository.


Guides

Data Import & Export

Data Import Overview

When importing data from other systems to DuckDB, there are several considerations to take into account. This page documents the key
approaches recommended to bulk import data to DuckDB.

Recommended Import Methods

1. For systems which are supported by a DuckDB scanner extension, it's preferable to use the scanner. DuckDB currently offers scanners
for MySQL, PostgreSQL, and SQLite.
2. If there is a bulk export feature in the data source system, export the data to Parquet or CSV format, then load it using DuckDB's
Parquet or CSV loader.
3. If the approaches above are not applicable, consider using the DuckDB appender, currently available in the C, C++, Go, Java, and
Rust APIs.
4. If the data source system supports Apache Arrow and the data transfer is a recurring task, consider using the DuckDB Arrow extension.

Methods to Avoid

If possible, avoid looping row‑by‑row (tuple‑at‑a‑time) in favor of bulk operations. Performing row‑by‑row inserts (even with prepared
statements) is detrimental to performance and will result in slow load times.

Note. Best practice Unless your data is small (<100k rows), avoid using inserts in loops.

CSV Import

To read data from a CSV file, use the read_csv function in the FROM clause of a query.

SELECT * FROM read_csv('input.csv');

To create a new table using the result from a query, use CREATE TABLE AS from a SELECT statement.

CREATE TABLE new_tbl AS SELECT * FROM read_csv('input.csv');

We can use DuckDB's optional FROM‑first syntax to omit SELECT *:

CREATE TABLE new_tbl AS FROM read_csv('input.csv');

To load data into an existing table from a query, use INSERT INTO from a SELECT statement.

INSERT INTO tbl SELECT * FROM read_csv('input.csv');

Alternatively, the COPY statement can also be used to load data from a CSV file into an existing table.

COPY tbl FROM 'input.csv';

For additional options, see the CSV Import reference and the COPY statement documentation.


CSV Export

To export the data from a table to a CSV file, use the COPY statement.

COPY tbl TO 'output.csv' (HEADER, DELIMITER ',');

The result of queries can also be directly exported to a CSV file.

COPY (SELECT * FROM tbl) TO 'output.csv' (HEADER, DELIMITER ',');

For additional options, see the COPY statement documentation.

Parquet Import

To read data from a Parquet file, use the read_parquet function in the FROM clause of a query.

SELECT * FROM read_parquet('input.parquet');

To create a new table using the result from a query, use CREATE TABLE AS from a SELECT statement.

CREATE TABLE new_tbl AS SELECT * FROM read_parquet('input.parquet');

To load data into an existing table from a query, use INSERT INTO from a SELECT statement.

INSERT INTO tbl SELECT * FROM read_parquet('input.parquet');

Alternatively, the COPY statement can also be used to load data from a Parquet file into an existing table.

COPY tbl FROM 'input.parquet' (FORMAT PARQUET);

For additional options, see the Parquet Loading reference.

Parquet Export

To export the data from a table to a Parquet file, use the COPY statement.

COPY tbl TO 'output.parquet' (FORMAT PARQUET);

The result of queries can also be directly exported to a Parquet file.

COPY (SELECT * FROM tbl) TO 'output.parquet' (FORMAT PARQUET);

The flags for setting compression, row group size, etc. are listed in the Reading and Writing Parquet files page.

Querying Parquet Files

To run a query directly on a Parquet file, use the read_parquet function in the FROM clause of a query.

SELECT * FROM read_parquet('input.parquet');

The Parquet file will be processed in parallel. Filters will be automatically pushed down into the Parquet scan, and only the relevant columns
will be read automatically.

For more information see the blog post ”Querying Parquet with Precision using DuckDB”.


HTTP Parquet Import

To load a Parquet file over HTTP(S), the httpfs extension is required. This can be installed using the INSTALL SQL command. This only
needs to be run once.

INSTALL httpfs;

To load the httpfs extension for usage, use the LOAD SQL command:

LOAD httpfs;

After the httpfs extension is set up, Parquet files can be read over http(s):

SELECT * FROM read_parquet('https:// domain /path/to/file.parquet');

For example:

SELECT * FROM read_parquet('https://duckdb.org/data/prices.parquet');

The function read_parquet can be omitted if the URL ends with .parquet:

SELECT * FROM read_parquet('https://duckdb.org/data/holdings.parquet');

Moreover, the read_parquet function itself can also be omitted thanks to DuckDB's replacement scan mechanism:

SELECT * FROM 'https://duckdb.org/data/holdings.parquet';

S3 Parquet Import

Prerequisites

To load a Parquet file from S3, the httpfs extension is required. This can be installed using the INSTALL SQL command. This only needs
to be run once.

INSTALL httpfs;

To load the httpfs extension for usage, use the LOAD SQL command:

LOAD httpfs;

Credentials and Configuration

After loading the httpfs extension, set up the credentials and S3 region to read data:

CREATE SECRET (
TYPE S3,
KEY_ID 'AKIAIOSFODNN7EXAMPLE',
SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
REGION 'us-east-1'
);

Note. Tip If you get an IO Error (Connection error for HTTP HEAD), configure the endpoint explicitly via ENDPOINT
's3. your-region .amazonaws.com'.

Alternatively, use the aws extension to retrieve the credentials automatically:

CREATE SECRET (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);


Querying

After the httpfs extension is set up and the S3 configuration is set correctly, Parquet files can be read from S3 using the following com‑
mand:

SELECT * FROM read_parquet('s3:// bucket / file ');

Google Cloud Storage (GCS) and Cloudflare R2

DuckDB can also handle Google Cloud Storage (GCS) and Cloudflare R2 via the S3 API. See the relevant guides for details.

S3 Parquet Export

To write a Parquet file to S3, the httpfs extension is required. This can be installed using the INSTALL SQL command. This only needs to
be run once.

INSTALL httpfs;

To load the httpfs extension for usage, use the LOAD SQL command:

LOAD httpfs;

After loading the httpfs extension, set up the credentials to write data. Note that the region parameter should match the region of the
bucket you want to access.

CREATE SECRET (
TYPE S3,
KEY_ID 'AKIAIOSFODNN7EXAMPLE',
SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
REGION 'us-east-1'
);

Note. Tip If you get an IO Error (Connection error for HTTP HEAD), configure the endpoint explicitly via ENDPOINT
's3. your-region .amazonaws.com'.

Alternatively, use the aws extension to retrieve the credentials automatically:

CREATE SECRET (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);

After the httpfs extension is set up and the S3 credentials are correctly configured, Parquet files can be written to S3 using the following
command:

COPY table_name TO 's3://bucket/file.parquet';

Similarly, Google Cloud Storage (GCS) is supported through the Interoperability API. You need to create HMAC keys and provide the creden‑
tials as follows:

CREATE SECRET (
    TYPE GCS,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
);

After setting up the GCS credentials, you can export using:

COPY table_name TO 'gs://gcs_bucket/file.parquet';


S3 Iceberg Import

Prerequisites

To load an Iceberg file from S3, both the httpfs and iceberg extensions are required. They can be installed using the INSTALL SQL
command. The extensions only need to be installed once.

INSTALL httpfs;
INSTALL iceberg;

To load the extensions for usage, use the LOAD command:

LOAD httpfs;
LOAD iceberg;

Credentials

After loading the extensions, set up the credentials and S3 region to read data. You may either use an access key and secret, or a token.

CREATE SECRET (
TYPE S3,
KEY_ID 'AKIAIOSFODNN7EXAMPLE',
SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
REGION 'us-east-1'
);

Alternatively, use the aws extension to retrieve the credentials automatically:

CREATE SECRET (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);

Loading Iceberg Tables from S3

After the extensions are set up and the S3 credentials are correctly configured, Iceberg tables can be read from S3 using the following com‑
mand:

SELECT *
FROM iceberg_scan('s3:// bucket / iceberg-table-folder /metadata/ id .metadata.json')

Note that you need to link directly to the manifest file. Otherwise you'll get an error like this:

Error: IO Error: Cannot open file "s3:// bucket / iceberg-table-folder /metadata/version-hint.text": No


such file or directory

S3 Express One

In late 2023, AWS announced the S3 Express One Zone, a high‑speed variant of traditional S3 buckets. DuckDB can read S3 Express One
buckets using the httpfs extension.

Credentials and Configuration

The configuration of S3 Express One buckets is similar to regular S3 buckets with one exception: we have to specify the endpoint according
to the following pattern:

s3express- availability zone . region .amazonaws.com


where the availability zone (e.g., use1-az5) can be obtained from the S3 Express One bucket's configuration page and the
region is the AWS region (e.g., us-east-1).

For example, to allow DuckDB to use an S3 Express One bucket, configure the Secrets manager as follows:

CREATE SECRET (
TYPE S3,
REGION 'us-east-1',
KEY_ID 'AKIAIOSFODNN7EXAMPLE',
SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
ENDPOINT 's3express-use1-az5.us-east-1.amazonaws.com'
);

Querying

You can query the S3 Express One bucket as any other S3 bucket:

SELECT *
FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';

Performance

We ran two experiments on a c7gd.12xlarge instance using the LDBC SF300 Comments creationDate Parquet file (also used in
the microbenchmarks of the performance guide).

Experiment File size Runtime

Loading only from Parquet 4.1 GB 3.5 s
Creating local table from Parquet 4.1 GB 5.1 s

The ”loading only” variant runs the load as part of an EXPLAIN ANALYZE statement to measure the runtime without the cost of
creating a local table, while the ”creating local table” variant uses CREATE TABLE ... AS to create a persistent table on the local
disk.
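
The two variants roughly correspond to the following statements; this is a sketch reusing the S3 Express One path from the earlier example:

-- "loading only": scan the file as part of an EXPLAIN ANALYZE, without materializing a table
EXPLAIN ANALYZE
SELECT * FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';

-- "creating local table": materialize the data in a persistent local table
CREATE TABLE comments AS
FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';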

GCS Import

Prerequisites

For Google Cloud Storage (GCS), the Interoperability API enables you to access it like an S3 connection using the httpfs extension.
This can be installed using the INSTALL SQL command. This only needs to be run once.

Credentials and Configuration

You need to create HMAC keys and declare them:

CREATE SECRET (
    TYPE GCS,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
);


Querying

After setting up the GCS credentials, you can query the GCS data using:

SELECT *
FROM read_parquet('gs:// gcs_bucket / file ');

Cloudflare R2 Import

Prerequisites

For Cloudflare R2, the S3 Compatibility API allows you to use DuckDB's S3 support to read and write from R2 buckets. This requires the
httpfs extension, which can be installed using the INSTALL SQL command. This only needs to be run once.

Credentials and Configuration

You will need to generate an S3 auth token and create an R2 secret in DuckDB:

CREATE SECRET (
TYPE R2,
KEY_ID 'AKIAIOSFODNN7EXAMPLE',
SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
ACCOUNT_ID 'my_account_id'
);

Querying

After setting up the R2 credentials, you can query the R2 data using:

SELECT * FROM read_parquet('r2:// r2_bucket_name / file ');

JSON Import

To read data from a JSON file, use the read_json_auto function in the FROM clause of a query.

SELECT * FROM read_json_auto('input.json');

To create a new table using the result from a query, use CREATE TABLE AS from a SELECT statement.

CREATE TABLE new_tbl AS SELECT * FROM read_json_auto('input.json');

To load data into an existing table from a query, use INSERT INTO from a SELECT statement.

INSERT INTO tbl SELECT * FROM read_json_auto('input.json');

Alternatively, the COPY statement can also be used to load data from a JSON file into an existing table.

COPY tbl FROM 'input.json';

For additional options, see the JSON Loading reference and the COPY statement documentation.


JSON Export

To export the data from a table to a JSON file, use the COPY statement.

COPY tbl TO 'output.json';

The result of queries can also be directly exported to a JSON file.

COPY (SELECT * FROM tbl) TO 'output.json';

For additional options, see the COPY statement documentation.

Excel Import

Installing the Extension

To read data from an Excel file, install and load the spatial extension. This is only needed once per DuckDB connection.

INSTALL spatial;
LOAD spatial;

Importing Excel Sheets

Use the st_read function in the FROM clause of a query:

SELECT * FROM st_read('test_excel.xlsx');

The layer parameter allows specifying the name of the Excel worksheet.

SELECT * FROM st_read('test_excel.xlsx', layer = 'Sheet1');

Creating a New Table To create a new table using the result from a query, use CREATE TABLE ... AS from a SELECT statement.

CREATE TABLE new_tbl AS


SELECT * FROM st_read('test_excel.xlsx', layer = 'Sheet1');

Loading to an Existing Table To load data into an existing table from a query, use INSERT INTO from a SELECT statement.

INSERT INTO tbl


SELECT * FROM st_read('test_excel.xlsx', layer = 'Sheet1');

Options Several configuration options are also available for the underlying GDAL library that is doing the XLSX parsing. You can pass
them via the open_options parameter of the st_read function as a list of 'KEY=VALUE' strings.

Importing a Sheet with/without a Header The option HEADERS has three possible values:

• FORCE: treat the first row as a header
• DISABLE: treat the first row as a row of data
• AUTO: attempt auto‑detection (default)

For example, to treat the first row as a header, run:

SELECT *
FROM st_read(
'test_excel.xlsx',
layer = 'Sheet1',
open_options = ['HEADERS=FORCE']
);


Detecting Types The option FIELD_TYPES defines how field types should be treated:

• STRING: all fields should be loaded as strings (VARCHAR type)
• AUTO: field types should be auto‑detected (default)

For example, to treat the first row as a header and use auto‑detection for types, run:

SELECT *
FROM st_read(
'test_excel.xlsx',
layer = 'Sheet1',
open_options = ['HEADERS=FORCE', 'FIELD_TYPES=AUTO']
);

To treat the fields as strings:

SELECT *
FROM st_read(
'test_excel.xlsx',
layer = 'Sheet1',
open_options = ['FIELD_TYPES=STRING']
);

See Also

DuckDB can also export Excel files. For additional details on Excel support, see the spatial extension page, the GDAL XLSX driver page, and
the GDAL configuration options page.

Excel Export

Installing the Extension

To export the data from a table to an Excel file, install and load the spatial extension. This is only needed once per DuckDB connection.

INSTALL spatial;
LOAD spatial;

Exporting Excel Sheets

Then use the COPY statement. The file will contain one worksheet with the same name as the file, but without the .xlsx extension.

COPY tbl TO 'output.xlsx' WITH (FORMAT GDAL, DRIVER 'xlsx');

The result of a query can also be directly exported to an Excel file.

COPY (SELECT * FROM tbl) TO 'output.xlsx' WITH (FORMAT GDAL, DRIVER 'xlsx');

Note. Dates and timestamps are currently not supported by the xlsx writer. Cast columns of those types to VARCHAR prior to
creating the xlsx file.

Note. Warning The output file must not already exist.

See Also

DuckDB can also import Excel files. For additional details, see the spatial extension page and the GDAL XLSX driver page.


MySQL Import

To run a query directly on a running MySQL database, the mysql extension is required.

Installation and Loading

The extension can be installed using the INSTALL SQL command. This only needs to be run once.

INSTALL mysql;

To load the mysql extension for usage, use the LOAD SQL command:

LOAD mysql;

Usage

After the mysql extension is installed, you can attach to a MySQL database using the following command:

ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql_scanner, READ_ONLY);
USE mysql_db;

The string used by ATTACH is a PostgreSQL‑style connection string (not a MySQL connection string!). It is a list of connection arguments
provided in {key}={value} format. Below is a list of valid arguments. Any options not provided are replaced by their default values.

Setting Default

database NULL
host localhost
password
port 0
socket NULL
user current user
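
For example, a connection string that overrides several of these defaults could look like this (hypothetical credentials and database name):

ATTACH 'host=127.0.0.1 port=3306 user=app_user password=app_password database=app_db' AS mysql_db (TYPE mysql_scanner);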

You can directly read and write the MySQL database:

CREATE TABLE tbl (id INTEGER, name VARCHAR);


INSERT INTO tbl VALUES (42, 'DuckDB');

For a list of supported operations, see the MySQL extension documentation.

PostgreSQL Import

To run a query directly on a running PostgreSQL database, the postgres extension is required.

Installation and Loading

The extension can be installed using the INSTALL SQL command. This only needs to be run once.

INSTALL postgres;

To load the postgres extension for usage, use the LOAD SQL command:

LOAD postgres;


Usage

After the postgres extension is installed, tables can be queried from PostgreSQL using the postgres_scan function:

-- scan the table "mytable" from the schema "public" in the database "mydb"
SELECT * FROM postgres_scan('host=localhost port=5432 dbname=mydb', 'public', 'mytable');

The first parameter to the postgres_scan function is the PostgreSQL connection string, a list of connection arguments provided in
{key}={value} format. Below is a list of valid arguments.

Name Description Default

host Name of host to connect to localhost


hostaddr Host IP address localhost
port Port number 5432
user Postgres user name [OS user name]
password Postgres password
dbname Database name [user]
passfile Name of file passwords are stored in ~/.pgpass
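
For example, a connection string that also supplies a user and a password could be used like this (hypothetical credentials):

SELECT * FROM postgres_scan('host=localhost port=5432 dbname=mydb user=postgres password=secret', 'public', 'mytable');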

Alternatively, the entire database can be attached using the ATTACH command. This allows you to query all tables stored within the
PostgreSQL database as if it was a regular database.

-- attach the Postgres database using the given connection string


ATTACH 'host=localhost port=5432 dbname=mydb' AS test (TYPE postgres);
-- the table "tbl_name" can now be queried as if it is a regular table
SELECT * FROM test.tbl_name;
-- switch the active database to "test"
USE test;
-- list all tables in the file
SHOW TABLES;

For more information see the PostgreSQL extension documentation.

SQLite Import

To run a query directly on a SQLite file, the sqlite extension is required.

Installation and Loading

The extension can be installed using the INSTALL SQL command. This only needs to be run once.

INSTALL sqlite;

To load the sqlite extension for usage, use the LOAD SQL command:

LOAD sqlite;

Usage

After the SQLite extension is installed, tables can be queried from SQLite using the sqlite_scan function:


-- scan the table "tbl_name" from the SQLite file "test.db"


SELECT * FROM sqlite_scan('test.db', 'tbl_name');

Alternatively, the entire file can be attached using the ATTACH command. This allows you to query all tables stored within a SQLite database
file as if they were a regular database.

-- attach the SQLite file "test.db"


ATTACH 'test.db' AS test (TYPE sqlite);
-- the table "tbl_name" can now be queried as if it is a regular table
SELECT * FROM test.tbl_name;
-- switch the active database to "test"
USE test;
-- list all tables in the file
SHOW TABLES;

For more information see the SQLite extension documentation.

Directly Reading Files

DuckDB allows directly reading files via the read_text and read_blob functions. These functions accept a filename, a list of filenames
or a glob pattern, and output the content of each file as a VARCHAR or BLOB, respectively, as well as additional metadata such as the file
size and last modified time.

read_text

The read_text table function reads from the selected source(s) to a VARCHAR.

SELECT size, parse_path(filename), content
FROM read_text('test/sql/table_function/files/*.txt');

┌───────┬───────────────────────────────────────────────┬──────────────┐
│ size │ parse_path(filename) │ content │
│ int64 │ varchar[] │ varchar │
├───────┼───────────────────────────────────────────────┼──────────────┤
│ 12 │ [test, sql, table_function, files, one.txt] │ Hello World! │
│ 2 │ [test, sql, table_function, files, three.txt] │ 42 │
│ 10 │ [test, sql, table_function, files, two.txt] │ Föö Bär │
└───────┴───────────────────────────────────────────────┴──────────────┘

The file content is first validated to be valid UTF‑8. If read_text attempts to read a file with invalid UTF‑8 an error is thrown suggesting
to use read_blob instead.

read_blob

The read_blob table function reads from the selected source(s) to a BLOB.

SELECT size, content, filename
FROM read_blob('test/sql/table_function/files/*');

┌───────┬───────────────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────┐
│ size  │ content                                                                                           │ filename                                 │
│ int64 │ blob                                                                                              │ varchar                                  │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
│   178 │ PK\x03\x04\x0A\x00\x00\x00\x00\x00\xACi=X\x14t\xCE\xC7\x0A\x00\x00\x00\x0A\x00\x00\x00\x09\x00…  │ test/sql/table_function/files/four.blob  │
│    12 │ Hello World!                                                                                      │ test/sql/table_function/files/one.txt    │
│     2 │ 42                                                                                                │ test/sql/table_function/files/three.txt  │
│    10 │ F\xC3\xB6\xC3\xB6 B\xC3\xA4r                                                                      │ test/sql/table_function/files/two.txt    │
└───────┴───────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┘

Schema

The schemas of the tables returned by read_text and read_blob are identical:

┌───────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├───────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ filename │ VARCHAR │ YES │ │ │ │
│ content │ VARCHAR │ YES │ │ │ │
│ size │ BIGINT │ YES │ │ │ │
│ last_modified │ TIMESTAMP │ YES │ │ │ │
└───────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

Handling Missing Metadata

In cases where the underlying filesystem is unable to provide some of this data (e.g., because HTTPFS can't always return a valid
timestamp), the cell is set to NULL instead.

Support for Projection Pushdown

The table functions also utilize projection pushdown to avoid computing properties unnecessarily. For example, you can glob a directory
full of huge files just to get the file size in the size column: as long as you omit the content column, the data won't be read into DuckDB.
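A minimal sketch (hypothetical glob pattern): only file-level metadata is requested, so the file contents are never loaded.

SELECT filename, size
FROM read_blob('data/*.bin');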

Performance

Performance Guide

DuckDB aims to automatically achieve high performance by using well‑chosen default configurations and having a forgiving architecture.
Of course, there are still opportunities for tuning the system for specific workloads. The Performance Guide's pages contain guidelines and
tips for achieving good performance when loading and processing data with DuckDB.

The guides include several microbenchmarks. You may find details about these on the Benchmarks page.

Schema

Types

It is important to use the correct type for encoding columns (e.g., BIGINT, DATE, DATETIME). While it is always possible to use string types
(VARCHAR, etc.) to encode more specific values, this is not recommended. Strings use more space and are slower to process in operations
such as filtering, join, and aggregation.

When loading CSV files, you may leverage the CSV reader's auto‑detection mechanism to get the correct types for CSV inputs.

If you run in a memory‑constrained environment, using smaller data types (e.g., TINYINT) can reduce the amount of memory and disk
space required to complete a query. DuckDB’s bitpacking compression means small values stored in larger data types will not take up larger
sizes on disk, but they will take up more memory during processing.

Note. Bestpractice Use the most restrictive types possible when creating columns. Avoid using strings for encoding more specific
data items.
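A minimal sketch of this practice (hypothetical table and column names):

-- preferred: restrictive types
CREATE TABLE events (id BIGINT, happened_at TIMESTAMP, amount DECIMAL(10, 2));
-- not recommended: everything encoded as strings
CREATE TABLE events_as_strings (id VARCHAR, happened_at VARCHAR, amount VARCHAR);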

Microbenchmark: Using Timestamps We illustrate the difference in aggregation speed using the creationDate column of the LDBC
Comment table on scale factor 300. This table has approx. 554 million unordered timestamp values. We run a simple aggregation query
that returns the average day‑of‑the‑month from the timestamps in two configurations.

First, we use a DATETIME to encode the values and run the query using the extract function:

SELECT avg(extract('day' FROM creationDate)) FROM Comment;

Second, we use the VARCHAR type and use string operations:

SELECT avg(CAST(creationDate[9:10] AS INT)) FROM Comment;

The results of the microbenchmark are as follows:

Column type   Storage size   Query time

DATETIME      3.3 GB         0.9 s
VARCHAR       5.2 GB         3.9 s

The results show that using the DATETIME value yields smaller storage sizes and faster processing.


Microbenchmark: Joining on Strings We illustrate the difference caused by joining on different types by computing a self‑join on the
LDBC Comment table at scale factor 100. The table has 64‑bit integer identifiers used as the id attribute of each row. We perform the
following join operation:
SELECT count(*) AS count
FROM Comment c1
JOIN Comment c2 ON c1.ParentCommentId = c2.id;

In the first experiment, we use the correct (most restrictive) types, i.e., both the id and the ParentCommentId columns are defined as
BIGINT. In the second experiment, we define all columns with the VARCHAR type. While the results of the queries are the same for
both experiments, their runtimes vary significantly. The results below show that joining on BIGINT columns is approx. 1.8× faster than
performing the same join on VARCHAR‑typed columns encoding the same values.

Join column payload type   Join column schema type   Example value      Query time

BIGINT                     BIGINT                    70368755640078     1.2 s
BIGINT                     VARCHAR                   '70368755640078'   2.1 s

Note. Bestpractice Avoid representing numeric values as strings, especially if you intend to perform e.g. join operations on them.

Constraints

DuckDB allows defining constraints such as UNIQUE, PRIMARY KEY, and FOREIGN KEY. These constraints can be beneficial for ensuring
data integrity but they have a negative effect on load performance as they necessitate building indexes and performing checks. Moreover,
they very rarely improve the performance of queries as DuckDB does not rely on these indexes for join and aggregation operators (see
indexing for more details).

Note. Bestpractice Do not define constraints unless your goal is to ensure data integrity.

Microbenchmark: The Effect of Primary Keys

We illustrate the effect of using primary keys with the LDBC Comment table at scale factor 300. This table has approx. 554 million entries.
We first create the schema without a primary key, then load the data. In the second experiment, we create the schema with a primary key,
then load the data. In both cases, we take the data from .csv.gz files, and measure the time required to perform the loading.

Operation                   Execution time

Load without primary key    92.168s
Load with primary key       286.765s

Note. Bestpractice For best bulk load performance, avoid defining primary key constraints if possible.

Indexing

DuckDB has two types of indexes: zonemaps and ART indexes.

Zonemaps

DuckDB automatically creates zonemaps (also known as min‑max indexes) for the columns of all general‑purpose data types. These in‑
dexes are used for predicate pushdown into scan operators and computing aggregations. This means that if a filter criterion (like WHERE
column1 = 123) is in use, DuckDB can skip any row group whose min‑max range does not contain that filter value (e.g., a block with a
min‑max range of 1000 to 2000 will be omitted when comparing for = 123 or < 400).


The Effect of Ordering on Zonemaps The more ordered the data within a column, the more useful the zonemap indexes will be. For
example, in the worst case, a column could contain a random number on every row. DuckDB is unlikely to be able to skip any row
groups. The best case of ordered data commonly arises with DATETIME columns. If specific columns will be queried with selective filters,
it is best to pre‑order data by those columns when inserting it. Even an imperfect ordering will still be helpful.
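A minimal sketch of pre-ordering at load time (hypothetical table, file, and column names):

-- insert the data pre-ordered by the column that is most often used in selective filters
CREATE TABLE events AS
    SELECT *
    FROM read_csv('events.csv')
    ORDER BY created_at;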

Microbenchmark: The Effect of Ordering For an example, let's repeat the microbenchmark for timestamps with a timestamp column
that is sorted in ascending order vs. an unordered one.

Column type   Ordered   Storage size   Query time

DATETIME      yes       1.3 GB         0.578 s
DATETIME      no        3.3 GB         0.904 s

The results show that simply keeping the column order allows for improved compression, yielding a 2.5x smaller storage size. It also allows
the computation to be 1.5x faster.

Ordered Integers Another practical way to exploit ordering is to use the INTEGER type with automatic increments rather than UUID for
columns that will be queried using selective filters. UUIDs will likely be inserted in a random order, so many row groups in the table will
need to be scanned to find a specific UUID value, while an ordered INTEGER column will allow all row groups to be skipped except the
one that contains the value.
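A minimal sketch using a sequence to generate incrementing integer keys (hypothetical table and sequence names):

CREATE SEQUENCE event_id_seq;
CREATE TABLE events (
    id INTEGER DEFAULT nextval('event_id_seq'),
    payload VARCHAR
);
-- newly inserted rows receive increasing ids, keeping the column well ordered
INSERT INTO events (payload) VALUES ('first'), ('second');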

ART Indexes

DuckDB allows defining Adaptive Radix Tree (ART) indexes in two ways. First, such an index is created implicitly for columns with PRIMARY
KEY, FOREIGN KEY, and UNIQUE constraints. Second, explicitly running the CREATE INDEX statement creates an ART index on the
target column(s).

The tradeoffs of having an ART index on a column are as follows:

1. It enables efficient constraint checking upon changes (inserts, updates, and deletes) for non‑bulky changes.
2. Having an ART index makes changes to the affected column(s) slower compared to non‑indexed performance. That is because of
index maintenance for these operations.

Regarding query performance, an ART index has the following effects:

1. It speeds up point queries and other highly selective queries using the indexed column(s), where the filtering condition returns approx.
0.1% of all rows or fewer. When in doubt, use EXPLAIN to verify that your query plan uses the index scan.
2. An ART index has no effect on the performance of join, aggregation, and sorting queries.

Indexes are serialized to disk and deserialized lazily, i.e., when the database is reopened, operations using the index will only load the
required parts of the index. Therefore, having an index will not cause any slowdowns when opening an existing database.

Note. Bestpractice We recommend following these guidelines:

• Only use primary keys, foreign keys, or unique constraints, if these are necessary for enforcing constraints on your data.
• Do not define explicit indexes unless you have highly selective queries.
• If you define an ART index, do so after bulk loading the data to the table, as in the sketch below. Adding an index prior to loading, either explicitly or
via primary/foreign keys, is detrimental to load performance.
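A minimal sketch of this ordering (hypothetical table, file, and column names):

-- 1. create the table without constraints or indexes
CREATE TABLE trips (id BIGINT, city VARCHAR, fare DECIMAL(10, 2));
-- 2. bulk load the data
COPY trips FROM 'trips.csv';
-- 3. only then create the index needed for highly selective queries
CREATE INDEX trips_city_idx ON trips (city);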

Environment

The environment where DuckDB is run has an obvious impact on performance. This page focuses on the effects of the hardware configura‑
tion and the operating system used.


Hardware Configuration

CPU and Memory As a rule of thumb, aggregation‑heavy workloads require approx. 5 GB memory per CPU core and join‑heavy workloads
require approximately 10 GB memory per core for best performance. In AWS EC2, the former are available as general‑purpose instances
(e.g., M7g) and the latter as memory‑optimized instances (e.g., R7g).

Note. Bestpractice Aim for 5‑10 GB memory per CPU core.

Disk DuckDB is capable of operating both as an in‑memory and as a disk‑based database system. In the latter case, it can spill to disk
to process larger‑than‑memory workloads (a.k.a. out‑of‑core processing). In these cases, a fast disk is highly beneficial. However, if the
workload fits in memory, the disk speed only has a limited effect on performance.

In general, network‑based storage will result in slower DuckDB workloads than using local disks. This includes network disks such as NFS,
network drives such as SMB and Samba, and network‑backed cloud disks such as AWS EBS. However, different network disks can have
vastly varying IO performance, ranging from very slow to almost as fast as local. Therefore, for optimal performance, only use network
disks that can provide high IO performance.

Note. Bestpractice Fast disks are important if your workload is larger than memory and/or fast data loading is important. Only use
network‑backed disks if they guarantee high IO.

Operating System

We recommend using the latest stable version of operating systems: macOS, Windows, and Linux are all well‑tested and DuckDB can run
on them with high performance. Among Linux distributions, we recommend using Ubuntu Linux LTS due to its stability and the fact that
most of DuckDB’s Linux test suite jobs run on Ubuntu workers.

File Formats

Handling Parquet Files

DuckDB has advanced support for Parquet files, which includes directly querying Parquet files. When deciding on whether to query these
files directly or to first load them to the database, you need to consider several factors.

Reasons for Querying Parquet Files Availability of basic statistics: Parquet files use a columnar storage format and contain basic
statistics such as zonemaps. Thanks to these features, DuckDB can leverage optimizations such as projection and filter pushdown on
Parquet files. Therefore, workloads that combine projection, filtering, and aggregation tend to perform quite well when run on Parquet
files.

Storage considerations: Loading the data from Parquet files will require approximately the same amount of space for the DuckDB database
file. Therefore, if the available disk space is constrained, it is worth running the queries directly on Parquet files.

Reasons against Querying Parquet Files Lack of advanced statistics: The DuckDB database format has the hyperloglog statistics that
Parquet files do not have. These improve the accuracy of cardinality estimates, and are especially important if the queries contain a large
number of join operators.

Tip. If you find that DuckDB produces a suboptimal join order on Parquet files, try loading the Parquet files to DuckDB tables. The improved
statistics likely help obtain a better join order.

Repeated queries: If you plan to run multiple queries on the same data set, it is worth loading the data into DuckDB. The queries will
always be somewhat faster, which over time amortizes the initial load time.

High decompression times: Some Parquet files are compressed using heavyweight compression algorithms such as gzip. In these cases,
querying the Parquet files will necessitate an expensive decompression time every time the file is accessed. Meanwhile, lightweight com‑
pression methods like snappy, lz4, and zstd are faster to decompress. You may use the parquet_metadata function to find out the com‑
pression algorithm used.
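A minimal sketch (hypothetical file name); parquet_metadata reports the codec of each column chunk:

SELECT path_in_schema, compression
FROM parquet_metadata('input.parquet');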


Microbenchmark: Running TPC‑H on a DuckDB Database vs. Parquet The queries on the TPC‑H benchmark run approximately 1.1‑
5.0x slower on Parquet files than on a DuckDB database.

Note. Bestpractice If you have the storage space available, and have a join‑heavy workload and/or plan to run many queries on the
same dataset, load the Parquet files into the database first. The compression algorithm and the row group sizes in the Parquet files
have a large effect on performance: study these using the parquet_metadata function.

The Effect of Row Group Sizes DuckDB works best on Parquet files with row groups of 100K‑1M rows each. The reason for this is that
DuckDB can only parallelize over row groups – so if a Parquet file has a single giant row group it can only be processed by a single thread.
You can use the parquet_metadata function to figure out how many row groups a Parquet file has. When writing Parquet files, use the
row_group_size option.
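A minimal sketch (hypothetical file and table names): count the row groups of an existing file, and set the row group size explicitly when writing a new one.

-- how many row groups does the file contain?
SELECT count(DISTINCT row_group_id) AS row_groups
FROM parquet_metadata('input.parquet');

-- write a Parquet file with an explicit row group size
COPY tbl TO 'output.parquet' (FORMAT parquet, ROW_GROUP_SIZE 100000);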

Microbenchmark: Running Aggregation Query at Different Row Group Sizes We run a simple aggregation query over Parquet files
using different row group sizes, selected between 960 and 1,966,080. The results are as follows.

Row group size Execution time

960 8.77s
1920 8.95s
3840 4.33s
7680 2.35s
15360 1.58s
30720 1.17s
61440 0.94s
122880 0.87s
245760 0.93s
491520 0.95s
983040 0.97s
1966080 0.88s

The results show that row group sizes <5,000 have a strongly detrimental effect, making runtimes more than 5‑10x larger than with ideally‑sized
row groups, while row group sizes between 5,000 and 20,000 are still 1.5‑2.5x off from best performance. Above a row group size of 100,000,
the differences are small: the gap is about 10% between the best and the worst runtime.

Parquet File Sizes DuckDB can also parallelize across multiple Parquet files. It is advisable to have at least as many total row groups
across all files as there are CPU threads. For example, with a machine having 10 threads, both 10 files with 1 row group or 1 file with 10 row
groups will achieve full parallelism. It is also beneficial to keep the size of individual Parquet files moderate.

Note. Bestpractice The ideal range is between 100MB and 10GB per individual Parquet file.

Hive Partitioning for Filter Pushdown When querying many files with filter conditions, performance can be improved by using a Hive‑
format folder structure to partition the data along the columns used in the filter condition. DuckDB will only need to read the folders and
files that meet the filter criteria. This can be especially helpful when querying remote files.
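A minimal sketch (hypothetical table, directory, and column names): write the data Hive-partitioned, then filter on the partition columns so that only the matching folders are read.

-- write the data partitioned by year and month
COPY orders TO 'orders_partitioned' (FORMAT parquet, PARTITION_BY (year, month));

-- only folders matching year=2023 are scanned
SELECT *
FROM read_parquet('orders_partitioned/*/*/*.parquet', hive_partitioning = true)
WHERE year = 2023;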

More Tips on Reading and Writing Parquet Files For tips on reading and writing Parquet files, see the Parquet Tips page.


Loading CSV Files

CSV files are often distributed in compressed format such as GZIP archives (.csv.gz). DuckDB can decompress these files on the fly. In
fact, this is typically faster than decompressing the files first and loading them due to reduced IO.

Schema                                                                           Load Time

Load from GZIP‑compressed CSV files (.csv.gz)                                    107.1s
Decompressing (using parallel gunzip) and loading from decompressed CSV files    121.3s

Loading Many Small CSV Files The CSV reader runs the CSV sniffer on all files. For many small files, this may cause an unnecessarily high
overhead. A potential optimization to speed this up is to turn the sniffer off. Assuming that all files have the same CSV dialect and column
names/types, get the sniffer options as follows:

.mode line
SELECT Prompt FROM sniff_csv('part-0001.csv');

Prompt = FROM read_csv('file_path.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0, header=true, columns={'hello': 'BIGINT', 'world': 'VARCHAR'});

Then, you can adjust the read_csv command, e.g., by applying filename expansion (globbing), and run it with the rest of the options detected
by the sniffer:

FROM read_csv('part-*.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0,
    header=true, columns={'hello': 'BIGINT', 'world': 'VARCHAR'});

Tuning Workloads

Parallelism (Multi‑Core Processing)

The Effect of Row Groups on Parallelism DuckDB parallelizes the workload based on row groups, i.e., groups of rows that are stored
together at the storage level. A row group in DuckDB's database format consists of max. 122,880 rows. Parallelism starts at the level of row
groups, therefore, for a query to run on k threads, it needs to scan at least k * 122,880 rows.

Too Many Threads Note that in certain cases DuckDB may launch too many threads (e.g., due to HyperThreading), which can lead to
slowdowns. In these cases, it’s worth manually limiting the number of threads using SET threads = X.

Larger‑Than‑Memory Workloads (Out‑of‑Core Processing)

A key strength of DuckDB is support for larger‑than‑memory workloads, i.e., it is able to process data sets that are larger than the available
system memory (also known as out‑of‑core processing). It can also run queries where the intermediate results cannot fit into memory. This
section explains the prerequisites, scope, and known limitations of larger‑than‑memory processing in DuckDB.

Spilling to Disk Larger‑than‑memory workloads are supported by spilling to disk. If DuckDB is connected to a persistent database file,
DuckDB will create a temporary directory named database_file_name.tmp when the available memory is no longer sufficient to
continue processing.

If DuckDB is running in in‑memory mode, it cannot use disk to offload data if it does not fit into main memory. To enable offloading in the
absence of a persistent database file, use the SET temp_directory statement:

SET temp_directory = '/path/to/temp_dir.tmp/'


Operators Some operators cannot output a single row until the last row of their input has been seen. These are called blocking operators
as they require their entire input to be buffered, and are the most memory‑intensive operators in relational database systems. The main
blocking operators are the following:

• sorting: ORDER BY,
• grouping: GROUP BY,
• windowing: OVER ... (PARTITION BY ... ORDER BY ...),
• joining: JOIN.

DuckDB supports larger‑than‑memory processing for all of these operators.

Limitations DuckDB strives to always complete workloads even if they are larger‑than‑memory. That said, there are some limitations at
the moment:

• If multiple blocking operators appear in the same query, DuckDB may still throw an out‑of‑memory exception due to the complex
interplay of these operators.
• Some aggregate functions, such as list() and string_agg(), do not support offloading to disk.
• Aggregate functions that use sorting are holistic, i.e., they need all inputs before the aggregation can start. As DuckDB cannot yet
offload some complex intermediate aggregate states to disk, these functions can cause an out‑of‑memory exception when run on
large data sets.
• The PIVOT operation internally uses the list() function, therefore it is subject to the same limitation.

Profiling

If your queries are not performing as well as expected, it’s worth studying their query plans:

• Use EXPLAIN to print the physical query plan without running the query.
• Use EXPLAIN ANALYZE to run and profile the query. This will show the CPU time that each step in the query takes. Note that due
to multi‑threading, adding up the individual times will be larger than the total query processing time.

Query plans can point to the root of performance issues. A few general directions:

• Avoid nested loop joins in favor of hash joins.
• A scan that does not include a filter pushdown for a filter condition that is later applied performs unnecessary IO. Try rewriting the
query to apply a pushdown.
• Bad join orders where the cardinality of an operator explodes to billions of tuples should be avoided at all costs.

Prepared Statements

Prepared statements can improve performance when running the same query many times, but with different parameters. When a state‑
ment is prepared, it completes several of the initial portions of the query execution process (parsing, planning, etc.) and caches their
output. When it is executed, those steps can be skipped, improving performance. This is beneficial mostly for repeatedly running small
queries (with a runtime of < 100ms) with different sets of parameters.

Note that it is not a primary design goal for DuckDB to quickly execute many small queries concurrently. Rather, it is optimized for running
larger, less frequent queries.
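A minimal SQL sketch (hypothetical table, column, and parameter values):

-- parse and plan the query once
PREPARE count_above AS
    SELECT count(*) FROM tbl WHERE col > ?;

-- execute it many times with different parameters
EXECUTE count_above(10);
EXECUTE count_above(20);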

Querying Remote Files

DuckDB uses synchronous IO when reading remote files. This means that each DuckDB thread can make at most one HTTP request at a
time. If a query must make many small requests over the network, increasing DuckDB's threads setting to larger than the total number
of CPU cores (approx. 2‑5 times CPU cores) can improve parallelism and performance.
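For example (a hypothetical value for a machine with 8 cores; tune it to your workload):

SET threads = 32;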


Avoid Reading Unnecessary Data The main bottleneck in workloads reading remote files is likely to be the IO. This means that minimiz‑
ing the unnecessarily read data can be highly beneficial.

Some basic SQL tricks can help with this:

• Avoid SELECT *. Instead, only select columns that are actually used. DuckDB will try to only download the data it actually needs.
• Apply filters on remote parquet files when possible. DuckDB can use these filters to reduce the amount of data that is scanned.
• Either sort or partition data by columns that are regularly used for filters: this increases the effectiveness of the filters in reducing IO.

To inspect how much remote data is transferred for a query, EXPLAIN ANALYZE can be used to print out the total number of requests
and total data transferred for queries on remote files.

Avoid Reading Data More Than Once DuckDB does not cache data from remote files automatically. This means that running a query on
a remote file twice will download the required data twice. So if data needs to be accessed multiple times, storing it locally can make sense.
To illustrate this, let's look at an example:

Consider the following queries:

SELECT col_a + col_b FROM 's3://bucket/file.parquet' WHERE col_a > 10;
SELECT col_a * col_b FROM 's3://bucket/file.parquet' WHERE col_a > 10;

These queries download the columns col_a and col_b from s3://bucket/file.parquet twice. Now consider the following
queries:

CREATE TABLE local_copy_of_file AS
    SELECT col_a, col_b FROM 's3://bucket/file.parquet' WHERE col_a > 10;

SELECT col_a + col_b FROM local_copy_of_file;
SELECT col_a * col_b FROM local_copy_of_file;

Here DuckDB will first copy col_a and col_b from s3://bucket/file.parquet into a local table, and then query the local in‑memory
columns twice. Note that the filter WHERE col_a > 10 is now applied only once.

An important side note needs to be made here though. The first two queries are fully streaming, with only a small memory footprint,
whereas the second requires full materialization of columns col_a and col_b. This means that in some rare cases (e.g., with a high‑
speed network, but with very limited memory available) it could actually be beneficial to download the data twice.

Best Practices for Using Connections

DuckDB will perform best when reusing the same database connection many times. Disconnecting and reconnecting on every query will
incur some overhead, which can reduce performance when running many small queries. DuckDB also caches some data and metadata in
memory, and that cache is lost when the last open connection is closed. Frequently, a single connection will work best, but a connection
pool may also be used.

Using multiple connections can parallelize some operations, although it is typically not necessary. DuckDB does attempt to parallelize as
much as possible within each individual query, but it is not possible to parallelize in all cases. Making multiple connections can process
more operations concurrently. This can be more helpful if DuckDB is not CPU limited, but instead bottlenecked by another resource like
network transfer speed.

The preserve_insertion_order Option

When importing or exporting data sets (from/to the Parquet or CSV formats), which are much larger than the available memory, an out of
memory error may occur:

Error: Out of Memory Error: failed to allocate data of size ... (.../... used)

In these cases, consider setting the preserve_insertion_order configuration option to false:

SET preserve_insertion_order = false;

This allows the system to re‑order any results that do not contain ORDER BY clauses, potentially reducing memory usage.


My Workload Is Slow

If you find that your workload in DuckDB is slow, we recommend performing the following checks. More detailed instructions are linked for
each point.

1. Do you have enough memory? DuckDB works best if you have 5‑10GB memory per CPU core.
2. Are you using a fast disk? Network‑attached disks can cause the workload to slow down, especially for larger than memory workloads.
3. Are you using indexes or constraints (primary key, unique, etc.)? If possible, try disabling them, which boosts load and update per‑
formance.
4. Are you using the correct types? For example, use TIMESTAMP to encode datetime values.
5. Are you reading from Parquet files? If so, do they have row group sizes between 100k and 1M and file sizes between 100MB to 10GB?
6. Does the query plan look right? Study it with EXPLAIN.
7. Is the workload running in parallel? Use htop or the operating system's task manager to observe this.
8. Is DuckDB using too many threads? Try limiting the amount of threads.

Are you aware of other common issues? If so, please click the Report content issue link below and describe them along with their
workarounds.

Benchmarks

For several of the recommendations in our performance guide, we use microbenchmarks to back up our claims. For these benchmarks, we
use data sets from the TPC‑H benchmark and the LDBC Social Network Benchmark’s BI workload.

Data Sets

Some microbenchmarks use the LDBC BI SF300 data set's Comment table (20GB .tar.zst archive, 21GB when decompressed into .csv.gz files),
while others use the same table's creationDate column (4GB .parquet file).

The TPC data sets used in the benchmark are generated with the DuckDB tpch extension.

A Note on Benchmarks

Running fair benchmarks is difficult, especially when performing system‑to‑system comparison. When running benchmarks on DuckDB,
please make sure you are using the latest version (preferably the nightly build). If in doubt about your benchmark results, feel free to contact
us at [email protected].

Disclaimer on Benchmarks

Note that the benchmark results presented in this guide do not constitute official TPC or LDBC benchmark results. Instead, they merely use
the data sets of and some queries provided by the TPC‑H and the LDBC BI benchmark frameworks, and omit other parts of the workloads
such as updates.

Meta Queries

Describe

Describing a Table

In order to view the schema of a table, use DESCRIBE or SHOW followed by the table name.

CREATE TABLE tbl (i INTEGER PRIMARY KEY, j VARCHAR);
DESCRIBE tbl;
SHOW tbl; -- equivalent to DESCRIBE tbl;

column_name   column_type   null   key    default   extra

i             INTEGER       NO     PRI    NULL      NULL
j             VARCHAR       YES    NULL   NULL      NULL

Describing a Query

In order to view the schema of the result of a query, prepend DESCRIBE to a query.

DESCRIBE SELECT * FROM tbl;

column_name   column_type   null   key    default   extra

i             INTEGER       YES    NULL   NULL      NULL
j             VARCHAR       YES    NULL   NULL      NULL

Note that there are subtle differences: compared to the result when describing a table, nullability (null) and key information (key) are
lost.

Using DESCRIBE in a Subquery

DESCRIBE can be used as a subquery. This allows creating a table from the description, for example:

CREATE TABLE tbl_description AS SELECT * FROM (DESCRIBE tbl);

Describing Remote Tables

It is possible to describe remote tables via the httpfs extension using the DESCRIBE TABLE statement. For example:

DESCRIBE TABLE 'https://blobs.duckdb.org/data/Star_Trek-Season_1.csv';


┌─────────────────────────────────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────────────────────────────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ season_num │ BIGINT │ YES │ │ │ │
│ episode_num │ BIGINT │ YES │ │ │ │
│ aired_date │ DATE │ YES │ │ │ │
│ ... │ ... │ ... │ │ │ │
├─────────────────────────────────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┤
│ 18 rows 6 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────┘

EXPLAIN: Inspect Query Plans

In order to view the query plan of a query, prepend EXPLAIN to a query.

EXPLAIN SELECT * FROM tbl;

By default only the final physical plan is shown. In order to see the unoptimized and optimized logical plans, change the explain_
output setting:

SET explain_output = 'all';

Below is an example of running EXPLAIN on Q13 of the TPC‑H benchmark on the scale factor 1 data set.

EXPLAIN
SELECT
c_count,
count(*) AS custdist
FROM (
SELECT
c_custkey,
count(o_orderkey)
FROM
customer
LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%special%requests%'
GROUP BY
c_custkey) AS c_orders (c_custkey,
c_count)
GROUP BY
c_count
ORDER BY
custdist DESC,
c_count DESC;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Physical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ ORDER_BY │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ ORDERS: │
│ count_star() DESC │
│ c_orders.c_count DESC │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ #0 │
│ count_star() │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ c_count │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ count(o_orderkey) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ #0 │
│ count(#1) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ c_custkey │
│ o_orderkey │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ RIGHT │
│ o_custkey = c_custkey ├──────────────┐
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │
│ EC: 300000 │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ FILTER ││ SEQ_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ (o_comment !~~ '%special ││ customer │
│ %requests%') ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ c_custkey │
│ EC: 300000 ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ ││ EC: 150000 │
└─────────────┬─────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ orders │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ o_custkey │
│ o_comment │
│ o_orderkey │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 1500000 │
└───────────────────────────┘


See Also

For more information, see the Profiling page.

EXPLAIN ANALYZE: Profile Queries

In order to profile a query, prepend EXPLAIN ANALYZE to a query.

EXPLAIN ANALYZE SELECT * FROM tbl;

The query plan will be pretty‑printed to the screen using timings for every operator.

Note that the cumulative wall‑clock time that is spent on every operator is shown. When multiple threads are processing the query in
parallel, the total processing time of the query may be lower than the sum of all the times spent on the individual operators.

Below is an example of running EXPLAIN ANALYZE on Q13 of the TPC‑H benchmark on the scale factor 1 data set.

EXPLAIN ANALYZE
SELECT
c_count,
count(*) AS custdist
FROM (
SELECT
c_custkey,
count(o_orderkey)
FROM
customer
LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%special%requests%'
GROUP BY
c_custkey) AS c_orders (c_custkey,
c_count)
GROUP BY
c_count
ORDER BY
custdist DESC,
c_count DESC;

┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Total Time: 0.0487s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│ RESULT_COLLECTOR │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 0 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 0 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ ORDER_BY │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ ORDERS: │
│ count_star() DESC │
│ c_orders.c_count DESC │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 42 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ #0 │
│ count_star() │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 42 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ c_count │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 150000 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ count(o_orderkey) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 150000 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ #0 │
│ count(#1) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 150000 │
│ (0.09s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ c_custkey │
│ o_orderkey │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 1534302 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ RIGHT │
│ o_custkey = c_custkey │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ├──────────────┐
│ EC: 300000 │ │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │
│ 1534302 │ │
│ (0.08s) │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ FILTER ││ SEQ_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ (o_comment !~~ '%special ││ customer │
│ %requests%') ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ c_custkey │
│ EC: 300000 ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ EC: 150000 │
│ 1484298 ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ (0.10s) ││ 150000 │
│ ││ (0.00s) │
└─────────────┬─────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ orders │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ o_custkey │
│ o_comment │
│ o_orderkey │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 1500000 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 1500000 │
│ (0.01s) │
└───────────────────────────┘

See Also

For more information, see the Profiling page.

List Tables

The SHOW TABLES command can be used to obtain a list of all tables within the selected schema.

CREATE TABLE tbl (i INTEGER);


SHOW TABLES;

name

tbl

SHOW or SHOW ALL TABLES can be used to obtain a list of all tables within all attached databases and schemas.

CREATE TABLE tbl (i INTEGER);


CREATE SCHEMA s1;
CREATE TABLE s1.tbl (v VARCHAR);
SHOW ALL TABLES;


database   schema   table_name   column_names   column_types   temporary

memory     main     tbl          [i]            [INTEGER]      false
memory     s1       tbl          [v]            [VARCHAR]      false

To view the schema of an individual table, use the DESCRIBE command.

See Also

The SQL‑standard information_schema views are also defined. Moreover, DuckDB defines sqlite_master and many PostgreSQL
system catalog tables for compatibility with SQLite and PostgreSQL respectively.

Summarize

The SUMMARIZE command can be used to easily compute a number of aggregates over a table or a query. The SUMMARIZE command
launches a query that computes a number of aggregates over all columns (min, max, approx_unique, avg, std, q25, q50, q75,
count), and returns these along with the column name, column type, and the percentage of NULL values in the column.

Usage

In order to summarize the contents of a table, use SUMMARIZE followed by the table name.

SUMMARIZE tbl;

In order to summarize a query, prepend SUMMARIZE to a query.

SUMMARIZE SELECT * FROM tbl;

Example

Below is an example of SUMMARIZE on the lineitem table of the TPC‑H SF1 data set, generated using the tpch extension.

INSTALL tpch;
LOAD tpch;
CALL dbgen(sf = 1);

SUMMARIZE lineitem;

column_name       column_type     min           max                   approx_unique   avg                   std                    q25       q50       q75       count     null_percentage

l_orderkey        INTEGER         1             6000000               1508227         3000279.604204982     1732187.8734803519     1509447   2989869   4485232   6001215   0.0%
l_partkey         INTEGER         1             200000                202598          100017.98932999402    57735.69082650496      49913     99992     150039    6001215   0.0%
l_suppkey         INTEGER         1             10000                 10061           5000.602606138924     2886.9619987306114     2501      4999      7500      6001215   0.0%
l_linenumber      INTEGER         1             7                     7               3.0005757167506912    1.7324314036519328     2         3         4         6001215   0.0%
l_quantity        DECIMAL(15,2)   1.00          50.00                 50              25.507967136654827    14.426262537016918     13        26        38        6001215   0.0%
l_extendedprice   DECIMAL(15,2)   901.00        104949.50             923139          38255.138484656854    23300.43871096221      18756     36724     55159     6001215   0.0%
l_discount        DECIMAL(15,2)   0.00          0.10                  11              0.04999943011540163   0.03161985510812596    0         0         0         6001215   0.0%
l_tax             DECIMAL(15,2)   0.00          0.08                  9               0.04001350893110812   0.025816551798842728   0         0         0         6001215   0.0%
l_returnflag      VARCHAR         A             R                     3               NULL                  NULL                   NULL      NULL      NULL      6001215   0.0%
l_linestatus      VARCHAR         F             O                     2               NULL                  NULL                   NULL      NULL      NULL      6001215   0.0%
l_shipdate        DATE            1992‑01‑02    1998‑12‑01            2516            NULL                  NULL                   NULL      NULL      NULL      6001215   0.0%
l_commitdate      DATE            1992‑01‑31    1998‑10‑31            2460            NULL                  NULL                   NULL      NULL      NULL      6001215   0.0%
l_receiptdate     DATE            1992‑01‑04    1998‑12‑31            2549            NULL                  NULL                   NULL      NULL      NULL      6001215   0.0%
l_shipinstruct    VARCHAR         COLLECT COD   TAKE BACK RETURN      4               NULL                  NULL                   NULL      NULL      NULL      6001215   0.0%
l_shipmode        VARCHAR         AIR           TRUCK                 7               NULL                  NULL                   NULL      NULL      NULL      6001215   0.0%
l_comment         VARCHAR         Tiresias      zzle? furiously iro   3558599         NULL                  NULL                   NULL      NULL      NULL      6001215   0.0%

Using SUMMARIZE in a Subquery

SUMMARIZE can be used as a subquery. This allows creating a table from the summary, for example:

CREATE TABLE tbl_summary AS SELECT * FROM (SUMMARIZE tbl);

Summarizing Remote Tables

It is possible to summarize remote tables via the httpfs extension using the SUMMARIZE TABLE statement. For example:

SUMMARIZE TABLE 'https://blobs.duckdb.org/data/Star_Trek-Season_1.csv';

DuckDB Environment

DuckDB provides a number of functions and PRAGMA options to retrieve information on the running DuckDB instance and its environ‑
ment.

Version

The version() function returns the version number of DuckDB.

SELECT version();


┌───────────┐
│ version() │
│ varchar │
├───────────┤
│ v0.10.0 │
└───────────┘

Using a PRAGMA:

PRAGMA version;

┌─────────────────┬────────────┐
│ library_version │ source_id │
│ varchar │ varchar │
├─────────────────┼────────────┤
│ v0.10.0 │ 20b1486d11 │
└─────────────────┴────────────┘

Platform

The platform information consists of the operating system, system architecture, and, optionally, the compiler. The platform is used when installing
extensions. To retrieve the platform, use the following PRAGMA:

PRAGMA platform;

On macOS, running on Apple Silicon architecture, the result is:

┌───────────┐
│ platform │
│ varchar │
├───────────┤
│ osx_arm64 │
└───────────┘

On Windows, running on an AMD64 architecture, the platform is windows_amd64. On CentOS 7, running on the AMD64 architecture, the
platform is linux_amd64_gcc4. On Ubuntu 22.04, running on the ARM64 architecture, the platform is linux_arm64.

Extensions

To get a list of DuckDB extensions and their status (e.g., loaded, installed), use the duckdb_extensions() function:

SELECT *
FROM duckdb_extensions();

Meta Table Functions

DuckDB has the following built‑in table functions to obtain metadata about available catalog objects:

• duckdb_columns(): columns
• duckdb_constraints(): constraints
• duckdb_databases(): lists the databases that are accessible from within the current DuckDB process
• duckdb_dependencies(): dependencies between objects
• duckdb_extensions(): extensions
• duckdb_functions(): functions
• duckdb_indexes(): secondary indexes
• duckdb_keywords(): DuckDB's keywords and reserved words
• duckdb_optimizers(): the available optimization rules in the DuckDB instance
• duckdb_schemas(): schemas
• duckdb_sequences(): sequences
• duckdb_settings(): settings
• duckdb_tables(): base tables
• duckdb_types(): data types
• duckdb_views(): views
• duckdb_temporary_files(): the temporary files DuckDB has written to disk, to offload data from memory
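For example, a quick sketch listing the base tables visible to the current process via duckdb_tables():

SELECT database_name, schema_name, table_name
FROM duckdb_tables();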

ODBC

ODBC 101: A Duck Themed Guide to ODBC

• What is ODBC?
• General Concepts
• Setting up an Application
• Sample Application

What is ODBC?

ODBC, which stands for Open Database Connectivity, is a standard that allows different programs to talk to different databases including, of
course, DuckDB. This makes it easier to build programs that work with many different databases, which saves time as developers don't
have to write custom code to connect to each database. Instead, they can use the standardized ODBC interface, which reduces development
time and costs, and programs are easier to maintain. However, ODBC can be slower than other methods of connecting to a database, such
as using a native driver, as it adds an extra layer of abstraction between the application and the database. Furthermore, because DuckDB
is column‑based and ODBC is row‑based, there can be some inefficiencies when using ODBC with DuckDB.

Note. There are links throughout this page to the official Microsoft ODBC documentation, which is a great resource for learning
more about ODBC.

General Concepts

• Handles
• Connecting
• Error Handling and Diagnostics
• Buffers and Binding

Handles A handle is a pointer to a specific ODBC object which is used to interact with the database. There are several different types of
handles, each with a different purpose: the environment handle, the connection handle, the statement handle, and the descriptor
handle. Handles are allocated using SQLAllocHandle, which takes as input the type of handle to allocate and a pointer to the handle;
the driver then creates a new handle of the specified type and returns it to the application.

Handle Types

Environment (SQL_HANDLE_ENV)
    Description: Manages the environment settings for ODBC operations, and provides a global context in which to access data.
    Use case: Initializing ODBC, managing driver behavior, resource allocation.
    Additional information: Must be allocated once per application upon starting, and freed at the end.

Connection (SQL_HANDLE_DBC)
    Description: Represents a connection to a data source. Used to establish, manage, and terminate connections. Defines both the driver and the data source to use within the driver.
    Use case: Establishing a connection to a database, managing the connection state.
    Additional information: Multiple connection handles can be created as needed, allowing simultaneous connections to multiple data sources. Note: allocating a connection handle does not establish a connection; the handle must be allocated first, and then used once the connection has been established.

Statement (SQL_HANDLE_STMT)
    Description: Handles the execution of SQL statements, as well as the returned result sets.
    Use case: Executing SQL queries, fetching result sets, managing statement options.
    Additional information: To facilitate the execution of concurrent queries, multiple handles can be allocated per connection.

Descriptor (SQL_HANDLE_DESC)
    Description: Describes the attributes of a data structure or parameter, and allows the application to specify the structure of data to be bound/retrieved.
    Use case: Describing table structures, result sets, binding columns to application buffers.
    Additional information: Used in situations where data structures need to be explicitly defined, for example during parameter binding or result set fetching. They are automatically allocated when a statement is allocated, but can also be allocated explicitly.

Connecting The first step is to connect to the data source so that the application can perform database operations. First the application
must allocate an environment handle, and then a connection handle. The connection handle is then used to connect to the data source.
There are two functions which can be used to connect to a data source, SQLDriverConnect and SQLConnect. The former is used to
connect to a data source using a connection string, while the latter is used to connect to a data source using a DSN.

Connection String A connection string is a string which contains the information needed to connect to a data source. It is formatted as
a semicolon separated list of key‑value pairs, however DuckDB currently only utilizes the DSN and ignores the rest of the parameters.

DSN A DSN (Data Source Name) is a string that identifies a database. It can be a file path, URL, or a database name. For example:
C:\Users\me\duckdb.db and DuckDB are both valid DSNs. More information on DSNs can be found on the ”Choosing a Data Source
or Driver” page of the SQL Server documentation.

Error Handling and Diagnostics All functions in ODBC return a code which represents the success or failure of the function. This allows
for easy error handling, as the application can simply check the return code of each function call to determine if it was successful. When
unsuccessful, the application can then use the SQLGetDiagRec function to retrieve the error information. The following table defines
the return codes:

Return code             Description

SQL_SUCCESS             The function completed successfully.
SQL_SUCCESS_WITH_INFO   The function completed successfully, but additional information is available, including a warning.
SQL_ERROR               The function failed.
SQL_INVALID_HANDLE      The handle provided was invalid, indicating a programming error, i.e., when a handle is not allocated before it is used, or is the wrong type.
SQL_NO_DATA             The function completed successfully, but no more data is available.
SQL_NEED_DATA           More data is needed, such as when parameter data is sent at execution time, or additional connection information is required.
SQL_STILL_EXECUTING     A function that was asynchronously executed is still executing.

Buffers and Binding A buffer is a block of memory used to store data. Buffers are used to store data retrieved from the database, or to
send data to the database. Buffers are allocated by the application, and then bound to a column in a result set, or a parameter in a query,
using the SQLBindCol and SQLBindParameter functions. When the application fetches a row from the result set, or executes a query,
the data is stored in the buffer. When the application sends a query to the database, the data in the buffer is sent to the database.

Setting up an Application

The following is a step‑by‑step guide to setting up an application that uses ODBC to connect to a database, execute a query, and fetch the
results in C++.
Note. To install the driver, as well as anything else you will need, follow these instructions.

• What is ODBC?
• General Concepts
  – Handles
    * Handle Types
  – Connecting
    * Connection String
    * DSN
  – Error Handling and Diagnostics
  – Buffers and Binding
• Setting up an Application
  – 1. Include the SQL Header Files
  – 2. Define the ODBC Handles and Connect to the Database
  – 3. Adding a Query
  – 4. Fetching Results
  – 5. Go Wild
  – 6. Free the Handles and Disconnecting
• Sample Application
  – Sample .cpp file
  – Sample CMakeLists.txt file

1. Include the SQL Header Files The first step is to include the SQL header files:

#include <sql.h>
#include <sqlext.h>

These files contain the definitions of the ODBC functions, as well as the data types used by ODBC. In order to be able to use these header
files you have to have the unixodbc package installed:


brew install unixodbc
# or
sudo apt-get install unixodbc-dev
# or
sudo yum install unixODBC-devel

Remember to include the header file location in your CFLAGS.

For MAKEFILE:

CFLAGS=-I/usr/local/include
# or
CFLAGS=-I/opt/homebrew/Cellar/unixodbc/2.3.11/include

For CMAKE:

include_directories(/usr/local/include)
# or
include_directories(/opt/homebrew/Cellar/unixodbc/2.3.11/include)

You also have to link the library in your CMAKE or MAKEFILE: For CMAKE:

target_link_libraries(ODBC_application /path/to/duckdb_odbc/libduckdb_odbc.dylib)

For MAKEFILE:

LDLIBS=-L/path/to/duckdb_odbc/libduckdb_odbc.dylib

2. Define the ODBC Handles and Connect to the Database

2.a. Connecting with SQLConnect Then set up the ODBC handles, allocate them, and connect to the database. First the environment
handle is allocated, then the environment is set to ODBC version 3, then the connection handle is allocated, and finally the connection is
made to the database. The following code snippet shows how to do this:

SQLHANDLE env;
SQLHANDLE dbc;

SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);

SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);

SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

std::string dsn = "DSN=duckdbmemory";
SQLConnect(dbc, (SQLCHAR*)dsn.c_str(), SQL_NTS, NULL, 0, NULL, 0);

std::cout << "Connected!" << std::endl;

2.b. Connecting with SQLDriverConnect Alternatively, you can connect to the ODBC driver using SQLDriverConnect.
SQLDriverConnect accepts a connection string in which you can configure the database using any of the available DuckDB con‑
figuration options.

SQLHANDLE env;
SQLHANDLE dbc;

SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);

SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);

SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

SQLCHAR str[1024];
SQLSMALLINT strl;
std::string dsn = "DSN=DuckDB;allow_unsigned_extensions=true;access_mode=READ_ONLY";
SQLDriverConnect(dbc, nullptr, (SQLCHAR*)dsn.c_str(), SQL_NTS, str, sizeof(str), &strl, SQL_DRIVER_COMPLETE);

std::cout << "Connected!" << std::endl;

3. Adding a Query Now that the application is set up, we can add a query to it. First, we need to allocate a statement handle:

SQLHANDLE stmt;
SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);

Then we can execute a query:

SQLExecDirect(stmt, (SQLCHAR*)"SELECT * FROM integers", SQL_NTS);

4. Fetching Results Now that we have executed a query, we can fetch the results. First, we need to bind the columns in the result set to
buffers:

SQLLEN int_val;
SQLLEN null_val;
SQLBindCol(stmt, 1, SQL_C_SLONG, &int_val, 0, &null_val);

Then we can fetch the results:

SQLFetch(stmt);

5. Go Wild Now that we have the results, we can do whatever we want with them. For example, we can print them:

std::cout << "Value: " << int_val << std::endl;

or do any other processing we want, as well as executing more queries and doing anything else we want to do with the database, such as
inserting, updating, or deleting data.

6. Free the Handles and Disconnecting Finally, we need to free the handles and disconnect from the database. First, we need to free
the statement handle:

SQLFreeHandle(SQL_HANDLE_STMT, stmt);

Then we need to disconnect from the database:

SQLDisconnect(dbc);

And finally, we need to free the connection handle and the environment handle:

SQLFreeHandle(SQL_HANDLE_DBC, dbc);
SQLFreeHandle(SQL_HANDLE_ENV, env);

Freeing the connection and environment handles can only be done after the connection to the database has been closed. Trying to free
them before disconnecting from the database will result in an error.

Sample Application

The following is a sample application that includes a cpp file that connects to the database, executes a query, fetches the results, and prints
them. It also disconnects from the database and frees the handles, and includes a function to check the return value of ODBC functions. It
also includes a CMakeLists.txt file that can be used to build the application.


Sample .cpp file

#include <iostream>
#include <sql.h>
#include <sqlext.h>

void check_ret(SQLRETURN ret, std::string msg) {
    if (ret != SQL_SUCCESS && ret != SQL_SUCCESS_WITH_INFO) {
        std::cout << ret << ": " << msg << " failed" << std::endl;
        exit(1);
    }
    if (ret == SQL_SUCCESS_WITH_INFO) {
        std::cout << ret << ": " << msg << " succeeded with info" << std::endl;
    }
}

int main() {
    SQLHANDLE env;
    SQLHANDLE dbc;
    SQLRETURN ret;

    ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    check_ret(ret, "SQLAllocHandle(env)");

    ret = SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);
    check_ret(ret, "SQLSetEnvAttr");

    ret = SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
    check_ret(ret, "SQLAllocHandle(dbc)");

    std::string dsn = "DSN=duckdbmemory";
    ret = SQLConnect(dbc, (SQLCHAR*)dsn.c_str(), SQL_NTS, NULL, 0, NULL, 0);
    check_ret(ret, "SQLConnect");

    std::cout << "Connected!" << std::endl;

    SQLHANDLE stmt;
    ret = SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    check_ret(ret, "SQLAllocHandle(stmt)");

    ret = SQLExecDirect(stmt, (SQLCHAR*)"SELECT * FROM integers", SQL_NTS);
    check_ret(ret, "SQLExecDirect(SELECT * FROM integers)");

    SQLLEN int_val;
    SQLLEN null_val;
    ret = SQLBindCol(stmt, 1, SQL_C_SLONG, &int_val, 0, &null_val);
    check_ret(ret, "SQLBindCol");

    ret = SQLFetch(stmt);
    check_ret(ret, "SQLFetch");

    std::cout << "Value: " << int_val << std::endl;

    ret = SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    check_ret(ret, "SQLFreeHandle(stmt)");

    ret = SQLDisconnect(dbc);
    check_ret(ret, "SQLDisconnect");

    ret = SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    check_ret(ret, "SQLFreeHandle(dbc)");

    ret = SQLFreeHandle(SQL_HANDLE_ENV, env);
    check_ret(ret, "SQLFreeHandle(env)");
}

Sample CMakeLists.txt file


cmake_minimum_required(VERSION 3.25)
project(ODBC_Tester_App)

set(CMAKE_CXX_STANDARD 17)
include_directories(/opt/homebrew/Cellar/unixodbc/2.3.11/include)

add_executable(ODBC_Tester_App main.cpp)
target_link_libraries(ODBC_Tester_App /duckdb_odbc/libduckdb_odbc.dylib)

Python

Installing the Python Client

Installing via Pip

The latest release of the Python client can be installed using pip.

pip install duckdb

The pre‑release Python client can be installed using --pre.

pip install duckdb --upgrade --pre

Installing from Source

The latest Python client can be installed from source from the tools/pythonpkg directory in the DuckDB GitHub repository.

BUILD_PYTHON=1 GEN=ninja make


cd tools/pythonpkg
python setup.py install

For detailed instructions on how to compile DuckDB from source, see the Building guide.

Executing SQL in Python

SQL queries can be executed using the duckdb.sql function.

import duckdb
duckdb.sql("SELECT 42").show()

By default this will create a relation object. The result can be converted to various formats using the result conversion functions. For
example, the fetchall method can be used to convert the result to Python objects.

results = duckdb.sql("SELECT 42").fetchall()


print(results)
# [(42,)]

Several other result objects exist. For example, you can use df to convert the result to a Pandas DataFrame.

results = duckdb.sql("SELECT 42").df()


print(results)
# 42
# 0 42

By default, a global in-memory connection is used. Any data stored in that in-memory database will be lost when the program exits. A connection
to a persistent database can be created using the connect function.

After connecting, SQL queries can be executed using the sql command.

con = duckdb.connect("file.db")
con.sql("CREATE TABLE integers (i INTEGER)")
con.sql("INSERT INTO integers VALUES (42)")
con.sql("SELECT * FROM integers").show()


Jupyter Notebooks

DuckDB's Python client can be used directly in Jupyter notebooks with no additional configuration if desired. However, additional libraries
can be used to simplify SQL query development. This guide will describe how to utilize those additional libraries. See other guides in the
Python section for how to use DuckDB and Python together.

In this example, we use the JupySQL package.

This example workflow is also available as a Google Colab notebook.

Library Installation

Four additional libraries improve the DuckDB experience in Jupyter notebooks.

1. jupysql

• Convert a Jupyter code cell into a SQL cell

2. Pandas

• Clean table visualizations and compatibility with other analysis

3. matplotlib

• Plotting with Python

4. duckdb‑engine (DuckDB SQLAlchemy driver)

• Used by SQLAlchemy to connect to DuckDB (optional)

# Run these pip install commands from the command line if Jupyter Notebook is not yet installed.
# Otherwise, see the Google Colab link above for an in-notebook example
pip install duckdb

# Install Jupyter Notebook (Note: you can also install JupyterLab: pip install jupyterlab)
pip install notebook

# Install supporting libraries
pip install jupysql
pip install pandas
pip install matplotlib
pip install duckdb-engine

Library Import and Configuration

Open a Jupyter Notebook and import the relevant libraries.

Connecting to DuckDB Natively To connect to DuckDB, run:

import duckdb
import pandas as pd

%load_ext sql
conn = duckdb.connect()
%sql conn --alias duckdb


Connecting to DuckDB via SQLAlchemy Using duckdb_engine Alternatively, you can connect to DuckDB via SQLAlchemy using
duckdb_engine. See the performance and feature differences.

import duckdb
import pandas as pd
# No need to import duckdb_engine
# jupysql will auto-detect the driver needed based on the connection string!

# Import jupysql Jupyter extension to create SQL cells


%load_ext sql

Set configurations on jupysql to directly output data to Pandas and to simplify the output that is printed to the notebook.

%config SqlMagic.autopandas = True


%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

Connect jupysql to DuckDB using a SQLAlchemy-style connection string. Either connect to a new in-memory DuckDB, the default connection, or a file-backed database.

%sql duckdb:///:default:

%sql duckdb:///:memory:

%sql duckdb:///path/to/file.db

Note. The %sql command and duckdb.sql share the same default connection if you provide duckdb:///:default: as the
SQLAlchemy connection string.

Querying DuckDB

Single line SQL queries can be run using %sql at the start of a line. Query results will be displayed as a Pandas DF.

%sql SELECT 'Off and flying!' AS a_duckdb_column

An entire Jupyter cell can be used as a SQL cell by placing %%sql at the start of the cell. Query results will be displayed as a Pandas DF.

%%sql
SELECT
schema_name,
function_name
FROM duckdb_functions()
ORDER BY ALL DESC
LIMIT 5

To store the query results in a Python variable, use << as an assignment operator. This can be used with both the %sql and %%sql Jupyter
magics.

%sql res << SELECT 'Off and flying!' AS a_duckdb_column

If the %config SqlMagic.autopandas = True option is set, the variable is a Pandas DataFrame; otherwise, it is a ResultSet that
can be converted to Pandas with the DataFrame() function.
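For example, with autopandas disabled, the captured variable can be converted manually. A minimal sketch (the query and variable names are illustrative):

%config SqlMagic.autopandas = False
%sql res << SELECT 42 AS the_answer
# res is a JupySQL ResultSet; convert it to a Pandas DataFrame explicitly
df = res.DataFrame()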

Querying Pandas Dataframes

DuckDB is able to find and query any dataframe stored as a variable in the Jupyter notebook.

input_df = pd.DataFrame.from_dict({"i": [1, 2, 3],
                                   "j": ["one", "two", "three"]})

The dataframe being queried can be specified just like any other table in the FROM clause.

%sql output_df << SELECT sum(i) AS total_i FROM input_df


Visualizing DuckDB Data

The most common way to plot datasets in Python is to load them using Pandas and then use matplotlib or seaborn for plotting. This
approach requires loading all data into memory which is highly inefficient. The plotting module in JupySQL runs computations in the SQL
engine. This delegates memory management to the engine and ensures that intermediate computations do not keep eating up memory,
efficiently plotting massive datasets.

Install and Load DuckDB httpfs Extension DuckDB's httpfs extension allows Parquet and CSV files to be queried remotely over http.
These examples query a Parquet file that contains historical taxi data from NYC. Using the Parquet format allows DuckDB to only pull the
rows and columns into memory that are needed rather than downloading the entire file. DuckDB can be used to process local Parquet files
as well, which may be desirable if querying the entire Parquet file, or running multiple queries that require large subsets of the file.

%%sql
INSTALL httpfs;
LOAD httpfs;

Boxplot & Histogram To create a boxplot, call %sqlplot boxplot, passing the name of the table and the column to plot. In this case,
the name of the table is the URL of the remotely stored Parquet file.

%sqlplot boxplot --table https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet --column trip_distance

Now, create a query that filters by the 90th percentile. Note the use of the --save and --no-execute options. This tells JupySQL to
store the query, but skips execution. It will be referenced in the next plotting call.

%%sql --save short_trips --no-execute


SELECT *
FROM 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet'
WHERE trip_distance < 6.3

To create a histogram, call %sqlplot histogram and pass the name of the table, the column to plot, and the number of bins. This uses
--with short_trips so JupySQL uses the query defined previously and therefore only plots a subset of the data.

%sqlplot histogram --table short_trips --column trip_distance --bins 10 --with short_trips


Summary

You now have the ability to alternate between SQL and Pandas in a simple and highly performant way! You can plot massive datasets
directly through the engine (avoiding both the download of the entire file and loading all of it into Pandas in memory). Dataframes can be
read as tables in SQL, and SQL results can be output into Dataframes. Happy analyzing!

SQL on Pandas

Pandas DataFrames stored in local variables can be queried as if they are regular tables within DuckDB.

import duckdb
import pandas

# Create a Pandas dataframe


my_df = pandas.DataFrame.from_dict({'a': [42]})

# query the Pandas DataFrame "my_df"


# Note: duckdb.sql connects to the default in-memory database connection
results = duckdb.sql("SELECT * FROM my_df").df()

The seamless integration of Pandas DataFrames into DuckDB SQL queries is made possible by replacement scans, which replace instances of
accessing the my_df table (which does not exist in DuckDB) with a table function that reads the my_df DataFrame.
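If you prefer an explicit binding over relying on replacement scans, the Python client can also register a DataFrame as a named view. A minimal sketch (the view name is arbitrary):

import duckdb
import pandas

my_df = pandas.DataFrame.from_dict({'a': [42]})

con = duckdb.connect()
# register the DataFrame under an explicit name instead of relying on replacement scans
con.register('my_df_view', my_df)
print(con.sql('SELECT * FROM my_df_view').fetchall())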

Import from Pandas

CREATE TABLE AS and INSERT INTO can be used to create a table from any query. We can then create tables or insert into existing
tables by referring to the Pandas DataFrame in the query.

import duckdb
import pandas


# Create a Pandas dataframe


my_df = pandas.DataFrame.from_dict({'a': [42]})

# create the table "my_table" from the DataFrame "my_df"


# Note: duckdb.sql connects to the default in-memory database connection
duckdb.sql("CREATE TABLE my_table AS SELECT * FROM my_df")

# insert into the table "my_table" from the DataFrame "my_df"


duckdb.sql("INSERT INTO my_table SELECT * FROM my_df")

Export to Pandas

The result of a query can be converted to a Pandas DataFrame using the df() function.

import duckdb

# read the result of an arbitrary SQL query to a Pandas DataFrame


results = duckdb.sql("SELECT 42").df()

SQL on Apache Arrow

DuckDB can query multiple different types of Apache Arrow objects.

Apache Arrow Tables

Arrow Tables stored in local variables can be queried as if they are regular tables within DuckDB.

import duckdb
import pyarrow as pa

# connect to an in-memory database


con = duckdb.connect()

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow Table
results = con.execute("SELECT * FROM my_arrow_table WHERE i = 2").arrow()

Apache Arrow Datasets

Arrow Datasets stored as variables can also be queried as if they were regular tables. Datasets are useful for pointing to directories of
Parquet files when analyzing large datasets. DuckDB will push column selections and row filters down into the dataset scan operation so that
only the necessary data is pulled into memory.

import duckdb
import pyarrow as pa
import tempfile
import pathlib
import pyarrow.parquet as pq
import pyarrow.dataset as ds

# connect to an in-memory database


con = duckdb.connect()

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# create example Parquet files and save in a folder


base_path = pathlib.Path(tempfile.gettempdir())
(base_path / "parquet_folder").mkdir(exist_ok = True)
pq.write_to_dataset(my_arrow_table, str(base_path / "parquet_folder"))

# link to Parquet files using an Arrow Dataset


my_arrow_dataset = ds.dataset(str(base_path / 'parquet_folder/'))

# query the Apache Arrow Dataset "my_arrow_dataset" and return as an Arrow Table
results = con.execute("SELECT * FROM my_arrow_dataset WHERE i = 2").arrow()

Apache Arrow Scanners

Arrow Scanners stored as variables can also be queried as if they were regular tables. Scanners read over a dataset and select specific
columns or apply row‑wise filtering. This is similar to how DuckDB pushes column selections and filters down into an Arrow Dataset, but
using Arrow compute operations instead. Arrow can use asynchronous IO to quickly access files.

import duckdb
import pyarrow as pa
import tempfile
import pathlib
import pyarrow.parquet as pq
import pyarrow.dataset as ds
import pyarrow.compute as pc

# connect to an in-memory database


con = duckdb.connect()

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# create example Parquet files and save in a folder


base_path = pathlib.Path(tempfile.gettempdir())
(base_path / "parquet_folder").mkdir(exist_ok = True)
pq.write_to_dataset(my_arrow_table, str(base_path / "parquet_folder"))

# link to Parquet files using an Arrow Dataset


my_arrow_dataset = ds.dataset(str(base_path / 'parquet_folder/'))

# define the filter to be applied while scanning


# equivalent to "WHERE i = 2"
scanner_filter = (pc.field("i") == pc.scalar(2))

arrow_scanner = ds.Scanner.from_dataset(my_arrow_dataset, filter = scanner_filter)

# query the Apache Arrow scanner "arrow_scanner" and return as an Arrow Table
results = con.execute("SELECT * FROM arrow_scanner").arrow()

Apache Arrow RecordBatchReaders

Arrow RecordBatchReaders are readers for Arrow's streaming binary format and can also be queried directly as if they were tables. This
streaming format is useful when sending Arrow data for tasks like interprocess communication or communication between language runtimes.

import duckdb
import pyarrow as pa

# connect to an in-memory database


con = duckdb.connect()

my_recordbatch = pa.RecordBatch.from_pydict({'i': [1, 2, 3, 4],
                                             'j': ["one", "two", "three", "four"]})

my_recordbatchreader = pa.ipc.RecordBatchReader.from_batches(my_recordbatch.schema, [my_recordbatch])

# query the Apache Arrow RecordBatchReader "my_recordbatchreader" and return as an Arrow Table
results = con.execute("SELECT * FROM my_recordbatchreader WHERE i = 2").arrow()

Import from Apache Arrow

CREATE TABLE AS and INSERT INTO can be used to create a table from any query. We can then create tables or insert into existing
tables by referring to the Apache Arrow object in the query. This example imports from an Arrow Table, but DuckDB can query
different Apache Arrow formats as seen in the SQL on Arrow guide.

import duckdb
import pyarrow as pa

# connect to an in-memory database


my_arrow = pa.Table.from_pydict({'a': [42]})

# create the table "my_table" from the Arrow Table "my_arrow"


duckdb.sql("CREATE TABLE my_table AS SELECT * FROM my_arrow")

# insert into the table "my_table" from the Arrow Table "my_arrow"


duckdb.sql("INSERT INTO my_table SELECT * FROM my_arrow")

Export to Apache Arrow

All results of a query can be exported to an Apache Arrow Table using the arrow function. Alternatively, results can be returned as a
RecordBatchReader using the fetch_record_batch function and results can be read one batch at a time. In addition, relations built
using DuckDB's Relational API can also be exported.

Export to an Arrow Table

import duckdb
import pyarrow as pa

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow Table
results = duckdb.sql("SELECT * FROM my_arrow_table").arrow()


Export as a RecordBatchReader

import duckdb
import pyarrow as pa

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow RecordBatchReader
chunk_size = 1_000_000
results = duckdb.sql("SELECT * FROM my_arrow_table").fetch_record_batch(chunk_size)

# Loop through the results. A StopIteration exception is thrown when the RecordBatchReader is empty
while True:
    try:
        # Process a single chunk here (just printing as an example)
        print(results.read_next_batch().to_pandas())
    except StopIteration:
        print('Already fetched all batches')
        break

Export from Relational API

Arrow objects can also be exported from the Relational API. A relation can be converted to an Arrow table using the arrow or to_arrow_
table functions, or a record batch using record_batch. A result can be exported to an Arrow table with arrow or the alias fetch_
arrow_table, or to a RecordBatchReader using fetch_arrow_reader.

import duckdb

# connect to an in-memory database


con = duckdb.connect()

con.execute('CREATE TABLE integers (i integer)')


con.execute('INSERT INTO integers VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (NULL)')

# Create a relation from the table and export the entire relation as Arrow
rel = con.table("integers")
relation_as_arrow = rel.arrow() # or .to_arrow_table()

# Or, calculate a result using that relation and export that result to Arrow
res = rel.aggregate("sum(i)").execute()
result_as_arrow = res.arrow() # or fetch_arrow_table()
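The record-batch exports named above can be sketched in the same way, continuing from the rel and res objects created in this snippet. This relies only on the record_batch and fetch_arrow_reader functions listed in this section, called with their default arguments (batch-size options may differ between versions):

# stream the relation as Arrow record batches instead of materializing a full table
batch_reader = rel.record_batch()
print(batch_reader.read_next_batch().to_pandas())

# a query result can be streamed as well
res = rel.aggregate("sum(i)").execute()
result_reader = res.fetch_arrow_reader()
print(result_reader.read_next_batch().to_pandas())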

Relational API on Pandas

DuckDB offers a relational API that can be used to chain together query operations. These are lazily evaluated so that DuckDB can optimize
their execution. These operators can act on Pandas DataFrames, DuckDB tables or views (which can point to any underlying storage format
that DuckDB can read, such as CSV or Parquet files, etc.). Here we show a simple example of reading from a Pandas DataFrame and returning
a DataFrame.

import duckdb
import pandas

# connect to an in-memory database


con = duckdb.connect()


input_df = pandas.DataFrame.from_dict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# create a DuckDB relation from a dataframe


rel = con.from_df(input_df)

# chain together relational operators (this is a lazy operation, so the operations are not yet executed)
# equivalent to: SELECT i, j, i*2 AS two_i FROM input_df WHERE i >= 2 ORDER BY i DESC LIMIT 2
transformed_rel = rel.filter('i >= 2').project('i, j, i*2 as two_i').order('i desc').limit(2)

# trigger execution by requesting .df() of the relation


# .df() could have been added to the end of the chain above - it was separated for clarity
output_df = transformed_rel.df()

Relational operators can also be used to group rows, aggregate, find distinct combinations of values, join, union, and more. They are also
able to directly insert results into a DuckDB table or write to a CSV.
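For example, continuing from the rel created above, a grouped aggregate can be written straight to a CSV file. A minimal sketch (the output file and table names are arbitrary, and aggregate's optional second argument is assumed to be the group expression):

# group by j, sum i, and write the aggregated result directly to a CSV file
rel.aggregate('j, sum(i) AS total_i', 'j').write_csv('total_i.csv')

# the same result could instead be inserted into an existing DuckDB table:
# rel.aggregate('j, sum(i) AS total_i', 'j').insert_into('totals')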

Please see these additional examples and the available relational methods on the DuckDBPyRelation class.

Multiple Python Threads

This page demonstrates how to simultaneously insert into and read from a DuckDB database across multiple Python threads. This could
be useful in scenarios where new data is flowing in and an analysis should be periodically re‑run. Note that this is all within a single Python
process (see the FAQ for details on DuckDB concurrency). Feel free to follow along in this Google Colab notebook.

Setup

First, import DuckDB and several modules from the Python standard library. Note: if using Pandas, add import pandas at the top of the
script as well (as it must be imported prior to the multi‑threading). Then connect to a file‑backed DuckDB database and create an example
table to store inserted data. This table will track the name of the thread that completed the insert and automatically insert the timestamp
when that insert occurred using the DEFAULT expression.

import duckdb
from threading import Thread, current_thread
import random

duckdb_con = duckdb.connect('my_persistent_db.duckdb')
# Use connect without parameters for an in-memory database
# duckdb_con = duckdb.connect()
duckdb_con.execute("""
CREATE OR REPLACE TABLE my_inserts (
thread_name VARCHAR,
insert_time TIMESTAMP DEFAULT current_timestamp
)
""")

Reader and Writer Functions

Next, define functions to be executed by the writer and reader threads. Each thread must use the .cursor() method to create a thread‑
local connection to the same DuckDB file based on the original connection. This approach also works with in‑memory DuckDB databases.

def write_from_thread(duckdb_con):
    # Create a DuckDB connection specifically for this thread
    local_con = duckdb_con.cursor()
    # Insert a row with the name of the thread. insert_time is auto-generated.
    thread_name = str(current_thread().name)
    result = local_con.execute("""
        INSERT INTO my_inserts (thread_name)
        VALUES (?)
    """, (thread_name,)).fetchall()

def read_from_thread(duckdb_con):
    # Create a DuckDB connection specifically for this thread
    local_con = duckdb_con.cursor()
    # Query the current row count
    thread_name = str(current_thread().name)
    results = local_con.execute("""
        SELECT
            ? AS thread_name,
            count(*) AS row_counter,
            current_timestamp
        FROM my_inserts
    """, (thread_name,)).fetchall()
    print(results)

Create Threads

We define how many writers and readers to use, and define a list to track all of the Threads that will be created. Then, create the writer
Threads first, followed by the reader Threads. Next, shuffle them so that they will be kicked off in a random order to simulate simultaneous
writers and readers. Note that the Threads have not yet been executed, only defined.

write_thread_count = 50
read_thread_count = 5
threads = []

# Create multiple writer and reader threads (in the same process)
# Pass in the same connection as an argument
for i in range(write_thread_count):
    threads.append(Thread(target = write_from_thread,
                          args = (duckdb_con,),
                          name = 'write_thread_' + str(i)))

for j in range(read_thread_count):
    threads.append(Thread(target = read_from_thread,
                          args = (duckdb_con,),
                          name = 'read_thread_' + str(j)))

# Shuffle the threads to simulate a mix of readers and writers


random.seed(6) # Set the seed to ensure consistent results when testing
random.shuffle(threads)

Run Threads and Show Results

Now, kick off all threads to run in parallel, then wait for all of them to finish before printing out the results. Note that the timestamps of
readers and writers are interspersed as expected due to the randomization.

# Kick off all threads in parallel
for thread in threads:
    thread.start()

# Ensure all threads complete before printing final results
for thread in threads:
    thread.join()

print(duckdb_con.execute("""
    SELECT *
    FROM my_inserts
    ORDER BY insert_time
""").df())

Integration with Ibis

Ibis is a Python dataframe library that supports 15+ backends, with DuckDB as the default. Ibis with DuckDB provides a Pythonic interface
for SQL with great performance.

Installation

You can pip install Ibis with the DuckDB backend:

pip install 'ibis-framework[duckdb]'

or use conda:

conda install ibis-framework

or use mamba:

mamba install ibis-framework

Create a Database File

Ibis can work with several file types, but at its core, it connects to existing databases and interacts with the data there. You can get started
with your own DuckDB databases or create a new one with example data.

import ibis

con = ibis.connect("duckdb://penguins.ddb")
con.create_table(
"penguins", ibis.examples.penguins.fetch().to_pyarrow(), overwrite = True
)

# Output:
DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64

You can now see the example dataset copied over to the database:

# reconnect to the persisted database (dropping temp tables)


con = ibis.connect("duckdb://penguins.ddb")
con.list_tables()


# Output:
['penguins']

There's one table, called penguins. We can ask Ibis to give us an object that we can interact with.

penguins = con.table("penguins")
penguins

# Output:
DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64

Ibis is lazily evaluated, so instead of seeing the data, we see the schema of the table. To peek at the data, we can call head and then to_
pandas to get the first few rows of the table as a pandas DataFrame.

penguins.head().to_pandas()

species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 male 2007
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 female 2007
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 female 2007
3 Adelie Torgersen NaN NaN NaN NaN None 2007
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 female 2007

to_pandas takes the existing lazy table expression and evaluates it. If we leave it off, you'll see the Ibis representation of the table expres‑
sion that to_pandas will evaluate (when you're ready!).

penguins.head()

# Output:
r0 := DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64

Limit[r0, n=5]

Ibis returns results as a pandas DataFrame using to_pandas, but isn't using pandas to perform any of the computation. The query is
executed by DuckDB. Only when to_pandas is called does Ibis then pull back the results and convert them into a DataFrame.

Interactive Mode

For the rest of this intro, we'll turn on interactive mode, which partially executes queries to give users a preview of the results. There is a
small difference in the way the output is formatted, but otherwise this is the same as calling to_pandas on the table expression with a
limit of 10 result rows returned.

ibis.options.interactive = True
penguins.head()


┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ nan │ nan │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Common Operations

Ibis has a collection of useful table methods to manipulate and query the data in a table.

filter filter allows you to select rows based on a condition or set of conditions.

We can filter so we only have penguins of the species Gentoo:

penguins.filter(penguins.species == "Gentoo")

┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Gentoo │ Biscoe │ 46.1 │ 13.2 │ 211 │ 4500 │ female │ 2007 │
│ Gentoo │ Biscoe │ 50.0 │ 16.3 │ 230 │ 5700 │ male │ 2007 │
│ Gentoo │ Biscoe │ 48.7 │ 14.1 │ 210 │ 4450 │ female │ 2007 │
│ Gentoo │ Biscoe │ 50.0 │ 15.2 │ 218 │ 5700 │ male │ 2007 │
│ Gentoo │ Biscoe │ 47.6 │ 14.5 │ 215 │ 5400 │ male │ 2007 │
│ Gentoo │ Biscoe │ 46.5 │ 13.5 │ 210 │ 4550 │ female │ 2007 │
│ Gentoo │ Biscoe │ 45.4 │ 14.6 │ 211 │ 4800 │ female │ 2007 │
│ Gentoo │ Biscoe │ 46.7 │ 15.3 │ 219 │ 5200 │ male │ 2007 │
│ Gentoo │ Biscoe │ 43.3 │ 13.4 │ 209 │ 4400 │ female │ 2007 │
│ Gentoo │ Biscoe │ 46.8 │ 15.4 │ 215 │ 5150 │ male │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Or filter for Gentoo penguins that have a body mass larger than 6 kg.

penguins.filter((penguins.species == "Gentoo") & (penguins.body_mass_g > 6000))

┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Gentoo │ Biscoe │ 49.2 │ 15.2 │ 221 │ 6300 │ male │ 2007 │
│ Gentoo │ Biscoe │ 59.6 │ 17.0 │ 230 │ 6050 │ male │ 2007 │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

You can use any boolean comparison in a filter (although if you try to do something like use < on a string, Ibis will yell at you).

select Your data analysis might not require all the columns present in a given table. select lets you pick out only those columns that
you want to work with.

To select a column you can use the name of the column as a string:


penguins.select("species", "island", "year").limit(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │
├─────────┼───────────┼───────┤
│ Adelie │ Torgersen │ 2007 │
│ Adelie │ Torgersen │ 2007 │
│ Adelie │ Torgersen │ 2007 │
│ … │ … │ … │
└─────────┴───────────┴───────┘

Or you can use column objects directly (this can be convenient when paired with tab‑completion):

penguins.select(penguins.species, penguins.island, penguins.year).limit(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │
├─────────┼───────────┼───────┤
│ Adelie │ Torgersen │ 2007 │
│ Adelie │ Torgersen │ 2007 │
│ Adelie │ Torgersen │ 2007 │
│ … │ … │ … │
└─────────┴───────────┴───────┘

Or you can mix‑and‑match:

penguins.select("species", "island", penguins.year).limit(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │
├─────────┼───────────┼───────┤
│ Adelie │ Torgersen │ 2007 │
│ Adelie │ Torgersen │ 2007 │
│ Adelie │ Torgersen │ 2007 │
│ … │ … │ … │
└─────────┴───────────┴───────┘

mutate mutate lets you add new columns to your table, derived from the values of existing columns.

penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │ float64        │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │           3.91 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │           3.95 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │           4.03 │
│ Adelie  │ Torgersen │            nan │           nan │              NULL │        NULL │ NULL   │  2007 │            nan │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │           3.67 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │           3.93 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │           3.89 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │           3.92 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │           3.41 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │           4.20 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │              … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘

Notice that the table is a little too wide to display all the columns now (depending on your screen‑size). bill_length is now present in
millimeters and centimeters. Use a select to trim down the number of columns we're looking at.

penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10).select(
"species",
"island",
"bill_depth_mm",
"flipper_length_mm",
"body_mass_g",
"sex",
"year",
"bill_length_cm",
)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ int64 │ int64 │ string │ int64 │ float64 │
├─────────┼───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie │ Torgersen │ 18.7 │ 181 │ 3750 │ male │ 2007 │ 3.91 │
│ Adelie │ Torgersen │ 17.4 │ 186 │ 3800 │ female │ 2007 │ 3.95 │
│ Adelie │ Torgersen │ 18.0 │ 195 │ 3250 │ female │ 2007 │ 4.03 │
│ Adelie │ Torgersen │ nan │ NULL │ NULL │ NULL │ 2007 │ nan │
│ Adelie │ Torgersen │ 19.3 │ 193 │ 3450 │ female │ 2007 │ 3.67 │
│ Adelie │ Torgersen │ 20.6 │ 190 │ 3650 │ male │ 2007 │ 3.93 │
│ Adelie │ Torgersen │ 17.8 │ 181 │ 3625 │ female │ 2007 │ 3.89 │
│ Adelie │ Torgersen │ 19.6 │ 195 │ 4675 │ male │ 2007 │ 3.92 │
│ Adelie │ Torgersen │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │ 3.41 │
│ Adelie │ Torgersen │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │ 4.20 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘

selectors Typing out all of the column names except one is a little annoying. Instead of doing that again, we can use a selector to
quickly select or deselect groups of columns.

import ibis.selectors as s

penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10).select(
    ~s.matches("bill_length_mm")
    # match every column except `bill_length_mm`
)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ int64 │ int64 │ string │ int64 │ float64 │
├─────────┼───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie │ Torgersen │ 18.7 │ 181 │ 3750 │ male │ 2007 │ 3.91 │
│ Adelie │ Torgersen │ 17.4 │ 186 │ 3800 │ female │ 2007 │ 3.95 │
│ Adelie │ Torgersen │ 18.0 │ 195 │ 3250 │ female │ 2007 │ 4.03 │
│ Adelie │ Torgersen │ nan │ NULL │ NULL │ NULL │ 2007 │ nan │
│ Adelie │ Torgersen │ 19.3 │ 193 │ 3450 │ female │ 2007 │ 3.67 │
│ Adelie │ Torgersen │ 20.6 │ 190 │ 3650 │ male │ 2007 │ 3.93 │
│ Adelie │ Torgersen │ 17.8 │ 181 │ 3625 │ female │ 2007 │ 3.89 │
│ Adelie │ Torgersen │ 19.6 │ 195 │ 4675 │ male │ 2007 │ 3.92 │
│ Adelie │ Torgersen │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │ 3.41 │
│ Adelie │ Torgersen │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │ 4.20 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘

You can also use a selector alongside a column name.

penguins.select("island", s.numeric())

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┓
┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━┩
│ string │ float64 │ float64 │ int64 │ int64 │ int64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼───────┤
│ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ 2007 │
│ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ 2007 │
│ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ 2007 │
│ Torgersen │ nan │ nan │ NULL │ NULL │ 2007 │
│ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ 2007 │
│ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ 2007 │
│ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ 2007 │
│ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ 2007 │
│ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ 2007 │
│ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ 2007 │
│ … │ … │ … │ … │ … │ … │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴───────┘

You can read more about selectors in the docs!

order_by order_by arranges the values of one or more columns in ascending or descending order.

By default, ibis sorts in ascending order:

penguins.order_by(penguins.flipper_length_mm).select(
"species", "island", "flipper_length_mm"
)

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ flipper_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ int64 │
├───────────┼───────────┼───────────────────┤
│ Adelie │ Biscoe │ 172 │
│ Adelie │ Biscoe │ 174 │
│ Adelie │ Torgersen │ 176 │
│ Adelie │ Dream │ 178 │
│ Adelie │ Dream │ 178 │
│ Adelie │ Dream │ 178 │
│ Chinstrap │ Dream │ 178 │
│ Adelie │ Dream │ 179 │
│ Adelie │ Torgersen │ 180 │
│ Adelie │ Biscoe │ 180 │
│ … │ … │ … │
└───────────┴───────────┴───────────────────┘

You can sort in descending order using the desc method of a column:

penguins.order_by(penguins.flipper_length_mm.desc()).select(
"species", "island", "flipper_length_mm"
)

┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ int64 │
├─────────┼────────┼───────────────────┤
│ Gentoo │ Biscoe │ 231 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 229 │
│ Gentoo │ Biscoe │ 229 │
│ … │ … │ … │
└─────────┴────────┴───────────────────┘

Or you can use ibis.desc

penguins.order_by(ibis.desc("flipper_length_mm")).select(
"species", "island", "flipper_length_mm"
)

┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ int64 │
├─────────┼────────┼───────────────────┤
│ Gentoo │ Biscoe │ 231 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 230 │
│ Gentoo │ Biscoe │ 229 │
│ Gentoo │ Biscoe │ 229 │
│ … │ … │ … │
└─────────┴────────┴───────────────────┘


aggregate Ibis has several aggregate functions available to help summarize data.

mean, max, min, count, sum (the list goes on).

To aggregate an entire column, call the corresponding method on that column.

penguins.flipper_length_mm.mean()

# Output:
200.91520467836258

You can compute multiple aggregates at once using the aggregate method:

penguins.aggregate([penguins.flipper_length_mm.mean(), penguins.bill_depth_mm.max()])

┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Mean(flipper_length_mm) ┃ Max(bill_depth_mm) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │
├─────────────────────────┼────────────────────┤
│ 200.915205 │ 21.5 │
└─────────────────────────┴────────────────────┘

But aggregate really shines when it's paired with group_by.

group_by group_by creates groupings of rows that have the same value for one or more columns.

But it doesn't do much on its own ‑‑ you can pair it with aggregate to get a result.

penguins.group_by("species").aggregate()

┏━━━━━━━━━━━┓
┃ species ┃
┡━━━━━━━━━━━┩
│ string │
├───────────┤
│ Adelie │
│ Gentoo │
│ Chinstrap │
└───────────┘

We grouped by the species column and handed it an ”empty” aggregate command. The result of that is a column of the unique values
in the species column.

If we add a second column to the group_by, we'll get each unique pairing of the values in those columns.

penguins.group_by(["species", "island"]).aggregate()

┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ island ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ string │
├───────────┼───────────┤
│ Adelie │ Torgersen │
│ Adelie │ Biscoe │
│ Adelie │ Dream │
│ Gentoo │ Biscoe │
│ Chinstrap │ Dream │
└───────────┴───────────┘

Now, if we add an aggregation function to that, we start to really open things up.

penguins.group_by(["species", "island"]).aggregate(penguins.bill_length_mm.mean())


┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ Mean(bill_length_mm) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │
├───────────┼───────────┼──────────────────────┤
│ Adelie │ Torgersen │ 38.950980 │
│ Adelie │ Biscoe │ 38.975000 │
│ Adelie │ Dream │ 38.501786 │
│ Gentoo │ Biscoe │ 47.504878 │
│ Chinstrap │ Dream │ 48.833824 │
└───────────┴───────────┴──────────────────────┘

By adding that mean to the aggregate, we now have a concise way to calculate aggregates over each of the distinct groups in the group_
by. And we can calculate as many aggregates as we need.

penguins.group_by(["species", "island"]).aggregate(
[penguins.bill_length_mm.mean(), penguins.flipper_length_mm.max()]
)

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ Mean(bill_length_mm) ┃ Max(flipper_length_mm) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ int64 │
├───────────┼───────────┼──────────────────────┼────────────────────────┤
│ Adelie │ Torgersen │ 38.950980 │ 210 │
│ Adelie │ Biscoe │ 38.975000 │ 203 │
│ Adelie │ Dream │ 38.501786 │ 208 │
│ Gentoo │ Biscoe │ 47.504878 │ 231 │
│ Chinstrap │ Dream │ 48.833824 │ 212 │
└───────────┴───────────┴──────────────────────┴────────────────────────┘

If we need more specific groups, we can add to the group_by.

penguins.group_by(["species", "island", "sex"]).aggregate(
    [penguins.bill_length_mm.mean(), penguins.flipper_length_mm.max()]
)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ sex ┃ Mean(bill_length_mm) ┃ Max(flipper_length_mm) ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ string │ float64 │ int64 │
├─────────┼───────────┼────────┼──────────────────────┼────────────────────────┤
│ Adelie │ Torgersen │ male │ 40.586957 │ 210 │
│ Adelie │ Torgersen │ female │ 37.554167 │ 196 │
│ Adelie │ Torgersen │ NULL │ 37.925000 │ 193 │
│ Adelie │ Biscoe │ female │ 37.359091 │ 199 │
│ Adelie │ Biscoe │ male │ 40.590909 │ 203 │
│ Adelie │ Dream │ female │ 36.911111 │ 202 │
│ Adelie │ Dream │ male │ 40.071429 │ 208 │
│ Adelie │ Dream │ NULL │ 37.500000 │ 179 │
│ Gentoo │ Biscoe │ female │ 45.563793 │ 222 │
│ Gentoo │ Biscoe │ male │ 49.473770 │ 231 │
│ … │ … │ … │ … │ … │
└─────────┴───────────┴────────┴──────────────────────┴────────────────────────┘

Chaining It All Together

We've already chained some Ibis calls together. We used mutate to create a new column and then select to only view a subset of the
new table. We were just chaining group_by with aggregate.


There's nothing stopping us from putting all of these concepts together to ask questions of the data.

How about:

• What was the largest female penguin (by body mass) on each island in the year 2008?

penguins.filter((penguins.sex == "female") & (penguins.year == 2008)).group_by(
    ["island"]
).aggregate(penguins.body_mass_g.max())

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ island ┃ Max(body_mass_g) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │
├───────────┼──────────────────┤
│ Biscoe │ 5200 │
│ Torgersen │ 3800 │
│ Dream │ 3900 │
└───────────┴──────────────────┘

• What about the largest male penguin (by body mass) on each island for each year of data collection?

penguins.filter(penguins.sex == "male").group_by(["island", "year"]).aggregate(
    penguins.body_mass_g.max().name("max_body_mass")
).order_by(["year", "max_body_mass"])

┏━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ island ┃ year ┃ max_body_mass ┃
┡━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string │ int64 │ int64 │
├───────────┼───────┼───────────────┤
│ Dream │ 2007 │ 4650 │
│ Torgersen │ 2007 │ 4675 │
│ Biscoe │ 2007 │ 6300 │
│ Torgersen │ 2008 │ 4700 │
│ Dream │ 2008 │ 4800 │
│ Biscoe │ 2008 │ 6000 │
│ Torgersen │ 2009 │ 4300 │
│ Dream │ 2009 │ 4475 │
│ Biscoe │ 2009 │ 6000 │
└───────────┴───────┴───────────────┘

Learn More

That's all for this quick‑start guide. If you want to learn more, check out the Ibis documentation.

Integration with Polars

Polars is a DataFrames library built in Rust with bindings for Python and Node.js. It uses Apache Arrow's columnar format as its memory
model. DuckDB can read Polars DataFrames and convert query results to Polars DataFrames. It does this internally using the efficient
Apache Arrow integration. Note that the pyarrow library must be installed for the integration to work.

Installation

pip install duckdb


pip install -U 'polars[pyarrow]'


Polars to DuckDB

DuckDB can natively query Polars DataFrames by referring to the name of Polars DataFrames as they exist in the current scope.

import duckdb
import polars as pl

df = pl.DataFrame(
{
"A": [1, 2, 3, 4, 5],
"fruits": ["banana", "banana", "apple", "apple", "banana"],
"B": [5, 4, 3, 2, 1],
"cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
}
)
duckdb.sql("SELECT * FROM df").show()

DuckDB to Polars

DuckDB can output results as Polars DataFrames using the .pl() result‑conversion method.

df = duckdb.sql("""
SELECT 1 AS id, 'banana' AS fruit
UNION ALL
SELECT 2, 'apple'
UNION ALL
SELECT 3, 'mango'"""
).pl()
print(df)

shape: (3, 2)
┌─────┬────────┐
│ id ┆ fruit │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪════════╡
│ 1 ┆ banana │
│ 2 ┆ apple │
│ 3 ┆ mango │
└─────┴────────┘

To learn more about Polars, feel free to explore their Python API Reference.

Using fsspec Filesystems

DuckDB support for fsspec filesystems allows querying data in filesystems that DuckDB's httpfs extension does not support. fsspec
has a large number of inbuilt filesystems, and there are also many external implementations. This capability is only available in DuckDB's
Python client because fsspec is a Python library, while the httpfs extension is available in many DuckDB clients.

Example

The following is an example of using fsspec to query a file in Google Cloud Storage (instead of using their S3‑compatible API).

Firstly, you must install duckdb and fsspec, and a filesystem interface of your choice.

pip install duckdb fsspec gcsfs


Then, you can register whichever filesystem you'd like to query:

import duckdb
from fsspec import filesystem

# this line will throw an exception if the appropriate filesystem interface is not installed
duckdb.register_filesystem(filesystem('gcs'))

duckdb.sql("SELECT * FROM read_csv('gcs:///bucket/file.csv')")

Note. These filesystems are not implemented in C++, hence, their performance may not be comparable to the ones provided by the
httpfs extension. It is also worth noting that as they are third‑party libraries, they may contain bugs that are beyond our control.

SQL Features

Friendly SQL

DuckDB offers several advanced SQL features as well as extensions to the SQL syntax. We colloquially call these ”friendly SQL”.

Note. Several of these features are also supported in other systems while some are (currently) exclusive to DuckDB.

Clauses

• CREATE OR REPLACE TABLE
• CREATE TABLE ... AS (CTAS)
• DESCRIBE
• FROM‑first syntax
• GROUP BY ALL
• INSERT INTO ... BY NAME
• ORDER BY ALL
• PIVOT and UNPIVOT
• SELECT * EXCLUDE
• SELECT * REPLACE
• SUMMARIZE
• UNION BY NAME

Query Features

• Column aliases in WHERE, GROUP BY, and HAVING
• COLUMNS() expression
  – with regular expressions
  – with EXCLUDE and REPLACE
  – with lambda functions

Literals and Identifiers

• Case‑insensitivity while maintaining case of entities in the catalog
• Deduplicating identifiers
• Underscores as digit separators

Data Types

• MAP data type
• UNION data type


Functions and Expressions

• Formatters: format() function with the fmt syntax and the printf() function
• List comprehensions
• List slicing
• String slicing
• STRUCT.* notation
• Simple LIST and STRUCT creation

Join Types

• ASOF joins
• LATERAL joins
• POSITIONAL joins

Trailing Commas

DuckDB allows trailing commas, both when listing entities (e.g., column and table names) and when constructing LIST items. For example,
the following query works:

SELECT
42 AS x,
['a', 'b', 'c',] AS y,
'hello world' AS z,
;

See Also

• Friendlier SQL with DuckDB blog post
• Even Friendlier SQL with DuckDB blog post
• SQL Gymnastics: Bending SQL into flexible new shapes blog post

AsOf Join

What is an AsOf Join?

Time series data is not always perfectly aligned. Clocks may be slightly off, or there may be a delay between cause and effect. This can
make connecting two sets of ordered data challenging. AsOf joins are a tool for solving this and other similar problems.

One of the problems that AsOf joins are used to solve is finding the value of a varying property at a specific point in time. This use case is
so common that it is where the name came from:

Give me the value of the property as of this time.

More generally, however, AsOf joins embody some common temporal analytic semantics, which can be cumbersome and slow to implement in standard SQL.

Portfolio Example Data Set

Let's start with a concrete example. Suppose we have a table of stock prices with timestamps:


ticker when price

APPL 2001‑01‑01 00:00:00 1
APPL 2001‑01‑01 00:01:00 2
APPL 2001‑01‑01 00:02:00 3
MSFT 2001‑01‑01 00:00:00 1
MSFT 2001‑01‑01 00:01:00 2
MSFT 2001‑01‑01 00:02:00 3
GOOG 2001‑01‑01 00:00:00 1
GOOG 2001‑01‑01 00:01:00 2
GOOG 2001‑01‑01 00:02:00 3

We have another table containing portfolio holdings at various points in time:

ticker when shares

APPL 2000‑12‑31 23:59:30 5.16
APPL 2001‑01‑01 00:00:30 2.94
APPL 2001‑01‑01 00:01:30 24.13
GOOG 2000‑12‑31 23:59:30 9.33
GOOG 2001‑01‑01 00:00:30 23.45
GOOG 2001‑01‑01 00:01:30 10.58
DATA 2000‑12‑31 23:59:30 6.65
DATA 2001‑01‑01 00:00:30 17.95
DATA 2001‑01‑01 00:01:30 18.37

To load these tables to DuckDB, run:

CREATE TABLE prices AS FROM 'https://duckdb.org/data/prices.csv';
CREATE TABLE holdings AS FROM 'https://duckdb.org/data/holdings.csv';

Inner AsOf Joins

We can compute the value of each holding at that point in time by finding the most recent price before the holding's timestamp using
an AsOf Join:

SELECT h.ticker, h.when, price * shares AS value
FROM holdings h
ASOF JOIN prices p
    ON h.ticker = p.ticker
   AND h.when >= p.when;

This attaches the value of the holding at that time to each row:

ticker when value

APPL 2001‑01‑01 00:00:30 2.94
APPL 2001‑01‑01 00:01:30 48.26
GOOG 2001‑01‑01 00:00:30 23.45
GOOG 2001‑01‑01 00:01:30 21.16

It essentially executes a function defined by looking up nearby values in the prices table. Note also that missing ticker values do not
have a match and don't appear in the output.

Outer AsOf Joins

Because AsOf produces at most one match from the right hand side, the left side table will not grow as a result of the join, but it could shrink
if there are missing times on the right. To handle this situation, you can use an outer AsOf Join:

SELECT h.ticker, h.when, price * shares AS value
FROM holdings h
ASOF LEFT JOIN prices p
    ON h.ticker = p.ticker
   AND h.when >= p.when
ORDER BY ALL;

As you might expect, this will produce NULL prices and values instead of dropping left side rows when there is no ticker or the time is before
the prices begin.

ticker when value

APPL 2000‑12‑31 23:59:30
APPL 2001‑01‑01 00:00:30 2.94
APPL 2001‑01‑01 00:01:30 48.26
GOOG 2000‑12‑31 23:59:30
GOOG 2001‑01‑01 00:00:30 23.45
GOOG 2001‑01‑01 00:01:30 21.16
DATA 2000‑12‑31 23:59:30
DATA 2001‑01‑01 00:00:30
DATA 2001‑01‑01 00:01:30

AsOf Joins with the USING Keyword

So far we have been explicit about specifying the conditions for AsOf, but SQL also has a simplified join condition syntax for the common
case where the column names are the same in both tables. This syntax uses the USING keyword to list the fields that should be compared
for equality. AsOf also supports this syntax, but with two restrictions:

• The last field is the inequality
• The inequality is >= (the most common case)

Our first query can then be written as:

SELECT ticker, h.when, price * shares AS value
FROM holdings h
ASOF JOIN prices p USING (ticker, when);


Be aware that if you don't explicitly list the columns in the SELECT, the ordering field value will be the probe value, not the build value.
For a natural join, this is not an issue because all the conditions are equalities, but for AsOf, one side has to be chosen. Since AsOf can be
viewed as a lookup function, it is more natural to return the ”function arguments” than the function internals.
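If you need both timestamps in the output, you can fall back to the explicit ON form and qualify the columns yourself. A minimal sketch, shown through the Python client for consistency with the earlier chapters, and assuming the prices and holdings tables loaded above (the column aliases are illustrative):

import duckdb

duckdb.sql("""
    SELECT h.ticker, h.when AS holding_when, p.when AS price_when,
           price * shares AS value
    FROM holdings h
    ASOF JOIN prices p
        ON h.ticker = p.ticker
       AND h.when >= p.when
""").show()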

See Also

For implementation details, see the blog post ”DuckDB's AsOf joins: Fuzzy Temporal Lookups”.

Full‑Text Search

DuckDB supports full‑text search via the fts extension. A full‑text index allows for a query to quickly search for all occurrences of individual
words within longer text strings.

Example: Shakespeare Corpus

Here's an example of building a full‑text index of Shakespeare's plays.

CREATE TABLE corpus AS
    SELECT * FROM 'https://blobs.duckdb.org/data/shakespeare.parquet';

DESCRIBE corpus;

┌─────────────┬─────────────┬─────────┐
│ column_name │ column_type │ null │
├─────────────┼─────────────┼─────────┤
│ line_id │ VARCHAR │ YES │
│ play_name │ VARCHAR │ YES │
│ line_number │ VARCHAR │ YES │
│ speaker │ VARCHAR │ YES │
│ text_entry │ VARCHAR │ YES │
└─────────────┴─────────────┴─────────┘

The text of each line is in text_entry, and a unique key for each line is in line_id.

Creating a Full‑Text Search Index

First, we create the index, specifying the table name, the unique id column, and the column(s) to index. We will just index the single column
text_entry, which contains the text of the lines in the play.

PRAGMA create_fts_index('corpus', 'line_id', 'text_entry');

The table is now ready to query using the Okapi BM25 ranking function. Rows with no match return a null score.

What does Shakespeare say about butter?

SELECT
fts_main_corpus.match_bm25(line_id, 'butter') AS score,
line_id, play_name, speaker, text_entry
FROM corpus
WHERE score IS NOT NULL
ORDER BY score DESC;

┌────────────────────┬─────────────┬──────────────────────────┬──────────────┬─────────────────────────────────────────────────────┐
│       score        │   line_id   │        play_name         │   speaker    │                     text_entry                      │
│       double       │   varchar   │         varchar          │   varchar    │                      varchar                        │
├────────────────────┼─────────────┼──────────────────────────┼──────────────┼─────────────────────────────────────────────────────┤
│  4.427313429798464 │ H4/2.4.494  │ Henry IV                 │ Carrier      │ As fat as butter.                                   │
│  3.836270302568675 │ H4/1.2.21   │ Henry IV                 │ FALSTAFF     │ prologue to an egg and butter.                      │
│  3.836270302568675 │ H4/2.1.55   │ Henry IV                 │ Chamberlain  │ They are up already, and call for eggs and butter;  │
│ 3.3844488405497115 │ H4/4.2.21   │ Henry IV                 │ FALSTAFF     │ toasts-and-butter, with hearts in their bellies no  │
│ 3.3844488405497115 │ H4/4.2.62   │ Henry IV                 │ PRINCE HENRY │ already made thee butter. But tell me, Jack, whose  │
│ 3.3844488405497115 │ AWW/4.1.40  │ Alls well that ends well │ PAROLLES     │ butter-womans mouth and buy myself another of       │
│ 3.3844488405497115 │ AYLI/3.2.93 │ As you like it           │ TOUCHSTONE   │ right butter-womens rank to market.                 │
│ 3.3844488405497115 │ KL/2.4.132  │ King Lear                │ Fool         │ kindness to his horse, buttered his hay.            │
│ 3.0278411214953107 │ AWW/5.2.9   │ Alls well that ends well │ Clown        │ henceforth eat no fish of fortunes buttering.       │
│ 3.0278411214953107 │ MWW/2.2.260 │ Merry Wives of Windsor   │ FALSTAFF     │ Hang him, mechanical salt-butter rogue! I will      │
│ 3.0278411214953107 │ MWW/2.2.284 │ Merry Wives of Windsor   │ FORD         │ rather trust a Fleming with my butter, Parson Hugh  │
│ 3.0278411214953107 │ MWW/3.5.7   │ Merry Wives of Windsor   │ FALSTAFF     │ Ill have my brains taen out and buttered, and give  │
│ 3.0278411214953107 │ MWW/3.5.102 │ Merry Wives of Windsor   │ FALSTAFF     │ to heat as butter; a man of continual dissolution   │
│  2.739219044070792 │ H4/2.4.115  │ Henry IV                 │ PRINCE HENRY │ Didst thou never see Titan kiss a dish of butter?   │
├────────────────────┴─────────────┴──────────────────────────┴──────────────┴─────────────────────────────────────────────────────┤
│ 14 rows                                                                                                                 5 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Unlike standard indexes, full‑text indexes don't auto‑update as the underlying data is changed, so you need to drop the index using PRAGMA drop_fts_index(my_fts_index) and recreate it when appropriate.
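
For the corpus example above, rebuilding the index after new rows are loaded might look as follows (a minimal sketch; it assumes drop_fts_index takes the name of the indexed table, as create_fts_index does):

PRAGMA drop_fts_index('corpus');
PRAGMA create_fts_index('corpus', 'line_id', 'text_entry');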

Note on Generating the Corpus Table

For more details, see the ”Generating a Shakespeare corpus for full‑text searching from JSON” blog post.

• The columns are: line_id, play_name, line_number, speaker, text_entry.
• We need a unique key for each row in order for full‑text searching to work.
• The line_id ”KL/2.4.132” means King Lear, Act 2, Scene 4, Line 132.

SQL Editors

DBeaver SQL IDE

DBeaver is a powerful and popular desktop SQL editor and integrated development environment (IDE). It has both an open source and
enterprise version. It is useful for visually inspecting the available tables in DuckDB and for quickly building complex queries. DuckDB's
JDBC connector allows DBeaver to query DuckDB files, and by extension, any other files that DuckDB can access (like Parquet files).

Installing DBeaver

1. Install DBeaver using the download links and instructions found at their download page.

2. Open DBeaver and create a new connection. Either click on the ”New Database Connection” button or go to Database > New Database
Connection in the menu bar.

3. Search for DuckDB, select it, and click Next.

4. Enter the path or browse to the DuckDB database file you wish to query. To use an in‑memory DuckDB (useful primarily if just interested in querying Parquet files, or for testing), enter :memory: as the path.

5. Click ”Test Connection”. This will then prompt you to install the DuckDB JDBC driver. If you are not prompted, see alternative driver
installation instructions below.

6. Click ”Download” to download DuckDB's JDBC driver from Maven. Once download is complete, click ”OK”, then click ”Finish”.

• Note: If you are in a corporate environment or behind a firewall, before clicking download, click the ”Download Configuration” link
to configure your proxy settings.

7. You should now see a database connection to your DuckDB database in the left hand ”Database Navigator” pane. Expand it to see
the tables and views in your database. Right click on that connection and create a new SQL script.

8. Write some SQL and click the ”Execute” button.

9. Now you're ready to fly with DuckDB and DBeaver!

Alternative Driver Installation

1. If not prompted to install the DuckDB driver when testing your connection, return to the ”Connect to a database” dialog and click
”Edit Driver Settings”.

2. (Alternate) You may also access the driver settings menu by returning to the main DBeaver window and clicking Database > Driver
Manager in the menu bar. Then select DuckDB, then click Edit.

3. Go to the ”Libraries” tab, then click on the DuckDB driver and click ”Download/Update”. If you do not see the DuckDB driver, first
click on ”Reset to Defaults”.

4. Click ”Download” to download DuckDB's JDBC driver from Maven. Once download is complete, click ”OK”, then return to the main
DBeaver window and continue with step 7 above.

• Note: If you are in a corporate environment or behind a firewall, before clicking download, click the ”Download Configuration”
link to configure your proxy settings.

Data Viewers

Tableau ‑ A Data Visualization Tool

Tableau is a popular commercial data visualization tool. In addition to a large number of built-in connectors, it also provides generic
database connectivity via ODBC and JDBC connectors.

Tableau has two main versions: Desktop and Online (Server).

• For Desktop, connecting to a DuckDB database is similar to working in an embedded environment like Python.
• For Online, since DuckDB is in‑process, the data needs to be either on the server itself or in a remote data bucket that is accessible
from the server.

Database Creation

When using a DuckDB database file, the data sets do not actually need to be imported into DuckDB tables; it suffices to create views of the
data. For example, this will create a view of the h2oai Parquet test file in the current DuckDB code base:

CREATE VIEW h2oai AS (
    FROM read_parquet('/Users/username/duckdb/data/parquet-testing/h2oai/h2oai_group_small.parquet')
);

Note that you should use full path names to local files so that they can be found from inside Tableau. Also note that you will need to use a version of the driver that is compatible with (i.e., from the same release as) the database format used by the DuckDB tool (e.g., Python module, command line) that was used to create the file.

Installing the JDBC Driver

Tableau provides documentation on how to install a JDBC driver for Tableau to use.

Note. Tableau (both the Desktop and Server versions) needs to be restarted any time you add or modify drivers.

Driver Links The link here is for a recent version of the JDBC driver that is compatible with Tableau. If you wish to connect to a database
file, you will need to make sure the file was created with a file‑compatible version of DuckDB. Also, check that there is only one version of
the driver installed as there are multiple filenames in use.

Download the snapshot jar

• macOS: Copy it to ~/Library/Tableau/Drivers/
• Windows: Copy it to C:\Program Files\Tableau\Drivers
• Linux: Copy it to /opt/tableau/tableau_driver/jdbc.

Using the PostgreSQL Dialect

If you just want to do something simple, you can try connecting directly to the JDBC driver and using the Tableau‑provided PostgreSQL dialect.

1. Create a DuckDB file containing your views and/or data.

2. Launch Tableau.
3. Under Connect > To a Server > More… click on “Other Databases (JDBC)”. This will bring up the connection dialog box. For the URL,
enter jdbc:duckdb:/User/username/path/to/database.db. For the Dialect, choose PostgreSQL. The rest of the fields
can be ignored.

However, some functionality will be missing, such as the median and percentile aggregate functions. To make the data source connection more
compatible with the PostgreSQL dialect, please use the DuckDB taco connector as described below.

Installing the Tableau DuckDB Connector

While it is possible to use the Tableau‑provided PostgreSQL dialect to communicate with the DuckDB JDBC driver, we strongly recommend
using the DuckDB ”taco” connector. This connector has been fully tested against the Tableau dialect generator and is more compatible
than the provided PostgreSQL dialect.

The documentation on how to install and use the connector is in its repository, but essentially you will need the duckdb_jdbc.taco
file. The current version of the Taco is not signed, so you will need to launch Tableau with signature validation disabled. (Despite what the
Tableau documentation says, the real security risk is in the JDBC driver code, not the small amount of JavaScript in the Taco.)

Server (Online) On Linux, copy the Taco file to /opt/tableau/connectors. On Windows, copy the Taco file to C:\Program
Files\Tableau\Connectors. Then issue these commands to disable signature validation:

$ tsm configuration set -k native_api.disable_verify_connector_plugin_signature -v true
$ tsm pending-changes apply

The last command will restart the server with the new settings.

macOS Copy the Taco file to the /Users/[User]/Documents/My Tableau Repository/Connectors folder. Then launch
Tableau Desktop from the Terminal with the command line argument to disable signature validation:

$ /Applications/Tableau\ Desktop\ <year>.<quarter>.app/Contents/MacOS/Tableau \
    -DDisableVerifyConnectorPluginSignature=true

You can also package this up with AppleScript by using the following script:

do shell script "\"/Applications/Tableau Desktop 2023.2.app/Contents/MacOS/Tableau\" -DDisableVerifyConnectorPluginSignature=true"
quit

Create this file with the Script Editor (located in /Applications/Utilities) and save it as a packaged application.

You can then double‑click it to launch Tableau. You will need to change the application name in the script when you get upgrades.

Windows Desktop Copy the Taco file to the C:\Users\[Windows User]\Documents\My Tableau Repository\Connectors
directory. Then launch Tableau Desktop from a shell with the -DDisableVerifyConnectorPluginSignature=true argument
to disable signature validation.

Output

Once loaded, you can run queries against your data! Here is the result of the first H2O.ai benchmark query from the Parquet test file.

CLI Charting with YouPlot

DuckDB can be used with CLI graphing tools to quickly pipe input to stdout to graph your data in one line.

YouPlot is a Ruby‑based CLI tool for drawing visually pleasing plots on the terminal. It can accept input from other programs by piping
data from stdin. It takes tab‑separated (or delimiter of your choice) data and can easily generate various types of plots including bar, line,
histogram and scatter.

With DuckDB, you can write query results to the console (stdout) by copying TO '/dev/stdout', and you can write comma‑separated values by adding WITH (FORMAT 'csv', HEADER).

Installing YouPlot

Installation instructions for YouPlot can be found on the main YouPlot repository. If you're on a Mac, you can use:

brew install youplot

Run uplot --help to ensure you've installed it successfully!

Piping DuckDB Queries to stdout

By combining the COPY ... TO statement with a CSV output target, data can be read from any format supported by DuckDB and piped to YouPlot.
There are three important steps to doing this.

1. As an example, this is how to read all data from input.json:

duckdb -s "SELECT * FROM read_json_auto('input.json')"

2. To prepare the data for YouPlot, write a simple aggregate:

duckdb -s "SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10"

3. Finally, wrap the SELECT in the COPY ... TO function with an output location of /dev/stdout.

The syntax looks like this:

COPY ( query ) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)

The full DuckDB command below outputs the query in CSV format with a header:

duckdb -s "COPY (SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)"

Connecting DuckDB to YouPlot

Finally, the data can now be piped to YouPlot! Let's assume we have an input.json file with dates and the number of purchases made by
somebody on each date. Using the query above, we'll pipe the data to the uplot command to draw a plot of the Top 10 Purchase Dates:

duckdb -s "COPY (SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)" | uplot bar -d, -H -t "Top 10 Purchase Dates"

This tells uplot to draw a bar plot, use a comma‑separated delimiter (-d,), that the data has a header (-H), and to give the plot a title (-t).

Bonus Round! stdin + stdout

Maybe you're piping some data through jq. Maybe you're downloading a JSON file from somewhere. You can also tell DuckDB to read the
data from another process by changing the filename to /dev/stdin.

Let's combine this with a quick curl from GitHub to see what a certain user has been up to lately.

curl -sL "https://api.github.com/users/dacort/events?per_page=100" \
    | duckdb -s "COPY (SELECT type, count(*) AS event_count FROM read_json_auto('/dev/stdin') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)" \
    | uplot bar -d, -H -t "GitHub Events for @dacort"

Under the Hood

Internals

Overview of DuckDB Internals

On this page is a brief description of the internals of the DuckDB engine.

Parser

The parser converts a query string into the following tokens:

• SQLStatement
• QueryNode
• TableRef
• ParsedExpression

The parser is not aware of the catalog or any other aspect of the database. It will not throw errors if tables do not exist, and will not resolve
any types of columns yet. It only transforms a query string into a set of tokens as specified.

ParsedExpression The ParsedExpression represents an expression within a SQL statement. This can be e.g. a reference to a column, an
addition operator or a constant value. The type of the ParsedExpression indicates what it represents, e.g. a comparison is represented as
a ComparisonExpression.

ParsedExpressions do not have types, except for nodes with explicit types such as CAST statements. The types for expressions are resolved
in the Binder, not in the Parser.

TableRef The TableRef represents any table source. This can be a reference to a base table, but it can also be a join, a table‑producing
function or a subquery.

QueryNode The QueryNode represents either (1) a SELECT statement, or (2) a set operation (i.e., UNION, INTERSECT or DIFFERENCE).

SQL Statement The SQLStatement represents a complete SQL statement. The type of the SQL Statement represents what kind of statement it is (e.g. StatementType::SELECT represents a SELECT statement). A single SQL string can be transformed into multiple SQL statements in case the original query string contains multiple queries.
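
As an illustration, the json extension exposes a json_serialize_sql function that returns the serialized statement(s) produced for a query string; at the time of writing it only handles SELECT statements, so treat this as a sketch rather than a general parser API:

SELECT json_serialize_sql('SELECT 1 + 2 AS answer');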

Binder

The binder converts all nodes into their bound equivalents. In the binder phase:

• The tables and columns are resolved using the catalog
• Types are resolved
• Aggregate/window functions are extracted

The following conversions happen:

• SQLStatement ‑> BoundStatement
• QueryNode ‑> BoundQueryNode
• TableRef ‑> BoundTableRef
• ParsedExpression ‑> Expression

Logical Planner

The logical planner creates LogicalOperator nodes from the bound statements. In this phase, the actual logical query tree is created.

Optimizer

After the logical planner has created the logical query tree, the optimizers are run over that query tree to create an optimized query plan.
The following query optimizers are run:

• Expression Rewriter: Simplifies expressions, performs constant folding.
• Filter Pushdown: Pushes filters down into the query plan and duplicates filters over equivalency sets. Also prunes subtrees that are
guaranteed to be empty (because of filters that statically evaluate to false).
• Join Order Optimizer: Reorders joins using dynamic programming. Specifically, the DPccp algorithm from the paper Dynamic
Programming Strikes Back is used.
• Common Sub Expressions: Extracts common subexpressions from projection and filter nodes to prevent unnecessary duplicate
execution.
• In Clause Rewriter: Rewrites large static IN clauses to a MARK join or INNER join.
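
The effect of these optimizers can be observed by asking EXPLAIN for the optimized logical plan instead of the physical plan. A minimal sketch, assuming the explain_output setting is available in your version:

CREATE TABLE t AS SELECT range AS i FROM range(100);
SET explain_output = 'optimized_only';
-- the two filter conditions below are simplified into a single one by the optimizer
EXPLAIN SELECT i FROM t WHERE i > 80 AND i > 90;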

Column Binding Resolver

The column binding resolver converts logical BoundColumnRefExpression nodes that refer to a column of a specific table into BoundReferenceExpression nodes that refer to a specific index into the DataChunks that are passed around in the execution engine.

Physical Plan Generator

The physical plan generator converts the resulting logical operator tree into a PhysicalOperator tree.

Execution

In the execution phase, the physical operators are executed to produce the query result. The execution model is a vectorized volcano
model, where DataChunks are pulled from the root node of the physical operator tree. Each PhysicalOperator itself defines how it produces
its result. A PhysicalTableScan node will pull the chunk from the base tables on disk, whereas a PhysicalHashJoin will perform
a hash join between the output obtained from its child nodes.

Storage

Compatibility

Backward Compatibility Backward compatibility refers to the ability of a newer DuckDB version to read storage files created by an older
DuckDB version. Version 0.10 is the first release of DuckDB that supports backward compatibility in the storage format. DuckDB v0.10 can
read and operate on files created by the previous DuckDB version – DuckDB v0.9.

For future DuckDB versions, our goal is to ensure that any DuckDB version released after this one can read files created by previous versions, starting
from this release. We want to ensure that the file format is fully backward compatible. This allows you to keep data stored in DuckDB files
around and guarantees that you will be able to read the files without having to worry about which version the file was written with or having
to convert files between versions.

Forward Compatibility Forward compatibility refers to the ability of an older DuckDB version to read storage files produced by a newer
DuckDB version. DuckDB v0.9 is partially forward compatible with DuckDB v0.10. Certain files created by DuckDB v0.10 can be read by
DuckDB v0.9.

Forward compatibility is provided on a best‑effort basis. While stability of the storage format is important, there are still many improvements and innovations that we want to make to the storage format in the future. As such, forward compatibility may be (partially) broken
on occasion.

How to Move Between Storage Formats

When you update DuckDB and open an old database file, you might encounter an error message about incompatible storage formats,
pointing to this page. To move your database(s) to a newer format, you only need the older and the newer DuckDB executables.

Open your database file with the older DuckDB executable and run the SQL statement EXPORT DATABASE 'tmp'. This saves the
whole state of the current database into the folder tmp. The content of the tmp folder will be overwritten, so choose an empty or not‑yet‑existing
location. Then, start the newer DuckDB executable and execute IMPORT DATABASE 'tmp' (pointing to the previously populated folder)
to load the database, which can then be saved to the file you pointed DuckDB to.

A bash two‑liner (to be adapted with the file names and executable locations) is:

$ /older/version/duckdb mydata.db -c "EXPORT DATABASE 'tmp'"


$ /newer/duckdb mydata.new.db -c "IMPORT DATABASE 'tmp'"

After this, mydata.db will be untouched in the old format, mydata.new.db will contain the same data but in a format accessible to
more recent DuckDB versions, and the folder tmp will hold the same data in a universal format as separate files.

Check the EXPORT documentation for more details on the syntax.

Storage Header

DuckDB files start with a uint64_t which contains a checksum for the main header, followed by four magic bytes (DUCK), followed by the
storage version number in a uint64_t.

$ hexdump -n 20 -C mydata.db
00000000 01 d0 e2 63 9c 13 39 3e 44 55 43 4b 2b 00 00 00 |...c..9>DUCK+...|
00000010 00 00 00 00 |....|
00000014

A simple example of reading the storage version using Python is below.

import struct

pattern = struct.Struct('<8x4sQ')

with open('test/sql/storage_version/storage_version.db', 'rb') as fh:
    print(pattern.unpack(fh.read(pattern.size)))

Storage Version Table

For changes in each given release, check out the change log on GitHub. To see the commits that changed each storage version, see the
commit log.

Storage version    DuckDB version(s)

64                 v0.9.0, v0.9.1, v0.9.2, v0.10.0
51                 v0.8.0, v0.8.1
43                 v0.7.0, v0.7.1
39                 v0.6.0, v0.6.1
38                 v0.5.0, v0.5.1
33                 v0.3.3, v0.3.4, v0.4.0
31                 v0.3.2
27                 v0.3.1
25                 v0.3.0
21                 v0.2.9
18                 v0.2.8
17                 v0.2.7
15                 v0.2.6
13                 v0.2.5
11                 v0.2.4
6                  v0.2.3
4                  v0.2.2
1                  v0.2.1 and prior

Compression

DuckDB uses lightweight compression. Note that compression is only applied to persistent databases and is not applied to in‑memory
instances.

Compression Algorithms The compression algorithms supported by DuckDB include the following:

• Constant Encoding
• Run‑Length Encoding (RLE)
• Bit Packing
• Frame of Reference (FOR)
• Dictionary Encoding
• Fast Static Symbol Table (FSST) – VLDB 2020 paper
• Adaptive Lossless Floating‑Point Compression (ALP) – SIGMOD 2024 paper
• Chimp – VLDB 2022 paper
• Patas

Disk Usage

The disk usage of DuckDB's format depends on a number of factors, including the data type and the data distribution, the compression
methods used, etc. As a rough approximation, loading 100 GB of uncompressed CSV files into a DuckDB database file will require 25 GB of
disk space, while loading 100 GB of Parquet files will require 120 GB of disk space.
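
To check how much space the currently attached database files actually occupy, the database_size pragma can be used (a minimal sketch; the exact columns reported may differ between versions):

-- reports, per attached database, the file size, block size and block usage
PRAGMA database_size;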

Row Groups

DuckDB's storage format stores the data in row groups, i.e., horizontal partitions of the data. This concept is equivalent to Parquet's row
groups. Several features in DuckDB, including parallelism and compression, are based on row groups.
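
The row group layout of a table can be inspected with the pragma_storage_info table function, which returns one row per column segment together with its row group id and the compression applied (a sketch; the exact column names may vary by version):

CREATE TABLE t AS SELECT range AS i FROM range(1000000);
SELECT row_group_id, column_name, segment_type, compression, count
FROM pragma_storage_info('t')
LIMIT 5;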

Troubleshooting

Error Message When Opening an Incompatible Database File When opening a database file that has been written by a different DuckDB
version from the one you are using, the following error message may occur:

Error: unable to open database "...": Serialization Error: Failed to deserialize: ...

The message implies that the database file was created with a newer DuckDB version and uses features that are backward incompatible
with the DuckDB version used to read the file.

There are two potential workarounds:

1. Update your DuckDB version to the latest stable version.
2. Open the database with the latest version of DuckDB, export it to a standard format (e.g., Parquet), then import it into any version
of DuckDB. See the EXPORT/IMPORT DATABASE statements for details.
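
A minimal sketch of the second workaround, using a hypothetical directory name:

-- with a DuckDB version that can read the file
EXPORT DATABASE 'backup_dir' (FORMAT PARQUET);
-- with the DuckDB version you want to move to
IMPORT DATABASE 'backup_dir';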

Execution Format

Vector is the container format used to store in‑memory data during execution.
DataChunk is a collection of Vectors, used for instance to represent a column list in a PhysicalProjection operator.

Data Flow

DuckDB uses a vectorized query execution model. All operators in DuckDB are optimized to work on Vectors of a fixed size. This fixed size is commonly referred to in the code as STANDARD_VECTOR_SIZE. The default STANDARD_VECTOR_SIZE is 2048 tuples.

Vector Format

Vectors logically represent arrays that contain data of a single type. DuckDB supports different vector formats, which allow the system to
store the same logical data with a different physical representation. This allows for a more compressed representation, and potentially
allows for compressed execution throughout the system. The supported vector formats are listed below.

Flat Vectors Flat vectors are physically stored as a contiguous array; this is the standard uncompressed vector format. For flat vectors,
the logical and physical representations are identical.

Constant Vectors Constant vectors are physically stored as a single constant value.

Constant vectors are useful when data elements are repeated ‑ for example, when representing the result of a constant expression in a
function call, the constant vector allows us to only store the value once.

SELECT lst || 'duckdb'
FROM range(1000) tbl(lst);

Since duckdb is a string literal, the value of the literal is the same for every row. In a flat vector, we would have to duplicate the literal
'duckdb' once for every row. The constant vector allows us to only store the literal once.

Constant vectors are also emitted by the storage when decompressing from constant compression.

Dictionary Vectors Dictionary vectors are physically stored as a child vector, and a selection vector that contains indexes into the child
vector.

Just like constant vectors, dictionary vectors are emitted by the storage, e.g., when decompressing from dictionary compression:
when deserializing a dictionary‑compressed column segment, we store the data in a dictionary vector so we can keep it compressed
during query execution.

Sequence Vectors Sequence vectors are physically stored as an offset and an increment value.

Sequence vectors are useful for efficiently storing incremental sequences. They are generally emitted for row identifiers.
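
Row identifiers can be observed from SQL through the rowid pseudo‑column of a base table; scans of such incremental sequences are where sequence vectors typically show up (a minimal sketch):

CREATE TABLE t AS SELECT range AS i FROM range(5);
SELECT rowid, i FROM t;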

Unified Vector Format These properties of the different vector formats are great for optimization purposes. For example, you can imagine
a scenario where all the parameters to a function are constant: we can just compute the result once and emit a constant vector.
But writing specialized code for every combination of vector types for every function is unfeasible due to the combinatorial explosion of
possibilities.

Instead, whenever you want to use a vector generically, regardless of its type, the UnifiedVectorFormat can be used.
This format essentially acts as a generic view over the contents of the Vector. Every type of Vector can convert to this format.

Complex Types

String Vectors To efficiently store strings, we make use of our string_t class.

struct string_t {
union {
struct {
uint32_t length;
char prefix[4];
char *ptr;
} pointer;
struct {
uint32_t length;
char inlined[12];
} inlined;
} value;
};

Short strings (<= 12 bytes) are inlined into the structure, while larger strings are stored with a pointer to the data in the auxiliary string
buffer. The length is used throughout the functions to avoid having to call strlen and having to continuously check for null‑pointers. The
prefix is used for comparisons as an early out (when the prefix does not match, we know the strings are not equal and don't need to chase
any pointers).

List Vectors List vectors are stored as a series of list entries together with a child Vector. The child vector contains the values that are
present in the list, and the list entries specify how each individual list is constructed.

struct list_entry_t {
idx_t offset;
idx_t length;
};

The offset refers to the start row in the child Vector; the length keeps track of the size of the list of this row.

List vectors can be stored recursively. For nested list vectors, the child of a list vector is again a list vector.

For example, consider this mock representation of a Vector of type BIGINT[][]:

{
"type": "list",
"data": "list_entry_t",
"child": {
"type": "list",
"data": "list_entry_t",
"child": {
"type": "bigint",
"data": "int64_t"
}
}
}

Struct Vectors Struct vectors store a list of child vectors. The number and types of the child vectors are defined by the schema of the
struct.

Map Vectors Internally map vectors are stored as a LIST[STRUCT(key KEY_TYPE, value VALUE_TYPE)].
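
For example, the following MAP value is physically a list of key/value structs (a minimal sketch):

SELECT map(['key1', 'key2'], [10, 20]) AS m;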

Union Vectors Internally UNION utilizes the same structure as a STRUCT. The first ”child” is always occupied by the Tag Vector of the
UNION, which records for each row which of the UNION's types apply to that row.

Developer Guides

Building DuckDB from Source


Note. DuckDB binaries are available for stable and nightly builds on the installation page. You should only build DuckDB under
specific circumstances, such as when running on a specific architecture or building an unmerged pull request.

Prerequisites

DuckDB needs CMake and a C++11‑compliant compiler (e.g., GCC, Apple‑Clang, MSVC). Additionally, we recommend using the Ninja build
system, which automatically parallelizes the build process.

Linux Packages Install the required packages with the package manager of your distribution.

Fedora, CentOS, and Red Hat:

sudo yum install -y git g++ cmake ninja-build openssl-devel

Ubuntu and Debian:

sudo apt-get update
sudo apt-get install -y git g++ cmake ninja-build libssl-dev

Alpine Linux:

apk add g++ git make cmake ninja

macOS Install Xcode and Homebrew. Then, install the required packages with:

brew install cmake ninja

Windows Consult the Windows CI workflow for a list of packages used to build DuckDB on Windows.

The DuckDB Python package requires the Microsoft Visual C++ Redistributable package to be built and to run.

Building DuckDB

To build DuckDB we use a Makefile which in turn calls into CMake. We also advise using Ninja as the generator for CMake.

GEN=ninja make

It is not advised to directly call CMake, as the Makefile sets certain variables that are crucial to properly building the package.

Build Type DuckDB can be built with many different settings; most of these correspond directly to CMake settings, but not all of them.

release This build has been stripped of all the assertions and debug symbols and code, optimized for performance.

debug This build runs with all the debug information, including symbols, assertions and DEBUG blocks.
However, the special debug defines are not automatically set for this build.

relassert This build does not trigger the #ifdef DEBUG code blocks, but it still has debug symbols that make it possible to step
through the execution with line number information, and D_ASSERT lines are still checked in this build.

reldebug This build is similar to relassert in many ways, except that assertions are also stripped in this build.

benchmark This build is a shorthand for release with BUILD_BENCHMARK=1 set.

tidy-check This creates a build and then runs Clang‑Tidy to check for issues or style violations through static analysis.
The CI will also run this check, causing it to fail if this check fails.

format-fix | format-changes | format-main This doesn't actually create a build, but uses the following format checkers to
check for style issues:

• clang‑format to fix format issues in the code.

• cmake‑format to fix format issues in the CMakeLists.txt files.

The CI will also run this check, causing it to fail if this check fails.

Package Flags For every package that is maintained by core DuckDB, there exists a flag in the Makefile to enable building the package.
These can be enabled either by setting them in the current environment, e.g., through setup files like bashrc or zshrc, or by setting them before the
call to make, for example:

BUILD_PYTHON=1 make debug

BUILD_PYTHON When this flag is set, the Python package is built.

BUILD_SHELL When this flag is set, the CLI is built; this is usually enabled by default.

BUILD_BENCHMARK When this flag is set, our in‑house Benchmark testing suite is built.
More information about this can be found here.

BUILD_JDBC When this flag is set, the Java package is built.

BUILD_ODBC When this flag is set, the ODBC package is built.

Extension Flags For every in‑tree extension that is maintained by core DuckDB there exists a flag to enable building and statically linking
the extension into the build.

BUILD_AUTOCOMPLETE When this flag is set, the autocomplete extension is built.

BUILD_ICU When this flag is set, the icu extension is built.

BUILD_TPCH When this flag is set, the tpch extension is built; this enables TPC‑H data generation and query support using dbgen.

BUILD_TPCDS When this flag is set, the tpcds extension is built; this enables TPC‑DS data generation and query support using dsdgen.

BUILD_TPCE When this flag is set, the TPC‑E extension is built. Unlike TPC‑H and TPC‑DS, this does not enable data generation and query
support, but it does enable tests for TPC‑E through our test suite.

BUILD_FTS When this flag is set, the fts (full text search) extension is built.

BUILD_HTTPFS When this flag is set, the httpfs extension is built.

BUILD_JSON When this flag is set, the json extension is built.

BUILD_INET When this flag is set, the inet extension is built.

BUILD_SQLSMITH When this flag is set, the SQLSmith extension is built.

Debug Flags

CRASH_ON_ASSERT D_ASSERT(condition) is used throughout the code; these assertions throw an InternalException in debug builds.
With this flag enabled, when an assertion triggers it will instead directly cause a crash.

DISABLE_STRING_INLINE In our execution format, string_t has the feature to ”inline” strings that are under a certain length (12
bytes), which means they don't require a separate allocation.
When this flag is set, we disable this and don't inline small strings.

DISABLE_MEMORY_SAFETY Our data structures that are used extensively throughout the non‑performance‑critical code have extra
checks to ensure memory safety. These checks include:

• Making sure nullptr is never dereferenced.

• Making sure index out of bounds accesses don't trigger a crash.

With this flag enabled, we remove these checks; this is mostly done to check that the performance hit of these checks is negligible.

DESTROY_UNPINNED_BLOCKS When previously pinned blocks in the BufferManager are unpinned, with this flag enabled we destroy
them instantly to make sure that there aren't situations where this memory is still being used, despite not being pinned.

DEBUG_STACKTRACE When a crash or assertion hit occurs in a test, print a stack trace.
This is useful when debugging a crash that is hard to pinpoint with a debugger attached.

Miscellaneous Flags

DISABLE_UNITY To improve compilation time, we use Unity Build to combine translation units.
This can, however, hide include bugs; this flag disables the unity build so that these errors can be detected.

DISABLE_SANITIZER In some situations, running an executable that has been built with sanitizers enabled is not supported or can cause
problems (Julia is an example of this).
With this flag enabled, the sanitizers are disabled for the build.

Building and Installing Extensions from Source

Extensions can be built from source and installed from the resulting local binary.

Using Build Flags Set the corresponding BUILD_[EXTENSION_NAME] extension flag when running the build, then use the INSTALL
command.

For example, to install the httpfs extension, run the following script:

GEN=ninja BUILD_HTTPFS=1 make

# for release builds
build/release/duckdb -c "INSTALL 'build/release/extension/httpfs/httpfs.duckdb_extension';"
# for debug builds
build/debug/duckdb -c "INSTALL 'build/debug/extension/httpfs/httpfs.duckdb_extension';"

Using a CMake Configuration File Create an extension configuration file named extension_config.cmake with e.g. the following
content:

duckdb_extension_load(autocomplete)
duckdb_extension_load(fts)
duckdb_extension_load(httpfs)
duckdb_extension_load(inet)
duckdb_extension_load(icu)
duckdb_extension_load(json)
duckdb_extension_load(parquet)

Build DuckDB as follows:

GEN=ninja EXTENSION_CONFIGS="extension_config.cmake" make

Then, to install the extensions in one go, run:

# for release builds
cd build/release/extension/
# for debug builds
cd build/debug/extension/
# install extensions
for EXTENSION in *; do
    ../duckdb -c "INSTALL '${EXTENSION}/${EXTENSION}.duckdb_extension';"
done

Troubleshooting

Building the R Package is Slow By default, R compiles packages using a single thread. To parallelize the compilation, create or edit the
~/.R/Makevars file, and add the following content:

MAKEFLAGS = -j8

Building the R Package on Linux AArch64 Building the R package on Linux running on an ARM64 architecture (AArch64) may result in
the following error message:

/usr/bin/ld: /usr/include/c++/10/bits/basic_string.tcc:206: warning: too many GOT entries for -fpic,
please recompile with -fPIC

To work around this, create or edit the ~/.R/Makevars file:

ALL_CXXFLAGS = $(PKG_CXXFLAGS) -fPIC $(SHLIB_CXXFLAGS) $(CXXFLAGS)

Building the httpfs Extension and Python Package on macOS Problem: The build fails on macOS when both the httpfs extension
and the Python package are included:

GEN=ninja BUILD_PYTHON=1 BUILD_HTTPFS=1 make

ld: library not found for -lcrypto
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command '/usr/bin/clang++' failed with exit code 1
ninja: build stopped: subcommand failed.
make: *** [release] Error 1

Solution: In general, we recommend using the nightly builds, available under GitHub main on the installation page. If you would like
to build DuckDB from source, avoid using the BUILD_PYTHON=1 flag unless you are actively developing the Python library. Instead, first
build the httpfs extension (if required), then build and install the Python package separately using pip:

GEN=ninja BUILD_HTTPFS=1 make

If the next line complains about pybind11 being missing, or --use-pep517 not being supported, make sure you're using a modern version
of pip and setuptools. python3-pip on your OS may not be modern, so you may need to run python3 -m pip install pip -U
first.

python3 -m pip install tools/pythonpkg --use-pep517 --user

Building the httpfs Extension on Linux Problem: When building the httpfs extension on Linux, the build may fail with the following
error.

CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
  system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
  OPENSSL_INCLUDE_DIR)

Solution: Install the libssl-dev library.

sudo apt-get install -y libssl-dev
GEN=ninja BUILD_HTTPFS=1 make

Profiling

Profiling is important to help understand why certain queries exhibit specific performance characteristics. DuckDB contains several built‑in
features to enable query profiling that will be explained on this page.

For the examples on this page we will use the following example data set:

CREATE TABLE students (sid INTEGER PRIMARY KEY, name VARCHAR);
CREATE TABLE exams (cid INTEGER, sid INTEGER, grade INTEGER, PRIMARY KEY (cid, sid));

INSERT INTO students VALUES (1, 'Mark'), (2, 'Hannes'), (3, 'Pedro');
INSERT INTO exams VALUES (1, 1, 8), (1, 2, 8), (1, 3, 7), (2, 1, 9), (2, 2, 10);

EXPLAIN Statement

The first step to profiling a database engine is figuring out what execution plan the engine is using. The EXPLAIN statement allows you to
peek into the query plan and see what is going on under the hood.

The EXPLAIN statement displays the physical plan, i.e., the query plan that will get executed.

To demonstrate, see the below example:

CREATE TABLE students (name VARCHAR, sid INT);
CREATE TABLE exams (eid INT, subject VARCHAR, sid INT);
INSERT INTO students VALUES ('Mark', 1), ('Joe', 2), ('Matthew', 3);
INSERT INTO exams VALUES (10, 'Physics', 1), (20, 'Chemistry', 2), (30, 'Literature', 3);
EXPLAIN SELECT name FROM students JOIN exams USING (sid) WHERE name LIKE 'Ma%';

┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Physical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ name │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ INNER │
│ sid = sid ├──────────────┐
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │
│ EC: 1 │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN ││ FILTER │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ exams ││ prefix(name, 'Ma') │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ sid ││ EC: 1 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ │
│ EC: 3 ││ │
└───────────────────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ students │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ sid │
│ name │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ Filters: name>=Ma AND name│
│ <Mb AND name IS NOT NULL │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 1 │
└───────────────────────────┘

Note that the query is not actually executed – therefore, we can only see the estimated cardinality (EC) for each operator, which is calculated
by using the statistics of the base tables and applying heuristics for each operator.

Run‑Time Profiling with the EXPLAIN ANALYZE Statement

The query plan helps understand the performance characteristics of the system. However, often it is also necessary to look at the performance numbers of individual operators and the cardinalities that pass through them. For this, you can create a query‑profile graph.

To create the query graphs it is first necessary to gather the necessary data by running the query. In order to do that, we must first enable
the run‑time profiling. This can be done by prefixing the query with EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';

┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Total Time: 0.0008s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│ EXPLAIN_ANALYZE │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 0 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ name │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 2 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ INNER │
│ sid = sid │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ├──────────────┐
│ EC: 1 │ │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │
│ 2 │ │
│ (0.00s) │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN ││ FILTER │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ exams ││ prefix(name, 'Ma') │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ sid ││ EC: 1 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 3 ││ 2 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ││ (0.00s) │
│ 3 ││ │
│ (0.00s) ││ │
└───────────────────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │

│ students │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ sid │
│ name │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ Filters: name>=Ma AND name│
│ <Mb AND name IS NOT NULL │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 1 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 2 │
│ (0.00s) │
└───────────────────────────┘

The output of EXPLAIN ANALYZE contains the estimated cardinality (EC), the actual cardinality, and the execution time for each operator.

It is also possible to save the query plan to a file, e.g., in JSON format:

-- All queries performed will be profiled, with output in json format.
-- By default the result is still printed to stdout.
PRAGMA enable_profiling='json';
-- Instead of writing to stdout, write the profiling output to a specific file on disk.
-- This has no effect for `EXPLAIN ANALYZE` queries, which will *always* be
-- returned as query results.
PRAGMA profile_output='/path/to/file.json';

Note. This file is overwritten with each query that is issued. If you want to store the profile output for later it should be copied to a
different file.

Now let us run the query that we inspected before:

SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';

After the query is completed, the JSON file containing the profiling output has been written to the specified file. We can then render the
query graph using the Python script, provided we have the duckdb Python module installed. This script will generate an HTML file and open
it in your web browser.

python -m duckdb.query_graph /path/to/file.json

Notation in Query Plans

In query plans, the hash join operators adhere to the following convention: the probe side of the join is the left operand, while the build
side is the right operand.

Join operators in the query plan show the join type used:

• Inner joins are denoted as INNER.
• Left outer joins and right outer joins are denoted as LEFT and RIGHT, respectively.
• Full outer joins are denoted as FULL.

Testing

How is DuckDB Tested?

Testing is vital to make sure that DuckDB works properly and keeps working properly. For that reason, we put a large emphasis on thorough
and frequent testing:

• We run a batch of small tests on every commit using GitHub Actions, and run a more exhaustive batch of tests on pull requests and
commits in the master branch.

• We use a fuzzer, which automatically reports issues found through fuzzing DuckDB.

• Writing Tests

• sqllogictest

• Debugging

• Result Verification

• Persistent Testing

• Loops

• Multiple Connections

• Catch

sqllogictest

For testing plain SQL, we use an extended version of the SQL logic test suite, adopted from SQLite. Every test is a single self‑contained file located in the test/sql directory. To run tests located outside of the default test directory, specify --test-dir <root_directory> and make sure provided test file paths are relative to that root directory.

The test describes a series of SQL statements, together with either the expected result, a statement ok indicator, or a statement
error indicator. An example of a test file is shown below:

# name: test/sql/projection/test_simple_projection.test
# group [projection]

# enable query verification
statement ok
PRAGMA enable_verification

# create table
statement ok
CREATE TABLE a (i INTEGER, j INTEGER);

# insertion: 1 affected row
statement ok
INSERT INTO a VALUES (42, 84);

query II
SELECT * FROM a;
----
42 84

In this example, four statements are executed. The first three statements are expected to succeed (prefixed by statement ok). The fourth
statement is expected to return a single row with two columns (indicated by query II). The values of the row are expected to be 42 and
84 (separated by a tab character). For more information on query result verification, see the result verification section.

The top of every file should contain a comment describing the name and group of the test. The name of the test is always the relative file
path of the file. The group is the folder that the file is in. The name and group of the test are relevant because they can be used to execute
only that test in the unittest group. For example, if we wanted to execute only the above test, we would run the command unittest
test/sql/projection/test_simple_projection.test. If we wanted to run all tests in a specific directory, we would run the
command unittest "[projection]".

Any tests that are placed in the test directory are automatically added to the test suite. Note that the extension of the test is significant.
The sqllogictests should either use the .test extension, or the .test_slow extension. The .test_slow extension indicates that the
test takes a while to run, and will only be run when all tests are explicitly run using unittest *. Tests with the extension .test will be
included in the fast set of tests.

Query Verification

Many simple tests start by enabling query verification. This can be done through the following PRAGMA statement:

statement ok
PRAGMA enable_verification

Query verification performs extra validation to ensure that the underlying code runs correctly. The most important part of that is that it
verifies that optimizers do not cause bugs in the query. It does this by running both an unoptimized and optimized version of the query,
and verifying that the results of these queries are identical.

Query verification is very useful because it not only discovers bugs in optimizers, but also finds bugs in e.g. join implementations. This
is because the unoptimized version will typically run using cross products instead. Because of this, query verification can be very slow
when working with larger data sets. It is therefore recommended to turn on query verification for all unit tests, except those involving
larger data sets (more than roughly 10–100 rows).

Editors & Syntax Highlighting

The sqllogictests are not exactly an industry standard, but several other systems have adopted them as well. Parsing sqllogictests is intentionally simple. All statements have to be separated by empty lines. For that reason, writing a syntax highlighter is not extremely difficult.

A syntax highlighter exists for Visual Studio Code. We have also made a fork that supports the DuckDB dialect of the sqllogictests. You can
use the fork by installing the original, then copying the syntaxes/sqllogictest.tmLanguage.json into the installed extension
(on macOS this is located in ~/.vscode/extensions/benesch.sqllogictest-0.1.1).

A syntax highlighter is also available for CLion. It can be installed directly on the IDE by searching SQLTest on the marketplace. A GitHub
repository is also available, with extensions and bug reports being welcome.

Temporary Files For some tests (e.g., CSV/Parquet file format tests) it is necessary to create temporary files. Any temporary files should
be created in the temporary testing directory. This directory can be used by placing the string __TEST_DIR__ in a query. This string will
be replaced by the path of the temporary testing directory.

statement ok
COPY csv_data TO '__TEST_DIR__/output_file.csv.gz' (COMPRESSION GZIP);

Require & Extensions To avoid bloating the core system, certain functionality of DuckDB is available only as an extension. Tests can be
built for those extensions by adding a require field in the test. If the extension is not loaded, any statements that occur after the require
field will be skipped. Examples of this are require parquet or require icu.

Another usage is to limit a test to a specific vector size. For example, adding require vector_size 512 to a test will prevent the
test from being run unless the vector size is greater than or equal to 512. This is useful because certain functionality is not supported for low
vector sizes, but we run tests using a vector size of 2 in our CI.
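
A minimal sketch of a test that depends on an extension; the statement after require icu is skipped when the extension is not loaded (the query itself is only illustrative):

require icu

statement ok
SELECT 'Gödel' < 'Goedel' COLLATE de;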

sqllogictest ‑ Debugging

The purpose of the tests is to figure out when things break. Inevitably changes made to the system will cause one of the tests to fail, and
when that happens the test needs to be debugged.

First, it is always recommended to run in debug mode. This can be done by compiling the system using the command make debug.
Second, it is recommended to only run the test that breaks. This can be done by passing the filename of the breaking test to the test suite as a
command line parameter (e.g., build/debug/test/unittest test/sql/projection/test_simple_projection.test).
For more options on running a subset of the tests see the Triggering which tests to run section.

After that, a debugger can be attached to the program and the test can be debugged. In the sqllogictests it is normally difficult to break on
a specific query, however, we have expanded the test suite so that a function called query_break is called with the line number line as
parameter for every query that is run. This allows you to put a conditional breakpoint on a specific query. For example, if we want to break
on line number 43 of the test file we can create the following break point:

gdb: break query_break if line==43
lldb: break s -n query_break -c line==43

You can also skip certain queries from executing by placing mode skip in the file, followed by an optional mode unskip. Any queries
between the two statements will not be executed.

Triggering Which Tests to Run

When running the unittest program, by default all the fast tests are run. A specific test can be run by adding the name of the test as an
argument. For the sqllogictests, this is the relative path to the test file.

# run only a single test
build/debug/test/unittest test/sql/projection/test_simple_projection.test

All tests in a given directory can be executed by providing the directory as a parameter with square brackets.

# run all tests in the "projection" directory
build/debug/test/unittest "[projection]"

All tests, including the slow tests, can be run by running the tests with an asterisk.

# run all tests, including the slow tests
build/debug/test/unittest "*"

We can run a subset of the tests using the --start-offset and --end-offset parameters:

# run the tests 200..250
build/debug/test/unittest --start-offset=200 --end-offset=250

These are also available in percentages:

# run tests 10% - 20%
build/debug/test/unittest --start-offset-percentage=10 --end-offset-percentage=20

The set of tests to run can also be loaded from a file containing one test name per line, and loaded using the -f command.

$ cat test.list
test/sql/join/full_outer/test_full_outer_join_issue_4252.test
test/sql/join/full_outer/full_outer_join_cache.test
test/sql/join/full_outer/test_full_outer_join.test
# run only the tests labeled in the file
$ build/debug/test/unittest -f test.list

sqllogictest ‑ Result Verification

The standard way of verifying results of queries is using the query statement, followed by the letter I times the number of columns that
are expected in the result. After the query, four dashes (----) are expected followed by the result values separated by tabs. For example,

query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20

For legacy reasons the letters R and T are also accepted to denote columns.

NULL Values and Empty Strings

Empty lines have special significance for the SQLLogic test runner: they signify an end of the current statement or query. For that reason,
empty strings and NULL values have special syntax that must be used in result verification. NULL values should use the string NULL, and
empty strings should use the string (empty), e.g.:

query II
SELECT NULL, ''
----
NULL
(empty)

Error Verification

In order to signify that an error is expected, the statement error indicator can be used. The statement error also takes an
optional expected result ‑ which is interpreted as the expected error message. Similar to query, the expected error should be placed after
the four dashes (----) following the query. The test passes if the error message contains the text under statement error ‑ the entire
error message does not need to be provided. It is recommended that you only use a subset of the error message, so that the test does not
break unnecessarily if the formatting of error messages is changed.

statement error
SELECT * FROM non_existent_table;
----
Table with name non_existent_table does not exist!

Regex

In certain cases result values might be very large or complex, and we might only be interested in whether or not the result contains a snippet
of text. In that case, we can use the <REGEX>: modifier followed by a certain regex. If the result value matches the regex the test is passed.
This is primarily used for query plan analysis.

query II
EXPLAIN SELECT tbl.a FROM "data/parquet-testing/arrow/alltypes_plain.parquet" tbl(a) WHERE a=1 OR a=2
----
physical_plan <REGEX>:.*PARQUET_SCAN.*Filters: a=1 OR a=2.*

If we instead want the result not to contain a snippet of text, we can use the <!REGEX>: modifier.

File

As results can grow quite large, and we might want to re‑use results over multiple files, it is also possible to read expected results from files
using the <FILE> command. The expected result is read from the given file. By convention, the file path should be provided relative to
the root of the GitHub repository.

query I
PRAGMA tpch(1)
----
<FILE>:extension/tpch/dbgen/answers/sf1/q01.csv

Row‑Wise vs. Value‑Wise Result Ordering

The result values of a query can be either supplied in row‑wise order, with the individual values separated by tabs, or in value‑wise order.
In value‑wise order, the individual values of the query must appear in row, column order, each on an individual line. Consider the following
example in both row‑wise and value‑wise order:

# row-wise
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20

# value-wise
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42
84
10
20

Hashes and Outputting Values

Besides direct result verification, the sqllogic test suite also has the option of using MD5 hashes for value comparisons. A test using hashes
for result verification looks like this:

query I
SELECT g, string_agg(x,',') FROM strings GROUP BY g
----
200 values hashing to b8126ea73f21372cdb3f2dc483106a12

This approach is useful for reducing the size of tests when results have many output rows. However, it should be used sparingly, as hash
values make the tests more difficult to debug if they do break.

After it is ensured that the system outputs the correct result, hashes of the queries in a test file can be computed by adding mode output_hash to the test file. For example:

mode output_hash

query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20

The expected output hashes for every query in the test file will then be printed to the terminal, as follows:

================================================================================
SQL Query
SELECT 42, 84 UNION ALL SELECT 10, 20;
================================================================================
4 values hashing to 498c69da8f30c24da3bd5b322a2fd455
================================================================================

In a similar manner, mode output_result can be used in order to force the program to print the result to the terminal for every query
run in the test file.

Result Sorting

Queries can have an optional field that indicates that the result should be sorted in a specific manner. This field goes in the same location
as the connection label. Because of that, connection labels and result sorting cannot be mixed.

The possible values of this field are nosort, rowsort and valuesort. An example of how this might be used is given below:

query I rowsort
SELECT 'world' UNION ALL SELECT 'hello'
----
hello
world

In general, we prefer not to use this field and rely on ORDER BY in the query to generate deterministic query answers. However, existing
sqllogictests use this field extensively, hence it is important to know of its existence.

Query Labels

Another feature that can be used for result verification is query labels. These can be used to verify that different queries produce the
same result. This is useful for comparing queries that are logically equivalent but formulated differently. Query labels are provided after
the connection label or sorting specifier.

Queries that have a query label do not need to have a result provided. Instead, the results of each of the queries with the same label are
compared to each other. For example, the following script verifies that the queries SELECT 42+1 and SELECT 44-1 provide the same
result:

query I nosort r43
SELECT 42+1;
----

query I nosort r43
SELECT 44-1;
----

sqllogictest ‑ Persistent Testing

By default, all tests are run in in‑memory mode (unless --force-storage is enabled). In certain cases, we want to force the usage
of a persistent database. We can initiate a persistent database using the load command, and trigger a reload of the database using the
restart command.

# load the DB from disk
load __TEST_DIR__/storage_scan.db

statement ok
CREATE TABLE test (a INTEGER);

statement ok
INSERT INTO test VALUES (11), (12), (13), (14), (15), (NULL)

# ...

restart

query I
SELECT * FROM test ORDER BY a
----
NULL
11
12
13
14
15

Note that by default the tests run with SET wal_autocheckpoint='0KB' ‑ meaning a checkpoint is triggered after every statement.
WAL tests typically run with the following settings to disable this behavior:

statement ok
PRAGMA disable_checkpoint_on_shutdown

statement ok
PRAGMA wal_autocheckpoint='1TB';

sqllogictest ‑ Loops

Loops can be used in sqllogictests when it is required to execute the same query many times but with slight modifications in constant
values. For example, suppose we want to fire off 100 queries that check for the presence of the values 0..100 in a table:

# create the table integers with the values 0..100
statement ok
CREATE TABLE integers AS SELECT * FROM range(0, 100, 1) t1(i);

# verify individually that all 100 values are there
loop i 0 100

# execute the query, replacing the value
query I
SELECT COUNT(*) FROM integers WHERE i=${i};
----
1

# end the loop (note that multiple statements can be part of a loop)
endloop

Similarly, foreach can be used to iterate over a set of values.

foreach partcode millennium century decade year quarter month day hour minute second millisecond microsecond epoch

query III
SELECT i, DATE_PART('${partcode}', i) AS p, DATE_PART(['${partcode}'], i) AS st
FROM intervals
WHERE p <> st['${partcode}'];
----

endloop

foreach also has a number of preset combinations that should be used when required. In this manner, when new combinations are
added to a preset, existing tests will automatically pick them up. The available presets are listed below, followed by a usage sketch.

Preset          Expansion

<compression>   none uncompressed rle bitpacking dictionary fsst chimp patas
<signed>        tinyint smallint integer bigint hugeint
<unsigned>      utinyint usmallint uinteger ubigint uhugeint
<integral>      <signed> <unsigned>
<numeric>       <integral> float double
<alltypes>      <numeric> bool interval varchar json
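
For example, a minimal sketch that iterates over the <signed> preset (the table name is illustrative, and it is assumed the preset name can be used directly in place of an explicit value list):

foreach type <signed>

statement ok
CREATE TABLE tmp_${type} (i ${type});

statement ok
DROP TABLE tmp_${type};

endloop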

Note. Use large loops sparingly. Executing hundreds of thousands of SQL statements will slow down tests unnecessarily. Do not
use loops for inserting data.

Data Generation without Loops

Loops should be used sparingly. While it might be tempting to use loops for inserting data using insert statements, this will considerably
slow down the test cases. Instead, it is better to generate data using the built‑in range and repeat functions.

-- create the table integers with the values [0, 1, .., 98, 99]
CREATE TABLE integers AS SELECT * FROM range(0, 100, 1) t1(i);

-- create the table strings with 100X the value "hello"
CREATE TABLE strings AS SELECT 'hello' AS s FROM range(0, 100, 1);

Using these two functions, together with clever use of cross products and other expressions, many different types of datasets can be efficiently
generated. The RANDOM() function can also be used to generate random data.
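
For instance, a minimal sketch that combines two ranges in a cross product to produce 10,000 rows (the table and column names are illustrative):

-- generate 100 x 100 rows by crossing two ranges
CREATE TABLE pairs AS SELECT t1.i AS a, t2.i AS b FROM range(0, 100, 1) t1(i), range(0, 100, 1) t2(i);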

An alternative option is to read data from an existing CSV or Parquet file. There are several large CSV files that can be loaded from the
directory test/sql/copy/csv/data/real using a COPY INTO statement or the read_csv_auto function.
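
A minimal sketch, using a placeholder file name under that directory:

-- the file name below is a placeholder
CREATE TABLE csv_data AS SELECT * FROM read_csv_auto('test/sql/copy/csv/data/real/some_file.csv');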

The TPC‑H and TPC‑DS extensions can also be used to generate synthetic data, using e.g. CALL dbgen(sf=1) or CALL dsdgen(sf=1).

sqllogictest ‑ Multiple Connections

For tests whose purpose is to verify that the transactional management or versioning of data works correctly, it is generally necessary to
use multiple connections. For example, if we want to verify that the creation of tables is correctly transactional, we might want to start a
transaction and create a table in con1, then fire a query in con2 that checks that the table is not accessible until the transaction is committed.

We can use multiple connections in the sqllogictests using connection labels. The connection label can be optionally appended to
any statement or query. All queries with the same connection label will be executed in the same connection. A test that would verify
the above property would look as follows:

statement ok con1
BEGIN TRANSACTION

statement ok con1
CREATE TABLE integers (i INTEGER);

statement error con2
SELECT * FROM integers;

Concurrent Connections

Using connection modifiers on statements and queries will result in testing of multiple connections, but all the queries will still be run
sequentially on a single thread. If we want to run code from multiple connections concurrently over multiple threads, we can use the
concurrentloop construct. The queries in a concurrentloop are run on separate threads at the same time.


concurrentloop i 0 10

statement ok
CREATE TEMP TABLE t2 AS (SELECT 1);

statement ok
INSERT INTO t2 VALUES (42);

statement ok
DELETE FROM t2

endloop

One caveat with concurrentloop is that results are often unpredictable: as multiple clients can hammer the database at the same
time, we might end up with (expected) transaction conflicts. statement maybe can be used to deal with these situations: it accepts
both a success and a failure with a specific error message.

concurrentloop i 1 10

statement maybe
CREATE OR REPLACE TABLE t2 AS (SELECT -54124033386577348004002656426531535114 FROM t2 LIMIT 70%);
----
write-write conflict

endloop

Catch C/C++ Tests

While we prefer the sqllogic tests for testing most functionality, for certain tests SQL alone is not sufficient. This typically happens when you
want to test the C++ API. When pure SQL is really not an option, it might be necessary to write a C++ test using Catch.

Catch tests reside in the test directory as well. Here is an example of a Catch test that tests the storage of the system:

#include "catch.hpp"
#include "test_helpers.hpp"

TEST_CASE("Test simple storage", "[storage]") {


auto config = GetTestConfig();
unique_ptr<QueryResult> result;
auto storage_database = TestCreatePath("storage_test");

// make sure the database does not exist


DeleteDatabase(storage_database);
{
// create a database and insert values
DuckDB db(storage_database, config.get());
Connection con(db);
REQUIRE_NO_FAIL(con.Query("CREATE TABLE test (a INTEGER, b INTEGER);"));
REQUIRE_NO_FAIL(con.Query("INSERT INTO test VALUES (11, 22), (13, 22), (12, 21), (NULL, NULL)"));
REQUIRE_NO_FAIL(con.Query("CREATE TABLE test2 (a INTEGER);"));
REQUIRE_NO_FAIL(con.Query("INSERT INTO test2 VALUES (13), (12), (11)"));
}
// reload the database from disk a few times
for (idx_t i = 0; i < 2; i++) {
DuckDB db(storage_database, config.get());
Connection con(db);
result = con.Query("SELECT * FROM test ORDER BY a");
REQUIRE(CHECK_COLUMN(result, 0, {Value(), 11, 12, 13}));

707
DuckDB Documentation

REQUIRE(CHECK_COLUMN(result, 1, {Value(), 22, 21, 22}));


result = con.Query("SELECT * FROM test2 ORDER BY a");
REQUIRE(CHECK_COLUMN(result, 0, {11, 12, 13}));
}
DeleteDatabase(storage_database);
}

The test uses the TEST_CASE wrapper to create each test. The database is created and queried using the C++ API. Results are
checked using either REQUIRE_NO_FAIL/REQUIRE_FAIL (corresponding to statement ok and statement error) or REQUIRE(CHECK_COLUMN(...))
(corresponding to query with a result check). Every test that is created in this way needs to be added to the corresponding
CMakeLists.txt.


Acknowledgments


This document is built with Pandoc using the Eisvogel template. The scripts to build the document are available in the DuckDB‑Web repos‑
itory.

The emojis used in this document are provided by Twemoji under the CC‑BY 4.0 license.
