Composing Programs 4.3 Declarative Programming
Composing Programs 4.3 Declarative Programming
3 Declarative Programming
In addition to streams, data values are often stored in large repositories called databases. A database
consists of a data store containing the data values along with an interface for retrieving and
4.1 Introduction
transforming those values. Each value stored in a database is called a record. Records with similar
structure are grouped into tables. Records are retrieved and transformed using queries, which are
statements in a query language. By far the most ubiquitous query language in use today is called
4.2 Implicit Sequences Structured Query Language or SQL (pronounced "sequel").
4.2.1 Iterators SQL is an example of a declarative programming language. Statements do not describe computations
4.2.2 Iterables directly, but instead describe the desired result of some computation. It is the role of the query
4.2.3 Built-in Iterators interpreter of the database system to design and perform a computational process to produce such a
4.2.4 For Statements result.
4.2.5 Generators and Yield
Statements This interaction differs substantially from the procedural programming paradigm of Python or
4.2.6 Iterable Interface Scheme. In Python, computational processes are described directly by the programmer. A declarative
4.2.7 Creating Iterables with Yield language abstracts away procedural details, instead focusing on the form of the result.
4.2.8 Iterator Interface
4.2.9 Streams 4.3.1 Tables
4.2.10 Python Streams
The SQL language is standardized, but most database systems implement some custom variant of
4.3 Declarative Programming the language that is endowed with proprietary features. In this text, we will describe a small subset of
SQL as it is implemented in Sqlite. You can follow along by downloading Sqlite or by using this online
4.3.1 Tables SQL interpreter.
4.3.2 Select Statements
4.3.3 Joins A table, also called a relation, has a fixed number of named and typed columns. Each row of a table
4.3.4 Interpreting SQL represents a data record and has one value for each column. For example, a table of cities might
4.3.5 Recursive Select have columns latitude longitude that both hold numeric values, as well as a column name that holds a
Statements string. Each row would represent a city location position by its latitude and longitude values.
4.3.6 Aggregation and Grouping
45 93 Minneapolis
4.5 Unification
The second line is the output, which includes one line per row with columns separated by a vertical
4.6 Distributed Computing bar.
https://www.composingprograms.com/pages/43-declarative-programming.html 1/11
2/1/24, 8:48 PM 4.3 Declarative Programming
4.6.1 Messages A multi-line table can be constructed by union, which combines the rows of two tables. The column
4.6.2 Client/Server Architecture names of the left table are used in the constructed table. Spacing within a line does not affect the
4.6.3 Peer-to-Peer Systems result.
A select statement defines a new table either by listing the values in a single row or, more commonly,
by projecting an existing table using a from clause:
select [column description] from [existing table name]
The columns of the resulting table are described by a comma-separated list of expressions that are
each evaluated for each row of the existing input table.
For example, we can create a two-column table that describes each city by how far north or south it is
of Berkeley. Each degree of latitude measures 60 nautical miles to the north.
Column descriptions are expressions in a language that shares many properties with Python: infix
operators such as + and %, built-in functions such as abs and round, and parentheses that describe
evaluation order. Names in these expressions, such as latitude above, evaluate to the column value
in the row being projected.
Optionally, each expression can be followed by the keyword as and a column name. When the entire
table is given a name, it is often helpful to give each column a name so that it can be referenced in
future select statements. Columns described by a simple name are named automatically.
https://www.composingprograms.com/pages/43-declarative-programming.html 2/11
2/1/24, 8:48 PM 4.3 Declarative Programming
Where Clauses. A select statement can also include a where clause with a filtering expression. This
expression filters the rows that are projected. Only a row for which the filtering expression evaluates
to a true value will be used to produce a row in the resulting table.
Order Clauses. A select statement can also express an ordering over the resulting table. An order
clause contains an ordering expression that is evaluated for each unfiltered row. The resulting values
of this expression are used as a sorting criterion for the result table.
The combination of these features allows a select statement to express a wide range of projections of
an input table into a related output table.
4.3.3 Joins
Databases typically contain multiple tables, and queries can require information contained within
different tables to compute a desired result. For instance, we may have a second table describing the
mean daily high temperature of different cities.
Data are combined by joining multiple tables together into one, a fundamental operation in database
systems. There are many methods of joining, all closely related, but we will focus on just one method
in this text. When tables are joined, the resulting table contains a new row for each combination of
rows in the input tables. If two tables are joined and the left table has m rows and the right table has n
rows, then the joined table will have m ⋅ n rows. Joins are expressed in SQL by separating table
names by commas in the from clause of a select statement.
https://www.composingprograms.com/pages/43-declarative-programming.html 3/11
2/1/24, 8:48 PM 4.3 Declarative Programming
45|93|Minneapolis|Chicago|59
45|93|Minneapolis|Minneapolis|55
Joins are typically accompanied by a where clause that expresses a relationship between the two
tables. For example, if we wanted to collect data into a table that would allow us to correlate latitude
and temperature, we would select rows from the join where the same city is mentioned in each.
Within the cities table, the city name is stored in a column called name. Within the temps table, the city
name is stored in a column called city. The where clause can select for rows in the joined table in
which these values are equal. In SQL, numeric equality is tested with a single = symbol.
sqlite> select name, latitude, temp from cities, temps where name = city;
Berkeley|38|68
Minneapolis|45|55
Tables may have overlapping column names, and so we need a method for disambiguating column
names by table. A table may also be joined with itself, and so we need a method for disambiguating
tables. To do so, SQL allows us to give aliases to tables within a from clause using the keyword as and
to refer to a column within a particular table using a dot expression. The following select statement
computes the temperature difference between pairs of unequal cities. The alphabetical ordering
constraint in the where clause ensures that each pair will only appear once in the result.
Our two means of combining tables in SQL, join and union, allow for a great deal of expressive power
in the language.
In order to create an interpreter for the subset of SQL we have introduced so far, we need to create a
representation for tables, a parser for statements written as text, and an evaluator for parsed
statements. The sql interpreter example includes all of these components, providing a simple but
functional demonstration of a declarative language interpreter.
In this implementation, each table has its own a class, and each row in a table is represented by an
instance of its table's class. A row has one attribute per column in the table, and a table is a sequence
of rows.
The class for a table is created using the namedtuple function in the collections package of the
Python standard library, which returns a new sub-class of tuple that gives names to each element in
the tuple.
Consider the cities table from the previous section, repeated below.
https://www.composingprograms.com/pages/43-declarative-programming.html 4/11
2/1/24, 8:48 PM 4.3 Declarative Programming
The result of a select statement can be interpreted using sequence operations. Consider the
distances table from the previous section, repeated below.
This table is generated from the name and latitude columns of the cities table. This resulting table can
be generated by mapping a function over the rows of the input table, a function that returns a
DistancesRow for each CitiesRow.
The design of our SQL interpreter generalizes this approach. A select statement is represented as an
instance of a class Select that is constructed from the clauses of the select statement.
https://www.composingprograms.com/pages/43-declarative-programming.html 5/11
2/1/24, 8:48 PM 4.3 Declarative Programming
The execute method joins input tables, filters and orders the resulting rows, then maps a function
called make_row over those resulting rows. The make_row function is created in the Select constructor by
a call to create_make_row, a higher-order function that creates a new class for the resulting table and
defines how to project an input row to an output row. (A version of this function with more error
handling and special cases appears in sql.)
Finally, we need to define the join function that creates the input rows. Given an env dictionary
contains existing tables (lists of rows) keyed by their name, the join function groups together all
combinations of rows in the input tables using the product function in the itertools package. It maps a
function called make_env over the joined rows, a function that converts each combination of rows into a
dictionary so that it can be used to evaluate expressions. (A version of this function with more error
handling and special cases appears in sql.)
Above, row._fields evaluates to the column names of the table containing the row. The _fields
attribute exists because the type of row is a namedtuple class.
Our interpreter is complete enough to execute select statements. For instance, we can compute the
latitude distance from Berkeley for all other cities, ordered by their longitude.
https://www.composingprograms.com/pages/43-declarative-programming.html 6/11
2/1/24, 8:48 PM 4.3 Declarative Programming
Row(name='Minneapolis', distance=300)
Row(name='Cambridge', distance=240)
We can also store this resulting table in the environment and join it with the cities table, retrieving the
longitude for each city.
The full sql example program also contains a simple parser for select statements, as well as execute
methods for create table and union. The interpreter can correctly execute all SQL statements included
within the text so far. While this simple interpreter only implements a small amount of the full
Structured Query Language, its structure demonstrates the relationship between sequence
processing operations and query languages.
Query Plans. Declarative languages describe the form of a result, but do not explicitly describe how
that result should be computed. This interpreter always joins, filters, orders, and then projects the
input rows in order to compute the result rows. However, more efficient ways to compute the same
result may exist, and query interpreters are free to choose among them. Choosing efficient
procedures for computing query results is a core feature of database systems.
For example, consider the final select statement above. Rather than computing the join of cities and
distances and then filtering the result, the same result may be computed by first sorting both tables by
the name column and then joining only the rows that have the same name in a linear pass through the
sorted tables. When tables are large, efficiency gains from query plan selection can be substantial.
Select statements can optionally include a with clause that generates and names additional tables
used in computing the final result. The full syntax of a select statement, not including unions, has the
following form:
with [tables] select [columns] from [names] where [condition] order by [order]
We have already demonstrated the allowed values for [columns] and [names]. [condition] and [order]
are expressions that can be evaluated for an input row. The [tables] portion is a comma-separated
https://www.composingprograms.com/pages/43-declarative-programming.html 7/11
2/1/24, 8:48 PM 4.3 Declarative Programming
sqlite> with
...> states(city, state) as (
...> select "Berkeley", "California" union
...> select "Boston", "Massachusetts" union
...> select "Cambridge", "Massachusetts" union
...> select "Chicago", "Illinois" union
...> select "Pasadena", "California"
...> )
...> select a.city, b.city, a.state from states as a, states as b
...> where a.state = b.state and a.city < b.city;
Berkeley|Pasadena|California
Boston|Cambridge|Massachusetts
A table defined within a with clause may have a single recursive case that defines output rows in
terms of other output rows. For example, the with clause below defines a table of integers from 5 to
15, of which the odd values are selected and squared.
sqlite> with
...> ints(n) as (
...> select 5 union
...> select n+1 from ints where n < 15
...> )
...> select n, n*n from ints where n % 2 = 1;
5|25
7|49
9|81
11|121
13|169
15|225
Multiple tables can be defined in a with clause, separated by commas. The example below computes
all Pythagorean triples from a table of integers, their squares, and the sums of pairs of squares. A
Pythagorean triple consists of integers a, b, and c such that a2 + b2 = c 2 .
sqlite> with
...> ints(n) as (
...> select 1 union select n+1 from ints where n < 20
...> ),
...> squares(x, xx) as (
...> select n, n*n from ints
...> ),
...> sum_of_squares(a, b, sum) as (
...> select a.x, b.x, a.xx + b.xx
...> from squares as a, squares as b where a.x < b.x
...> )
...> select a, b, x from squares, sum_of_squares where sum = xx;
3|4|5
6|8|10
5|12|13
9|12|15
https://www.composingprograms.com/pages/43-declarative-programming.html 8/11
2/1/24, 8:48 PM 4.3 Declarative Programming
8|15|17
12|16|20
Designing recursive queries involves ensuring that the appropriate information is available in each
input row to compute a result row. To compute Fibonacci numbers, for example, the input row needs
not only the current but also the previous element in order to compute the next element.
sqlite> with
...> fib(previous, current) as (
...> select 0, 1 union
...> select current, previous+current from fib
...> where current <= 100
...> )
...> select previous from fib;
0
1
1
2
3
5
8
13
21
34
55
89
These examples demonstrate that recursion is a powerful means of combination, even in declarative
languages.
Building strings. Two strings can be concatenated into a longer string using the || operator in SQL.
This feature can be used to construct sentences by concatenating phrases. For example, one way to
construct an English sentence is to concatenate a subject noun phrase, a verb, and an object noun
phrase.
https://www.composingprograms.com/pages/43-declarative-programming.html 9/11
2/1/24, 8:48 PM 4.3 Declarative Programming
As an exercise, use a recursive local table to generate sentences such as, "the dog that chased the
cat that chased the bird also chased the bird."
The select statements introduced so far can join, project, and manipulate individual rows. In addition,
a select statement can perform aggregation operations over multiple rows. The aggregate functions
max, min, count, and sum return the maximum, minimum, number, and sum of the values in a column.
Multiple aggregate functions can be applied to the same set of rows by defining more than one
column. Only columns that are included by the where clause are considered in the aggreagation.
The distinct keyword ensures that no repeated values in a column are included in the aggregation.
Only two distinct values of legs appear in the animals table. The special count(*) syntax counts the
number of rows.
Each of these select statements has produced a table with a single row. The group by and having
clauses of a select statement are used to partition rows into groups and select only a subset of the
groups. Any aggregate functions in the having clause or column description will apply to each group
independently, rather than the entire set of rows in the table.
For example, to compute the maximum weight of both a four-legged and a two-legged animal from
this table, the first statement below groups together dogs and cats as one group and birds as a
separate group. The result indicates that the maximum weight for a two-legged animal is 3 (the bird)
and for a four-legged animal is 20 (the dog). The second query lists the values in the legs column for
which there are at least two distinct names.
https://www.composingprograms.com/pages/43-declarative-programming.html 10/11
2/1/24, 8:48 PM 4.3 Declarative Programming
Multiple columns and full expressions can appear in the group by clause, and groups will be formed
for every unique combination of values that result. Typically, the expression used for grouping also
appears in the column description, so that it is easy to identify which result row resulted from each
group.
sqlite> select max(name) from animals group by legs, weight order by name;
bird
dog
ferret
penguin
t-rex
sqlite> select max(name), legs, weight from animals group by legs, weight
....> having max(weight) < 100;
bird|2|6
penguin|2|10
ferret|4|10
dog|4|20
sqlite> select count(*), weight/legs from animals group by weight/legs;
2|2
1|3
2|5
1|6000
A having clause can contain the same filtering as a where clause, but can also include calls to
aggregate functions. For the fastest execution and clearest use of the language, a condition that
filters individual rows based on their contents should appear in a where clause, while a having clause
should be used only when aggregation is required in the condition (such as specifying a minimum
count for a group).
When using a group by clause, column descriptions can contain expressions that do not aggregate. In
some cases, the SQL interpreter will choose the value from a row that corresponds to another column
that includes aggregation. For example, the following statement gives the name of an animal with
maximal weight.
However, whenever the row that corresponds to aggregation is unclear (for instance, when
aggregating with count instead of max), the value chosen may be arbitrary. For the clearest and most
predictable use of the language, a select statement that includes a group by clause should include at
least one aggregate column and only include non-aggregate columns if their contents is predictable
from the aggregation.
Continue: 4.4 Logic Programming
Composing Programs by John DeNero, based on the textbook Structure and Interpretation of Computer Programs by Harold Abelson and Gerald Jay Sussman, is licensed under a Creative
Commons Attribution-ShareAlike 3.0 Unported License.
https://www.composingprograms.com/pages/43-declarative-programming.html 11/11