Diffstat (limited to 'doc/src/sgml/query.sgml')
-rw-r--r--doc/src/sgml/query.sgml899
1 files changed, 604 insertions, 295 deletions
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index 82c4ffe697..04fcce1985 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -1,102 +1,106 @@
<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/query.sgml,v 1.17 2001/01/13 23:58:55 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/query.sgml,v 1.18 2001/09/02 23:27:49 petere Exp $
-->
- <chapter id="query">
- <title>The Query Language</title>
-
- <para>
- The <productname>Postgres</productname> query language is a variant of
- the <acronym>SQL</acronym> standard. It
- has many extensions to <acronym>SQL</acronym> such as an
- extensible type system,
- inheritance, functions and production rules. These are
- features carried over from the original
- <productname>Postgres</productname> query
- language, <productname>PostQuel</productname>.
- This section provides an overview
- of how to use <productname>Postgres</productname>
- <acronym>SQL</acronym> to perform simple operations.
- This manual is only intended to give you an idea of our
- flavor of <acronym>SQL</acronym> and is in no way a complete tutorial on
- <acronym>SQL</acronym>. Numerous books have been written on
- <acronym>SQL92</acronym>, including
- <xref linkend="MELT93" endterm="MELT93"> and
- <xref linkend="DATE97" endterm="DATE97">.
- You should be aware that some language features
- are extensions to the standard.
- </para>
-
- <sect1 id="query-psql">
- <title>Interactive Monitor</title>
-
- <para>
- In the examples that follow, we assume that you have
- created the mydb database as described in the previous
- subsection and have started <application>psql</application>.
- Examples in this manual can also be found in source distribution
- in the directory <filename>src/tutorial/</filename>. Refer to the
- <filename>README</filename> file in that directory for how to use them. To
- start the tutorial, do the following:
+ <chapter id="tutorial-sql">
+ <title>The <acronym>SQL</acronym> Language</title>
+
+ <sect1 id="tutorial-sql-intro">
+ <title>Introduction</title>
+
+ <para>
+ This chapter provides an overview of how to use
+ <acronym>SQL</acronym> to perform simple operations. This
+ tutorial is only intended to give you an introduction and is in no
+ way a complete tutorial on <acronym>SQL</acronym>. Numerous books
+ have been written on <acronym>SQL92</acronym>, including <xref
+ linkend="MELT93" endterm="MELT93"> and <xref linkend="DATE97"
+ endterm="DATE97">. You should be aware that some language
+ features are extensions to the standard.
+ </para>
+
+ <para>
+ In the examples that follow, we assume that you have created a
+ database named <quote>mydb</quote>, as described in the previous
+ chapter, and have started <application>psql</application>.
+ </para>
+
+ <para>
+ Examples in this manual can also be found in the source distribution
+ in the directory <filename>src/tutorial/</filename>. Refer to the
+ <filename>README</filename> file in that directory for how to use
+ them. To start the tutorial, do the following:
<screen>
-<prompt>$</prompt> <userinput>cd <replaceable>...</replaceable>/src/tutorial</userinput>
+<prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
<prompt>$</prompt> <userinput>psql -s mydb</userinput>
<computeroutput>
-Welcome to the POSTGRESQL interactive sql monitor:
- Please read the file COPYRIGHT for copyright terms of POSTGRESQL
-
- type \? for help on slash commands
- type \q to quit
- type \g or terminate with semicolon to execute query
- You are currently connected to the database: postgres
+...
</computeroutput>
<prompt>mydb=&gt;</prompt> <userinput>\i basics.sql</userinput>
</screen>
+
+ The <literal>\i</literal> command reads in commands from the
+ specified file. The <literal>-s</literal> option puts you in
+ single-step mode, which pauses before sending each query to the
+ server. The commands used in this section are in the file
+ <filename>basics.sql</filename>.
</para>
+ </sect1>
+
+
+ <sect1 id="tutorial-concepts">
+ <title>Concepts</title>
<para>
- The <literal>\i</literal> command read in queries from the specified
- files. The <literal>-s</literal> option puts you in single step mode which
- pauses before sending a query to the backend. Queries
- in this section are in the file <filename>basics.sql</filename>.
+ <indexterm><primary>relational database</primary></indexterm>
+ <indexterm><primary>hierarchical database</primary></indexterm>
+ <indexterm><primary>object-oriented database</primary></indexterm>
+ <indexterm><primary>relation</primary></indexterm>
+ <indexterm><primary>table</primary></indexterm>
+
+ <productname>PostgreSQL</productname> is a <firstterm>relational
+ database management system</firstterm> (<acronym>RDBMS</acronym>).
+ That means it is a system for managing data stored in
+ <firstterm>relations</firstterm>. Relation is essentially a
+ mathematical term for <firstterm>table</firstterm>. The notion of
+ storing data in tables is so commonplace today that it might
+ seem inherently obvious, but there are a number of other ways of
+ organizing databases. Files and directories on Unix-like
+ operating systems form an example of a hierarchical database. A
+ more modern development is the object-oriented database.
</para>
<para>
- <application>psql</application>
- has a variety of <literal>\d</literal> commands for showing system information.
- Consult these commands for more details;
- for a listing, type <literal>\?</literal> at the <application>psql</application> prompt.
+ <indexterm><primary>row</primary></indexterm>
+ <indexterm><primary>column</primary></indexterm>
+
+ Each table is a named collection of <firstterm>rows</firstterm>.
+ Each row has the same set of named <firstterm>columns</firstterm>,
+ and each column is of a specific data type. Whereas columns have
+ a fixed order in each row, it is important to remember that SQL
+ does not guarantee the order of the rows within the table in any
+ way (unless they are explicitly sorted).
</para>
- </sect1>
-
- <sect1 id="query-concepts">
- <title>Concepts</title>
<para>
- The fundamental notion in <productname>Postgres</productname> is
- that of a <firstterm>table</firstterm>, which is a named
- collection of <firstterm>rows</firstterm>. Each row has the same
- set of named <firstterm>columns</firstterm>, and each column is of
- a specific type. Furthermore, each row has a permanent
- <firstterm>object identifier</firstterm> (<acronym>OID</acronym>)
- that is unique throughout the database cluster. Historially,
- tables have been called classes in
- <productname>Postgres</productname>, rows are object instances,
- and columns are attributes. This makes sense if you consider the
- object-relational aspects of the database system, but in this
- manual we will use the customary <acronym>SQL</acronym>
- terminology. As previously discussed,
- tables are grouped into databases, and a collection of databases
- managed by a single <application>postmaster</application> process
- constitutes a database cluster.
+ <indexterm><primary>cluster</primary></indexterm>
+
+ Tables are grouped into databases, and a collection of databases
+ managed by a single <productname>PostgreSQL</productname> server
+ instance constitutes a database <firstterm>cluster</firstterm>.
</para>
</sect1>
- <sect1 id="query-table">
+
+ <sect1 id="tutorial-table">
<title>Creating a New Table</title>
+ <indexterm zone="tutorial-table">
+ <primary>CREATE TABLE</primary>
+ </indexterm>
+
<para>
You can create a new table by specifying the table
name, along with all column names and their types:
@@ -110,39 +114,82 @@ CREATE TABLE weather (
date date
);
</programlisting>
+
+ You can enter this into <command>psql</command> with the line
+ breaks. <command>psql</command> will recognize that the command
+ is not terminated until the semicolon.
+ </para>
+
+ <para>
+ White space (i.e., spaces, tabs, and newlines) may be used freely
+ in SQL commands. That means you can type the command aligned
+ differently than above, or even all on one line. Two dashes
+ (<quote><literal>--</literal></quote>) introduce comments.
+ Whatever follows them is ignored up to the end of the line. SQL
+ is also case insensitive about key words and identifiers, except
+ when identifiers are double-quoted to preserve the case (not done
+ above).
+ </para>
+
+ <para>
+ <type>varchar(80)</type> specifies a data type that can store
+ arbitrary character strings up to 80 characters in length.
+ <type>int</type> is the normal integer type. <type>real</type> is
+ a type for storing single precision floating point numbers.
+ <type>date</type> should be self-explanatory. (Yes, the column of
+ type <type>date</type> is also named <literal>date</literal>.
+ This may be convenient or confusing -- you choose.)
</para>
<para>
- Note that both keywords and identifiers are case-insensitive;
- identifiers can preserve case by surrounding them with
- double-quotes as allowed
- by <acronym>SQL92</acronym>.
- <productname>Postgres</productname> <acronym>SQL</acronym>
- supports the usual
+ <productname>PostgreSQL</productname> supports the usual
<acronym>SQL</acronym> types <type>int</type>,
- <type>float</type>, <type>real</type>, <type>smallint</type>,
-<type>char(N)</type>,
- <type>varchar(N)</type>, <type>date</type>, <type>time</type>,
- and <type>timestamp</type>, as well as other types of general utility and
- a rich set of geometric types. As we will
- see later, <productname>Postgres</productname> can be customized
- with an
- arbitrary number of
- user-defined data types. Consequently, type names are
- not syntactical keywords, except where required to support special
- cases in the <acronym>SQL92</acronym> standard.
- So far, the <productname>Postgres</productname>
- <command>CREATE</command> command
- looks exactly like
- the command used to create a table in a traditional
- relational system. However, we will presently see that
- tables have properties that are extensions of the
- relational model.
+ <type>smallint</type>, <type>real</type>, <type>double
+ precision</type>, <type>char(<replaceable>N</>)</type>,
+ <type>varchar(<replaceable>N</>)</type>, <type>date</type>,
+ <type>time</type>, <type>timestamp</type>, and
+ <type>interval</type> as well as other types of general utility
+ and a rich set of geometric types.
+ <productname>PostgreSQL</productname> can be customized with an
+ arbitrary number of user-defined data types. Consequently, type
+ names are not syntactical keywords, except where required to
+ support special cases in the <acronym>SQL</acronym> standard.
+ </para>
+
+ <para>
+ The second example will store cities and their associated
+ geographical location:
+<programlisting>
+CREATE TABLE cities (
+ name varchar(80),
+ location point
+);
+</programlisting>
+ The <type>point</type> type is an example of a
+ <productname>PostgreSQL</productname>-specific data type.
+ </para>
+
+ <para>
+ <indexterm>
+ <primary>DROP TABLE</primary>
+ </indexterm>
+
+ Finally, it should be mentioned that if you don't need a table any
+ longer or want to recreate it differently you can remove it using
+ the following command:
+<synopsis>
+DROP TABLE <replaceable>tablename</replaceable>;
+</synopsis>
</para>
</sect1>
- <sect1 id="query-populate">
- <title>Populating a Table with Rows</title>
+
+ <sect1 id="tutorial-populate">
+ <title>Populating a Table With Rows</title>
+
+ <indexterm zone="tutorial-populate">
+ <primary>INSERT</primary>
+ </indexterm>
<para>
The <command>INSERT</command> statement is used to populate a table with
@@ -151,129 +198,184 @@ CREATE TABLE weather (
<programlisting>
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
</programlisting>
+
+ Note that all data types use rather obvious input formats. The
+ <type>date</type> column is actually quite flexible in what it
+ accepts, but for this tutorial we will stick to the unambiguous
+ format shown here.
</para>
<para>
- You can also use <command>COPY</command> to load large
- amounts of data from flat (<acronym>ASCII</acronym>) files.
- This is usually faster because the data is read (or written) as a
- single atomic
- transaction directly to or from the target table. An example would be:
+ The <type>point</type> type requires a coordinate pair as input,
+ as shown here:
+<programlisting>
+INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
+</programlisting>
+ </para>
+ <para>
+ The syntax used so far requires you to remember the order of the
+ columns. An alternative syntax allows you to list the columns
+ explicitly:
<programlisting>
-COPY weather FROM '/home/user/weather.txt' USING DELIMITERS '|';
+INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
+ VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
+</programlisting>
+ You can also list the columns in a different order if you wish or
+ even omit some columns, e.g., unknown precipitation:
+<programlisting>
+INSERT INTO weather (date, city, temp_hi, temp_lo)
+ VALUES ('1994-11-29', 'Hayward', 54, 37);
+</programlisting>
+ Many developers consider explicitly listing the columns better
+ style than relying on the order implicitly.
+ </para>
+
+ <para>
+ Please enter all the commands shown above so you have some data to
+ work with in the following sections.
+ </para>
+
+ <para>
+ <indexterm>
+ <primary>COPY</primary>
+ </indexterm>
+
+ You could also have used <command>COPY</command> to load large
+ amounts of data from flat text files. This is usually faster
+ because the <command>COPY</command> command is optimized for this
+ application while allowing less flexibility than
+ <command>INSERT</command>. An example would be:
+
+<programlisting>
+COPY weather FROM '/home/user/weather.txt';
</programlisting>
where the path name for the source file must be available to the
- backend server
- machine, not the client, since the backend server reads the file directly.
+ backend server machine, not the client, since the backend server
+ reads the file directly. You can read more about the
+ <command>COPY</command> command in the <citetitle>Reference
+ Manual</citetitle>.
</para>
</sect1>
- <sect1 id="query-query">
+
+ <sect1 id="tutorial-select">
<title>Querying a Table</title>
<para>
- The <classname>weather</classname> table can be queried with normal relational
- selection and projection queries. A <acronym>SQL</acronym>
- <command>SELECT</command>
- statement is used to do this. The statement is divided into
- a target list (the part that lists the columns to be
- returned) and a qualification (the part that specifies
- any restrictions). For example, to retrieve all the
- rows of weather, type:
+ <indexterm><primary>query</primary></indexterm>
+ <indexterm><primary>SELECT</primary></indexterm>
+
+ To retrieve data from a table it is
+ <firstterm>queried</firstterm>. An <acronym>SQL</acronym>
+ <command>SELECT</command> statement is used to do this. The
+ statement is divided into a select list (the part that lists the
+ columns to be returned), a table list (the part that lists the
+ tables from which to retrieve the data), and an optional
+ qualification (the part that specifies any restrictions). For
+ example, to retrieve all the rows of
+ <classname>weather</classname>, type:
<programlisting>
SELECT * FROM weather;
</programlisting>
+ (where <literal>*</literal> means <quote>all columns</quote>) and
+ the output should be:
+<screen>
+ city | temp_lo | temp_hi | prcp | date
+---------------+---------+---------+------+------------
+ San Francisco | 46 | 50 | 0.25 | 1994-11-27
+ San Francisco | 43 | 57 | 0 | 1994-11-29
+ Hayward | 37 | 54 | | 1994-11-29
+(3 rows)
+</screen>
+ </para>
- and the output should be:
-<programlisting>
-+--------------+---------+---------+------+------------+
-|city | temp_lo | temp_hi | prcp | date |
-+--------------+---------+---------+------+------------+
-|San Francisco | 46 | 50 | 0.25 | 1994-11-27 |
-+--------------+---------+---------+------+------------+
-|San Francisco | 43 | 57 | 0 | 1994-11-29 |
-+--------------+---------+---------+------+------------+
-|Hayward | 37 | 54 | | 1994-11-29 |
-+--------------+---------+---------+------+------------+
-</programlisting>
- You may specify any arbitrary expressions in the target list. For
+ <para>
+ You may specify arbitrary expressions in the select list. For
example, you can do:
<programlisting>
SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
</programlisting>
+ This should give:
+<screen>
+ city | temp_avg | date
+---------------+----------+------------
+ San Francisco | 48 | 1994-11-27
+ San Francisco | 50 | 1994-11-29
+ Hayward | 45 | 1994-11-29
+(3 rows)
+</screen>
+ Notice how the <literal>AS</literal> clause is used to relabel the
+ output column. (It is optional.)
</para>
<para>
- Arbitrary Boolean operators
- (<command>AND</command>, <command>OR</command> and
- <command>NOT</command>) are
- allowed in the qualification of any query. For example,
+ Arbitrary Boolean operators (<literal>AND</literal>,
+ <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
+ the qualification of a query. For example, the following
+ retrieves the weather of San Francisco on rainy days:
<programlisting>
SELECT * FROM weather
WHERE city = 'San Francisco'
AND prcp > 0.0;
</programlisting>
-results in:
-<programlisting>
-+--------------+---------+---------+------+------------+
-|city | temp_lo | temp_hi | prcp | date |
-+--------------+---------+---------+------+------------+
-|San Francisco | 46 | 50 | 0.25 | 1994-11-27 |
-+--------------+---------+---------+------+------------+
-</programlisting>
+ Result:
+<screen>
+ city | temp_lo | temp_hi | prcp | date
+---------------+---------+---------+------+------------
+ San Francisco | 46 | 50 | 0.25 | 1994-11-27
+(1 row)
+</screen>
</para>
<para>
- As a final note, you can specify that the results of a
- select can be returned in a <firstterm>sorted order</firstterm>
- or with duplicate rows removed.
+ <indexterm><primary>ORDER BY</primary></indexterm>
+ <indexterm><primary>DISTINCT</primary></indexterm>
+ <indexterm><primary>duplicate</primary></indexterm>
+
+ As a final note, you can request that the results of a query
+ be returned in sorted order or with duplicate rows removed. (Just
+ to make sure the following won't confuse you,
+ <literal>DISTINCT</literal> and <literal>ORDER BY</literal> can be
+ used separately.)
<programlisting>
SELECT DISTINCT city
FROM weather
ORDER BY city;
</programlisting>
- </para>
- </sect1>
-
- <sect1 id="query-selectinto">
- <title>Redirecting SELECT Queries</title>
-
- <para>
- Any <command>SELECT</command> query can be redirected to a new table
-<programlisting>
-SELECT * INTO TABLE temp FROM weather;
-</programlisting>
- </para>
- <para>
- This forms an implicit <command>CREATE</command> command, creating a new
- table temp with the column names and types specified
- in the target list of the <command>SELECT INTO</command> command. We can
- then, of course, perform any operations on the resulting
- table that we can perform on other tables.
+<screen>
+ city
+---------------
+ Hayward
+ San Francisco
+(2 rows)
+</screen>
</para>
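+ <para>
+ As a sketch of that remark, assuming the three rows inserted
+ earlier in this chapter, each clause also works on its own. The
+ first query removes duplicate city names but leaves the row order
+ unspecified; the second keeps every row and sorts by city, then
+ by low temperature:
+<programlisting>
+-- illustrative only: duplicates removed, order not guaranteed
+SELECT DISTINCT city
+ FROM weather;
+
+-- illustrative only: all rows, explicitly sorted
+SELECT *
+ FROM weather
+ ORDER BY city, temp_lo;
+</programlisting>
+ </para>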
</sect1>
- <sect1 id="query-join">
+
+ <sect1 id="tutorial-join">
<title>Joins Between Tables</title>
+ <indexterm zone="tutorial-join">
+ <primary>join</primary>
+ </indexterm>
+
<para>
- Thus far, our queries have only accessed one table at a
- time. Queries can access multiple tables at once, or
- access the same table in such a way that multiple
- rows of the table are being processed at the same
- time. A query that accesses multiple rows of the
- same or different tables at one time is called a join
- query.
- As an example, say we wish to find all the records that
- are in the temperature range of other records. In
- effect, we need to compare the temp_lo and temp_hi
- columns of each WEATHER row to the temp_lo and
- temp_hi columns of all other WEATHER columns.
+ Thus far, our queries have only accessed one table at a time.
+ Queries can access multiple tables at once, or access the same
+ table in such a way that multiple rows of the table are being
+ processed at the same time. A query that accesses multiple rows
+ of the same or different tables at one time is called a
+ <firstterm>join</firstterm> query. As an example, say you wish to
+ list all the weather records together with the location of the
+ associated city. In effect, we need to compare the city column of
+ each row of the weather table with the name column of all rows in
+ the cities table.
<note>
<para>
This is only a conceptual model. The actual join may
@@ -281,102 +383,189 @@ SELECT * INTO TABLE temp FROM weather;
to the user.
</para>
</note>
-
- We can do this with the following query:
+ This would be accomplished by the following query:
<programlisting>
-SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
- W2.city, W2.temp_lo AS low, W2.temp_hi AS high
- FROM weather W1, weather W2
- WHERE W1.temp_lo < W2.temp_lo
- AND W1.temp_hi > W2.temp_hi;
+SELECT *
+ FROM weather, cities
+ WHERE city = name;
+</programlisting>
-+--------------+-----+------+---------------+-----+------+
-|city | low | high | city | low | high |
-+--------------+-----+------+---------------+-----+------+
-|San Francisco | 43 | 57 | San Francisco | 46 | 50 |
-+--------------+-----+------+---------------+-----+------+
-|San Francisco | 37 | 54 | San Francisco | 46 | 50 |
-+--------------+-----+------+---------------+-----+------+
-</programlisting>
+<screen>
+ city | temp_lo | temp_hi | prcp | date | name | location
+---------------+---------+---------+------+------------+---------------+-----------
+ San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
+ San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
+(2 rows)
+</screen>
- <note>
- <para>
- The semantics of such a join are
- that the qualification
- is a truth expression defined for the Cartesian product of
- the tables indicated in the query. For those rows in
- the Cartesian product for which the qualification is true,
- <productname>Postgres</productname> computes and returns the
- values specified in the target list.
- <productname>Postgres</productname> <acronym>SQL</acronym>
- does not assign any meaning to
- duplicate values in such expressions.
- This means that <productname>Postgres</productname>
- sometimes recomputes the same target list several times;
- this frequently happens when Boolean expressions are connected
- with an "or". To remove such duplicates, you must use
- the <command>SELECT DISTINCT</command> statement.
- </para>
- </note>
</para>
<para>
- In this case, both <literal>W1</literal> and
- <literal>W2</literal> are surrogates for a
- row of the table weather, and both range over all
- rows of the table. (In the terminology of most
- database systems, <literal>W1</literal> and <literal>W2</literal>
- are known as <firstterm>range variables</firstterm>.)
- A query can contain an arbitrary number of
- table names and surrogates.
+ Observe two things about the result set:
+ <itemizedlist>
+ <listitem>
+ <para>
+ There is no result row for the city of Hayward. This is
+ because there is no matching entry in the
+ <classname>cities</classname> table for Hayward, so the join
+ ignores the unmatched rows in the weather table. We will see
+ shortly how this can be fixed.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ There are two columns containing the city name. This is
+ correct because the lists of columns of the
+ <classname>weather</classname> and the
+ <classname>cities</classname> tables are concatenated. In
+ practice this is undesirable, though, so you will probably want
+ to list the output columns explicitly rather than using
+ <literal>*</literal>:
+<programlisting>
+SELECT city, temp_lo, temp_hi, prcp, date, location
+ FROM weather, cities
+ WHERE city = name;
+</programlisting>
+ </para>
+ </listitem>
+ </itemizedlist>
</para>
- </sect1>
- <sect1 id="query-update">
- <title>Updates</title>
+ <formalpara>
+ <title>Exercise:</title>
+
+ <para>
+ Attempt to find out the semantics of this query when the
+ <literal>WHERE</literal> clause is omitted.
+ </para>
+ </formalpara>
<para>
- You can update existing rows using the
- <command>UPDATE</command> command.
- Suppose you discover the temperature readings are
- all off by 2 degrees as of Nov 28, you may update the
- data as follow:
+ Since the columns all had different names, the parser
+ automatically found out which table they belong to, but it is good
+ style to fully qualify column names in join queries:
<programlisting>
-UPDATE weather
- SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
- WHERE date > '1994-11-28';
+SELECT weather.city, weather.temp_lo, weather.temp_hi, weather.prcp, weather.date, cities.location
+ FROM weather, cities
+ WHERE cities.name = weather.city;
</programlisting>
</para>
- </sect1>
-
- <sect1 id="query-delete">
- <title>Deletions</title>
<para>
- Deletions are performed using the <command>DELETE</command> command:
+ Join queries of the kind seen thus far can also be written in this
+ alternative form:
+
<programlisting>
-DELETE FROM weather WHERE city = 'Hayward';
+SELECT *
+ FROM weather INNER JOIN cities ON (weather.city = cities.name);
</programlisting>
- All weather recording belonging to Hayward are removed.
- One should be wary of queries of the form
+ This syntax is not as commonly used as the one above, but we show
+ it here to help you understand the following topics.
+ </para>
+
+ <para>
+ <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
+
+ Now we will figure out how we can get the Hayward records back in.
+ What we want the query to do is to scan the
+ <classname>weather</classname> table and for each row to find the
+ matching <classname>cities</classname> row. If no matching row is
+ found we want some <quote>empty values</quote> to be substituted
+ for the <classname>cities</classname> table's columns. This kind
+ of query is called an <firstterm>outer join</firstterm>. (The
+ joins we have seen so far are inner joins.) The command looks
+ like this:
+
<programlisting>
-DELETE FROM <replaceable>tablename</replaceable>;
+SELECT *
+ FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
+
+ city | temp_lo | temp_hi | prcp | date | name | location
+---------------+---------+---------+------+------------+---------------+-----------
+ Hayward | 37 | 54 | | 1994-11-29 | |
+ San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
+ San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
+(3 rows)
</programlisting>
- Without a qualification, <command>DELETE</command> will simply
- remove all rows from the given table, leaving it
- empty. The system will not request confirmation before
- doing this.
+ In particular, this query is a <firstterm>left outer
+ join</firstterm> because the table mentioned on the left of the
+ join operator will have each of its rows in the output at least
+ once, whereas the table on the right will only have those rows
+ output that match some row of the left table, and will have empty
+ values substituted appropriately.
+ </para>
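+ <para>
+ One way to use this result, sketched here under the assumption
+ that the <quote>empty values</quote> are SQL null values, is to
+ list the weather rows that have no matching
+ <classname>cities</classname> entry:
+<programlisting>
+-- illustrative only: keep just the rows whose cities columns were left empty
+SELECT weather.city, weather.date
+ FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name)
+ WHERE cities.name IS NULL;
+</programlisting>
+ With the data entered earlier, only the Hayward row should
+ qualify.
+ </para>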
+
+ <formalpara>
+ <title>Exercise:</title>
+
+ <para>
+ There are also right outer joins and full outer joins. Try to
+ find out what those do.
+ </para>
+ </formalpara>
+
+ <para>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
+
+ We can also join a table against itself. This is called a
+ <firstterm>self join</firstterm>. As an example, suppose we wish
+ to find all the weather records that are in the temperature range
+ of other weather records. So we need to compare the
+ <structfield>temp_lo</> and <structfield>temp_hi</> columns of
+ each <classname>weather</classname> row to the
+ <structfield>temp_lo</structfield> and
+ <structfield>temp_hi</structfield> columns of all other
+ <classname>weather</classname> rows. We can do this with the
+ following query:
+
+<programlisting>
+SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
+ W2.city, W2.temp_lo AS low, W2.temp_hi AS high
+ FROM weather W1, weather W2
+ WHERE W1.temp_lo < W2.temp_lo
+ AND W1.temp_hi > W2.temp_hi;
+
+ city | low | high | city | low | high
+---------------+-----+------+---------------+-----+------
+ San Francisco | 43 | 57 | San Francisco | 46 | 50
+ Hayward | 37 | 54 | San Francisco | 46 | 50
+(2 rows)
+</programlisting>
+
+ Here we have relabeled the weather table as <literal>W1</> and
+ <literal>W2</> to be able to distinguish the left and right side
+ of the join. You can also use these kinds of aliases in other
+ queries to save some typing, e.g.:
+<programlisting>
+SELECT *
+ FROM weather w, cities c
+ WHERE w.city = c.name;
+</programlisting>
+ You will encounter this style of abbreviating quite frequently.
</para>
</sect1>
- <sect1 id="query-agg">
- <title>Using Aggregate Functions</title>
+
+ <sect1 id="tutorial-agg">
+ <title>Aggregate Functions</title>
+
+ <indexterm zone="tutorial-agg">
+ <primary>aggregate</primary>
+ </indexterm>
<para>
+ <indexterm><primary>average</primary></indexterm>
+ <indexterm><primary>count</primary></indexterm>
+ <indexterm><primary>max</primary></indexterm>
+ <indexterm><primary>min</primary></indexterm>
+ <indexterm><primary>sum</primary></indexterm>
+
Like most other relational database products,
<productname>PostgreSQL</productname> supports
aggregate functions.
@@ -388,94 +577,214 @@ DELETE FROM <replaceable>tablename</replaceable>;
</para>
<para>
- It is important to understand the interaction between aggregates and
- SQL's <command>WHERE</command> and <command>HAVING</command> clauses.
- The fundamental difference between <command>WHERE</command> and
- <command>HAVING</command> is this: <command>WHERE</command> selects
- input rows before groups and aggregates are computed (thus, it controls
- which rows go into the aggregate computation), whereas
- <command>HAVING</command> selects group rows after groups and
- aggregates are computed. Thus, the
- <command>WHERE</command> clause may not contain aggregate functions;
- it makes no sense to try to use an aggregate to determine which rows
- will be inputs to the aggregates. On the other hand,
- <command>HAVING</command> clauses always contain aggregate functions.
- (Strictly speaking, you are allowed to write a <command>HAVING</command>
- clause that doesn't use aggregates, but it's wasteful; the same condition
- could be used more efficiently at the <command>WHERE</command> stage.)
- </para>
-
- <para>
As an example, we can find the highest low-temperature reading anywhere
with
- <programlisting>
+<programlisting>
SELECT max(temp_lo) FROM weather;
- </programlisting>
+</programlisting>
+
+<screen>
+ max
+-----
+ 46
+(1 row)
+</screen>
+ </para>
+
+ <para>
+ <indexterm><primary>subquery</primary></indexterm>
If we want to know what city (or cities) that reading occurred in,
we might try
- <programlisting>
-SELECT city FROM weather WHERE temp_lo = max(temp_lo);
- </programlisting>
+<programlisting>
+SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation>
+</programlisting>
but this will not work since the aggregate
- <function>max</function> can't be used in
- <command>WHERE</command>. However, as is often the case the query can be
- restated to accomplish the intended result; here by using a
- <firstterm>subselect</firstterm>:
+ <function>max</function> cannot be used in the
+ <literal>WHERE</literal> clause. However, as is often the case
+ the query can be restated to accomplish the intended result; here
+ by using a <firstterm>subquery</firstterm>:
- <programlisting>
+<programlisting>
SELECT city FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
- </programlisting>
+</programlisting>
+
+<screen>
+ city
+---------------
+ San Francisco
+(1 row)
+</screen>
- This is OK because the sub-select is an independent computation that
- computes its own aggregate separately from what's happening in the outer
- select.
+ This is OK because the subquery is an independent computation
+ that computes its own aggregate separately from what is happening
+ in the outer query.
</para>
<para>
- Aggregates are also very useful in combination with
- <command>GROUP BY</command> clauses. For example, we can get the
- maximum low temperature observed in each city with
+ <indexterm><primary>GROUP BY</primary></indexterm>
+ <indexterm><primary>HAVING</primary></indexterm>
+
+ Aggregates are also very useful in combination with <literal>GROUP
+ BY</literal> clauses. For example, we can get the maximum low
+ temperature observed in each city with
- <programlisting>
+<programlisting>
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
- </programlisting>
+</programlisting>
+
+<screen>
+ city | max
+---------------+-----
+ Hayward | 37
+ San Francisco | 46
+(2 rows)
+</screen>
which gives us one output row per city. We can filter these grouped
- rows using <command>HAVING</command>:
+ rows using <literal>HAVING</literal>:
- <programlisting>
+<programlisting>
SELECT city, max(temp_lo)
FROM weather
GROUP BY city
- HAVING min(temp_lo) < 0;
- </programlisting>
+ HAVING max(temp_lo) < 40;
+</programlisting>
+
+<screen>
+ city | max
+---------+-----
+ Hayward | 37
+(1 row)
+</screen>
- which gives us the same results for only the cities that have some
- below-zero readings. Finally, if we only care about cities whose
- names begin with "<literal>P</literal>", we might do
+ which gives us the same results for only the cities whose
+ <literal>temp_lo</literal> readings are all below 40. Finally, if we
+ only care about cities whose names begin with
+ <quote><literal>S</literal></quote>, we might do
- <programlisting>
+<programlisting>
SELECT city, max(temp_lo)
FROM weather
- WHERE city like 'P%'
+ WHERE city LIKE 'S%'
GROUP BY city
- HAVING min(temp_lo) < 0;
- </programlisting>
+ HAVING max(temp_lo) < 40;
+</programlisting>
+ </para>
- Note that we can apply the city-name restriction in
- <command>WHERE</command>, since it needs no aggregate. This is
- more efficient than adding the restriction to <command>HAVING</command>,
+ <para>
+ It is important to understand the interaction between aggregates and
+ SQL's <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
+ The fundamental difference between <literal>WHERE</literal> and
+ <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
+ input rows before groups and aggregates are computed (thus, it controls
+ which rows go into the aggregate computation), whereas
+ <literal>HAVING</literal> selects group rows after groups and
+ aggregates are computed. Thus, the
+ <literal>WHERE</literal> clause must not contain aggregate functions;
+ it makes no sense to try to use an aggregate to determine which rows
+ will be inputs to the aggregates. On the other hand,
+ <literal>HAVING</literal> clauses always contain aggregate functions.
+ (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
+ clause that doesn't use aggregates, but it's wasteful; the same condition
+ could be used more efficiently at the <literal>WHERE</literal> stage.)
+ </para>
+
+ <para>
+ Note that we can apply the city name restriction in
+ <literal>WHERE</literal>, since it needs no aggregate. This is
+ more efficient than adding the restriction to <literal>HAVING</literal>,
because we avoid doing the grouping and aggregate calculations
- for all rows that fail the <command>WHERE</command> check.
+ for all rows that fail the <literal>WHERE</literal> check.
+ </para>
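+
+ <para>
+ For comparison, here is a sketch of the less efficient form, with
+ the city name restriction moved into <literal>HAVING</literal>.
+ It is meant as an illustration, not as a recommended query; it is
+ accepted only because <literal>city</literal> is a grouping column:
+
+<programlisting>
+SELECT city, max(temp_lo)
+    FROM weather
+    GROUP BY city
+    HAVING city LIKE 'S%' AND max(temp_lo) < 40;
+</programlisting>
+
+ It selects the same groups as the <literal>WHERE</literal> version,
+ but the city name filter is applied, at least conceptually, only
+ after all rows have been grouped and aggregated.
+ </para>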
+ </sect1>
+
+
+ <sect1 id="tutorial-update">
+ <title>Updates</title>
+
+ <indexterm zone="tutorial-update">
+ <primary>UPDATE</primary>
+ </indexterm>
+
+ <para>
+ You can update existing rows using the
+ <command>UPDATE</command> command.
+ Suppose you discover the temperature readings are
+ all off by 2 degrees after November 28. You can correct the
+ data as follows:
+
+<programlisting>
+UPDATE weather
+ SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
+ WHERE date > '1994-11-28';
+</programlisting>
+ </para>
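+
+ <para>
+ Before running such an update, it can be useful to check which rows
+ the <literal>WHERE</literal> condition selects. As a minimal sketch,
+ a <command>SELECT</command> with the same condition lists them:
+
+<programlisting>
+SELECT * FROM weather
+    WHERE date > '1994-11-28';
+</programlisting>
+
+ With the sample data this should list only the two readings dated
+ 1994-11-29, which are the rows the <command>UPDATE</command> modifies.
+ </para>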
+
+ <para>
+ Look at the new state of the data:
+<programlisting>
+SELECT * FROM weather;
+
+ city | temp_lo | temp_hi | prcp | date
+---------------+---------+---------+------+------------
+ San Francisco | 46 | 50 | 0.25 | 1994-11-27
+ San Francisco | 41 | 55 | 0 | 1994-11-29
+ Hayward | 35 | 52 | | 1994-11-29
+(3 rows)
+</programlisting>
</para>
</sect1>
+
+ <sect1 id="tutorial-delete">
+ <title>Deletions</title>
+
+ <indexterm zone="tutorial-delete">
+ <primary>DELETE</primary>
+ </indexterm>
+
+ <para>
+ Rows can be removed from a table using the <command>DELETE</command>
+ command.
+ Suppose you are no longer interested in the weather of Hayward.
+ Then you can do the following to delete those rows from the table:
+<programlisting>
+DELETE FROM weather WHERE city = 'Hayward';
+</programlisting>
+
+ All weather records belonging to Hayward are removed.
+
+<programlisting>
+SELECT * FROM weather;
+</programlisting>
+
+<screen>
+ city | temp_lo | temp_hi | prcp | date
+---------------+---------+---------+------+------------
+ San Francisco | 46 | 50 | 0.25 | 1994-11-27
+ San Francisco | 41 | 55 | 0 | 1994-11-29
+(2 rows)
+</screen>
+ </para>
+
+ <para>
+ One should be wary of statements of the form
+<synopsis>
+DELETE FROM <replaceable>tablename</replaceable>;
+</synopsis>
+
+ Without a qualification, <command>DELETE</command> will simply
+ remove all rows from the given table, leaving it
+ empty. The system will not request confirmation before
+ doing this.
+ </para>
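+
+ <para>
+ One way to guard against this, sketched here on the assumption
+ that the command is issued inside an explicit transaction block, is
+ to inspect the result and roll back if it is not what was intended:
+
+<programlisting>
+BEGIN;
+DELETE FROM weather;
+SELECT count(*) FROM weather;  -- 0 here means every row was removed
+ROLLBACK;                      -- undo the deletion; COMMIT would keep it
+</programlisting>
+ </para>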
+ </sect1>
+
</chapter>
<!-- Keep this comment at the end of the file