Diffstat (limited to 'doc/src/sgml/query.sgml')
-rw-r--r-- | doc/src/sgml/query.sgml | 899 |
1 files changed, 604 insertions, 295 deletions
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml index 82c4ffe697..04fcce1985 100644 --- a/doc/src/sgml/query.sgml +++ b/doc/src/sgml/query.sgml @@ -1,102 +1,106 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/query.sgml,v 1.17 2001/01/13 23:58:55 petere Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/query.sgml,v 1.18 2001/09/02 23:27:49 petere Exp $ --> - <chapter id="query"> - <title>The Query Language</title> - - <para> - The <productname>Postgres</productname> query language is a variant of - the <acronym>SQL</acronym> standard. It - has many extensions to <acronym>SQL</acronym> such as an - extensible type system, - inheritance, functions and production rules. These are - features carried over from the original - <productname>Postgres</productname> query - language, <productname>PostQuel</productname>. - This section provides an overview - of how to use <productname>Postgres</productname> - <acronym>SQL</acronym> to perform simple operations. - This manual is only intended to give you an idea of our - flavor of <acronym>SQL</acronym> and is in no way a complete tutorial on - <acronym>SQL</acronym>. Numerous books have been written on - <acronym>SQL92</acronym>, including - <xref linkend="MELT93" endterm="MELT93"> and - <xref linkend="DATE97" endterm="DATE97">. - You should be aware that some language features - are extensions to the standard. - </para> - - <sect1 id="query-psql"> - <title>Interactive Monitor</title> - - <para> - In the examples that follow, we assume that you have - created the mydb database as described in the previous - subsection and have started <application>psql</application>. - Examples in this manual can also be found in source distribution - in the directory <filename>src/tutorial/</filename>. Refer to the - <filename>README</filename> file in that directory for how to use them. 
To - start the tutorial, do the following: + <chapter id="tutorial-sql"> + <title>The <acronym>SQL</acronym> Language</title> + + <sect1 id="tutorial-sql-intro"> + <title>Introduction</title> + + <para> + This chapter provides an overview of how to use + <acronym>SQL</acronym> to perform simple operations. This + tutorial is only intended to give you an introduction and is in no + way a complete tutorial on <acronym>SQL</acronym>. Numerous books + have been written on <acronym>SQL92</acronym>, including <xref + linkend="MELT93" endterm="MELT93"> and <xref linkend="DATE97" + endterm="DATE97">. You should be aware that some language + features are extensions to the standard. + </para> + + <para> + In the examples that follow, we assume that you have created a + database named <quote>mydb</quote>, as described in the previous + chapter, and have started <application>psql</application>. + </para> + + <para> + Examples in this manual can also be found in source distribution + in the directory <filename>src/tutorial/</filename>. Refer to the + <filename>README</filename> file in that directory for how to use + them. To start the tutorial, do the following: <screen> -<prompt>$</prompt> <userinput>cd <replaceable>...</replaceable>/src/tutorial</userinput> +<prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput> <prompt>$</prompt> <userinput>psql -s mydb</userinput> <computeroutput> -Welcome to the POSTGRESQL interactive sql monitor: - Please read the file COPYRIGHT for copyright terms of POSTGRESQL - - type \? for help on slash commands - type \q to quit - type \g or terminate with semicolon to execute query - You are currently connected to the database: postgres +... </computeroutput> <prompt>mydb=></prompt> <userinput>\i basics.sql</userinput> </screen> + + The <literal>\i</literal> command reads in commands from the + specified files. 
The <literal>-s</literal> option puts you in + single step mode which pauses before sending a query to the + server. The commands used in this section are in the file + <filename>basics.sql</filename>. </para> + </sect1> + + + <sect1 id="tutorial-concepts"> + <title>Concepts</title> <para> - The <literal>\i</literal> command read in queries from the specified - files. The <literal>-s</literal> option puts you in single step mode which - pauses before sending a query to the backend. Queries - in this section are in the file <filename>basics.sql</filename>. + <indexterm><primary>relational database</primary></indexterm> + <indexterm><primary>hierarchical database</primary></indexterm> + <indexterm><primary>object-oriented database</primary></indexterm> + <indexterm><primary>relation</primary></indexterm> + <indexterm><primary>table</primary></indexterm> + + <productname>PostgreSQL</productname> is a <firstterm>relational + database management system</firstterm> (<acronym>RDBMS</acronym>). + That means it is a system for managing data stored in + <firstterm>relations</firstterm>. Relation is essentially a + mathematical term for <firstterm>table</firstterm>. The notion of + storing data in tables is so commonplace today that it might + seem inherently obvious, but there are a number of other ways of + organizing databases. Files and directories on Unix-like + operating systems form an example of a hierarchical database. A + more modern development is the object-oriented database. </para> <para> - <application>psql</application> - has a variety of <literal>\d</literal> commands for showing system information. - Consult these commands for more details; - for a listing, type <literal>\?</literal> at the <application>psql</application> prompt. + <indexterm><primary>row</primary></indexterm> + <indexterm><primary>column</primary></indexterm> + + Each table is a named collection of <firstterm>rows</firstterm>. 
+ Each row has the same set of named <firstterm>columns</firstterm>, + and each column is of a specific data type. Whereas columns have + a fixed order in each row, it is important to remember that SQL + does not guarantee the order of the rows within the table in any + way (unless they are explicitly sorted). </para> - </sect1> - - <sect1 id="query-concepts"> - <title>Concepts</title> <para> - The fundamental notion in <productname>Postgres</productname> is - that of a <firstterm>table</firstterm>, which is a named - collection of <firstterm>rows</firstterm>. Each row has the same - set of named <firstterm>columns</firstterm>, and each column is of - a specific type. Furthermore, each row has a permanent - <firstterm>object identifier</firstterm> (<acronym>OID</acronym>) - that is unique throughout the database cluster. Historially, - tables have been called classes in - <productname>Postgres</productname>, rows are object instances, - and columns are attributes. This makes sense if you consider the - object-relational aspects of the database system, but in this - manual we will use the customary <acronym>SQL</acronym> - terminology. As previously discussed, - tables are grouped into databases, and a collection of databases - managed by a single <application>postmaster</application> process - constitutes a database cluster. + <indexterm><primary>cluster</primary></indexterm> + + Tables are grouped into databases, and a collection of databases + managed by a single <productname>PostgreSQL</productname> server + instance constitutes a database <firstterm>cluster</firstterm>. 
</para> </sect1> - <sect1 id="query-table"> + + <sect1 id="tutorial-table"> <title>Creating a New Table</title> + <indexterm zone="tutorial-table"> + <primary>CREATE TABLE</primary> + </indexterm> + <para> You can create a new table by specifying the table name, along with all column names and their types: @@ -110,39 +114,82 @@ CREATE TABLE weather ( date date ); </programlisting> + + You can enter this into <command>psql</command> with the line + breaks. <command>psql</command> will recognize that the command + is not terminated until the semicolon. + </para> + + <para> + White space (i.e., spaces, tabs, and newlines) may be used freely + in SQL commands. That means you can type the command aligned + differently than above, or even all on one line. Two dashes + (<quote><literal>--</literal></quote>) introduce comments. + Whatever follows them is ignored up to the end of the line. SQL + is also case insensitive about key words and identifiers, except + when identifiers are double-quoted to preserve the case (not done + above). + </para> + + <para> + <type>varchar(80)</type> specifies a data type that can store + arbitrary character strings up to 80 characters in length. + <type>int</type> is the normal integer type. <type>real</type> is + a type for storing single precision floating point numbers. + <type>date</type> should be self-explanatory. (Yes, the column of + type <type>date</type> is also named <literal>date</literal>. + This may be convenient or confusing -- you choose.) </para> <para> - Note that both keywords and identifiers are case-insensitive; - identifiers can preserve case by surrounding them with - double-quotes as allowed - by <acronym>SQL92</acronym>. 
- <productname>Postgres</productname> <acronym>SQL</acronym> - supports the usual + <productname>PostgreSQL</productname> supports the usual <acronym>SQL</acronym> types <type>int</type>, - <type>float</type>, <type>real</type>, <type>smallint</type>, -<type>char(N)</type>, - <type>varchar(N)</type>, <type>date</type>, <type>time</type>, - and <type>timestamp</type>, as well as other types of general utility and - a rich set of geometric types. As we will - see later, <productname>Postgres</productname> can be customized - with an - arbitrary number of - user-defined data types. Consequently, type names are - not syntactical keywords, except where required to support special - cases in the <acronym>SQL92</acronym> standard. - So far, the <productname>Postgres</productname> - <command>CREATE</command> command - looks exactly like - the command used to create a table in a traditional - relational system. However, we will presently see that - tables have properties that are extensions of the - relational model. + <type>smallint</type>, <type>real</type>, <type>double + precision</type>, <type>char(<replaceable>N</>)</type>, + <type>varchar(<replaceable>N</>)</type>, <type>date</type>, + <type>time</type>, <type>timestamp</type>, and + <type>interval</type> as well as other types of general utility + and a rich set of geometric types. + <productname>PostgreSQL</productname> can be customized with an + arbitrary number of user-defined data types. Consequently, type + names are not syntactical keywords, except where required to + support special cases in the <acronym>SQL</acronym> standard. + </para> + + <para> + The second example will store cities and their associated + geographical location: +<programlisting> +CREATE TABLE cities ( + name varchar(80), + location point +); +</programlisting> + The <type>point</type> type is such a + <productname>PostgreSQL</productname>-specific data type. 
+ </para> + + <para> + <indexterm> + <primary>DROP TABLE</primary> + </indexterm> + + Finally, it should be mentioned that if you don't need a table any + longer or want to recreate it differently you can remove it using + the following command: +<synopsis> +DROP TABLE <replaceable>tablename</replaceable>; +</synopsis> </para> </sect1> - <sect1 id="query-populate"> - <title>Populating a Table with Rows</title> + + <sect1 id="tutorial-populate"> + <title>Populating a Table With Rows</title> + + <indexterm zone="tutorial-populate"> + <primary>INSERT</primary> + </indexterm> <para> The <command>INSERT</command> statement is used to populate a table with @@ -151,129 +198,184 @@ CREATE TABLE weather ( <programlisting> INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27'); </programlisting> + + Note that all data types use rather obvious input formats. The + <type>date</type> column is actually quite flexible in what it + accepts, but for this tutorial we will stick to the unambiguous + format shown here. </para> <para> - You can also use <command>COPY</command> to load large - amounts of data from flat (<acronym>ASCII</acronym>) files. - This is usually faster because the data is read (or written) as a - single atomic - transaction directly to or from the target table. An example would be: + The <type>point</type> type requires a coordinate pair as input, + as shown here: +<programlisting> +INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)'); +</programlisting> + </para> + <para> + The syntax used so far requires you to remember the order of the + columns. 
An alternative syntax allows you to list the columns + explicitly: <programlisting> -COPY weather FROM '/home/user/weather.txt' USING DELIMITERS '|'; +INSERT INTO weather (city, temp_lo, temp_hi, prcp, date) + VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29'); +</programlisting> + You can also list the columns in a different order if you wish or + even omit some columns, e.g., unknown precipitation: +<programlisting> +INSERT INTO weather (date, city, temp_hi, temp_lo) + VALUES ('1994-11-29', 'Hayward', 54, 37); +</programlisting> + Many developers consider explicitly listing the columns better + style than relying on the order implicitly. + </para> + + <para> + Please enter all the commands shown above so you have some data to + work with in the following sections. + </para> + + <para> + <indexterm> + <primary>COPY</primary> + </indexterm> + + You could also have used <command>COPY</command> to load large + amounts of data from flat text files. This is usually faster + because the <command>COPY</command> is optimized for this + application while allowing less flexibility than + <command>INSERT</command>. An example would be: + +<programlisting> +COPY weather FROM '/home/user/weather.txt'; </programlisting> where the path name for the source file must be available to the - backend server - machine, not the client, since the backend server reads the file directly. + backend server machine, not the client, since the backend server + reads the file directly. You can read more about the + <command>COPY</command> command in the <citetitle>Reference + Manual</citetitle>. </para> </sect1> - <sect1 id="query-query"> + + <sect1 id="tutorial-select"> <title>Querying a Table</title> <para> - The <classname>weather</classname> table can be queried with normal relational - selection and projection queries. A <acronym>SQL</acronym> - <command>SELECT</command> - statement is used to do this. 
The statement is divided into - a target list (the part that lists the columns to be - returned) and a qualification (the part that specifies - any restrictions). For example, to retrieve all the - rows of weather, type: + <indexterm><primary>query</primary></indexterm> + <indexterm><primary>SELECT</primary></indexterm> + + To retrieve data from a table it is + <firstterm>queried</firstterm>. An <acronym>SQL</acronym> + <command>SELECT</command> statement is used to do this. The + statement is divided into a select list (the part that lists the + columns to be returned), a table list (the part that lists the + tables from which to retrieve the data), and an optional + qualification (the part that specifies any restrictions). For + example, to retrieve all the rows of + <classname>weather</classname>, type: <programlisting> SELECT * FROM weather; </programlisting> + (where <literal>*</literal> means <quote>all columns</quote>) and + the output should be: +<screen> + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + San Francisco | 46 | 50 | 0.25 | 1994-11-27 + San Francisco | 43 | 57 | 0 | 1994-11-29 + Hayward | 37 | 54 | | 1994-11-29 +(3 rows) +</screen> + </para> - and the output should be: -<programlisting> -+--------------+---------+---------+------+------------+ -|city | temp_lo | temp_hi | prcp | date | -+--------------+---------+---------+------+------------+ -|San Francisco | 46 | 50 | 0.25 | 1994-11-27 | -+--------------+---------+---------+------+------------+ -|San Francisco | 43 | 57 | 0 | 1994-11-29 | -+--------------+---------+---------+------+------------+ -|Hayward | 37 | 54 | | 1994-11-29 | -+--------------+---------+---------+------+------------+ -</programlisting> - You may specify any arbitrary expressions in the target list. For + <para> + You may specify any arbitrary expressions in the target list. 
For example, you can do: <programlisting> SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather; </programlisting> + This should give: +<screen> + city | temp_avg | date +---------------+----------+------------ + San Francisco | 48 | 1994-11-27 + San Francisco | 50 | 1994-11-29 + Hayward | 45 | 1994-11-29 +(3 rows) +</screen> + Notice how the <literal>AS</literal> clause is used to relabel the + output column. (It is optional.) </para> <para> - Arbitrary Boolean operators - (<command>AND</command>, <command>OR</command> and - <command>NOT</command>) are - allowed in the qualification of any query. For example, + Arbitrary Boolean operators (<literal>AND</literal>, + <literal>OR</literal>, and <literal>NOT</literal>) are allowed in + the qualification of a query. For example, the following + retrieves the weather of San Francisco on rainy days: <programlisting> SELECT * FROM weather WHERE city = 'San Francisco' AND prcp > 0.0; </programlisting> -results in: -<programlisting> -+--------------+---------+---------+------+------------+ -|city | temp_lo | temp_hi | prcp | date | -+--------------+---------+---------+------+------------+ -|San Francisco | 46 | 50 | 0.25 | 1994-11-27 | -+--------------+---------+---------+------+------------+ -</programlisting> + Result: +<screen> + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + San Francisco | 46 | 50 | 0.25 | 1994-11-27 +(1 row) +</screen> </para> <para> - As a final note, you can specify that the results of a - select can be returned in a <firstterm>sorted order</firstterm> - or with duplicate rows removed. + <indexterm><primary>ORDER BY</primary></indexterm> + <indexterm><primary>DISTINCT</primary></indexterm> + <indexterm><primary>duplicate</primary></indexterm> + + As a final note, you can request that the results of a select can + be returned in sorted order or with duplicate rows removed. 
(Just + to make sure the following won't confuse you, + <literal>DISTINCT</literal> and <literal>ORDER BY</literal> can be + used separately.) <programlisting> SELECT DISTINCT city FROM weather ORDER BY city; </programlisting> - </para> - </sect1> - - <sect1 id="query-selectinto"> - <title>Redirecting SELECT Queries</title> - - <para> - Any <command>SELECT</command> query can be redirected to a new table -<programlisting> -SELECT * INTO TABLE temp FROM weather; -</programlisting> - </para> - <para> - This forms an implicit <command>CREATE</command> command, creating a new - table temp with the column names and types specified - in the target list of the <command>SELECT INTO</command> command. We can - then, of course, perform any operations on the resulting - table that we can perform on other tables. +<screen> + city +--------------- + Hayward + San Francisco +(2 rows) +</screen> </para> </sect1> - <sect1 id="query-join"> + + <sect1 id="tutorial-join"> <title>Joins Between Tables</title> + <indexterm zone="tutorial-join"> + <primary>join</primary> + </indexterm> + <para> - Thus far, our queries have only accessed one table at a - time. Queries can access multiple tables at once, or - access the same table in such a way that multiple - rows of the table are being processed at the same - time. A query that accesses multiple rows of the - same or different tables at one time is called a join - query. - As an example, say we wish to find all the records that - are in the temperature range of other records. In - effect, we need to compare the temp_lo and temp_hi - columns of each WEATHER row to the temp_lo and - temp_hi columns of all other WEATHER columns. + Thus far, our queries have only accessed one table at a time. + Queries can access multiple tables at once, or access the same + table in such a way that multiple rows of the table are being + processed at the same time. 
A query that accesses multiple rows + of the same or different tables at one time is called a + <firstterm>join</firstterm> query. As an example, say you wish to + list all the weather records together with the location of the + associated city. In effect, we need to compare the city column of + each row of the weather table with the name column of all rows in + the cities table. <note> <para> This is only a conceptual model. The actual join may @@ -281,102 +383,189 @@ SELECT * INTO TABLE temp FROM weather; to the user. </para> </note> - - We can do this with the following query: + This would be accomplished by the following query: <programlisting> -SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high, - W2.city, W2.temp_lo AS low, W2.temp_hi AS high - FROM weather W1, weather W2 - WHERE W1.temp_lo < W2.temp_lo - AND W1.temp_hi > W2.temp_hi; +SELECT * + FROM weather, cities + WHERE city = name; +</programlisting> -+--------------+-----+------+---------------+-----+------+ -|city | low | high | city | low | high | -+--------------+-----+------+---------------+-----+------+ -|San Francisco | 43 | 57 | San Francisco | 46 | 50 | -+--------------+-----+------+---------------+-----+------+ -|San Francisco | 37 | 54 | San Francisco | 46 | 50 | -+--------------+-----+------+---------------+-----+------+ -</programlisting> +<screen> + city | temp_lo | temp_hi | prcp | date | name | location +---------------+---------+---------+------+------------+---------------+----------- + San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53) + San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53) +(2 rows) +</screen> - <note> - <para> - The semantics of such a join are - that the qualification - is a truth expression defined for the Cartesian product of - the tables indicated in the query. 
For those rows in - the Cartesian product for which the qualification is true, - <productname>Postgres</productname> computes and returns the - values specified in the target list. - <productname>Postgres</productname> <acronym>SQL</acronym> - does not assign any meaning to - duplicate values in such expressions. - This means that <productname>Postgres</productname> - sometimes recomputes the same target list several times; - this frequently happens when Boolean expressions are connected - with an "or". To remove such duplicates, you must use - the <command>SELECT DISTINCT</command> statement. - </para> - </note> </para> <para> - In this case, both <literal>W1</literal> and - <literal>W2</literal> are surrogates for a - row of the table weather, and both range over all - rows of the table. (In the terminology of most - database systems, <literal>W1</literal> and <literal>W2</literal> - are known as <firstterm>range variables</firstterm>.) - A query can contain an arbitrary number of - table names and surrogates. + Observe two things about the result set: + <itemizedlist> + <listitem> + <para> + There is no result row for the city of Hayward. This is + because there is no matching entry in the + <classname>cities</classname> table for Hayward, so the join + cannot process the rows in the weather table. We will see + shortly how this can be fixed. + </para> + </listitem> + + <listitem> + <para> + There are two columns containing the city name. This is + correct because the lists of columns of the + <classname>weather</classname> and the + <classname>cities</classname> tables are concatenated. 
In + practice this is undesirable, though, so you will probably want + to list the output columns explicitly rather than using + <literal>*</literal>: +<programlisting> +SELECT city, temp_lo, temp_hi, prcp, date, location + FROM weather, cities + WHERE city = name; +</programlisting> + </para> + </listitem> + </itemizedlist> </para> - </sect1> - <sect1 id="query-update"> - <title>Updates</title> + <formalpara> + <title>Exercise:</title> + + <para> + Attempt to find out the semantics of this query when the + <literal>WHERE</literal> clause is omitted. + </para> + </formalpara> <para> - You can update existing rows using the - <command>UPDATE</command> command. - Suppose you discover the temperature readings are - all off by 2 degrees as of Nov 28, you may update the - data as follow: + Since the columns all had different names, the parser + automatically found out which table they belong to, but it is good + style to fully qualify column names in join queries: <programlisting> -UPDATE weather - SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2 - WHERE date > '1994-11-28'; +SELECT weather.city, weather.temp_lo, weather.temp_hi, weather.prcp, weather.date, cities.location + FROM weather, cities + WHERE cities.name = weather.city; </programlisting> </para> - </sect1> - - <sect1 id="query-delete"> - <title>Deletions</title> <para> - Deletions are performed using the <command>DELETE</command> command: + Join queries of the kind seen thus far can also be written in this + alternative form: + <programlisting> -DELETE FROM weather WHERE city = 'Hayward'; +SELECT * + FROM weather INNER JOIN cities ON (weather.city = cities.name); </programlisting> - All weather recording belonging to Hayward are removed. - One should be wary of queries of the form + This syntax is not as commonly used as the one above, but we show + it here to help you understand the following topics. 
+ </para> + + <para> + <indexterm><primary>join</primary><secondary>outer</secondary></indexterm> + + Now we will figure out how we can get the Hayward records back in. + What we want the query to do is to scan the + <classname>weather</classname> table and for each row to find the + matching <classname>cities</classname> row. If no matching row is + found we want some <quote>empty values</quote> to be substituted + for the <classname>cities</classname> table's columns. This kind + of query is called an <firstterm>outer join</firstterm>. (The + joins we have seen to far are inner joins.) The command looks + like this: + <programlisting> -DELETE FROM <replaceable>tablename</replaceable>; +SELECT * + FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name); + + city | temp_lo | temp_hi | prcp | date | name | location +---------------+---------+---------+------+------------+---------------+----------- + Hayward | 37 | 54 | | 1994-11-29 | | + San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53) + San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53) +(3 rows) </programlisting> - Without a qualification, <command>DELETE</command> will simply - remove all rows from the given table, leaving it - empty. The system will not request confirmation before - doing this. + In particular, this query is a <firstterm>left outer + join</firstterm> because the table mentioned on the left of the + join operator will have each of its rows in the output at least + once, whereas the table on the right will only have those rows + output that match some row of the left table, and will have empty + values substituted appropriately. + </para> + + <formalpara> + <title>Exercise:</title> + + <para> + There are also right outer joins and full outer joins. Try to + find out what those do. 
+ </para> + </formalpara> + + <para> + <indexterm><primary>join</primary><secondary>self</secondary></indexterm> + <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm> + + We can also join a table against itself. This is called a + <firstterm>self join</firstterm>. As an example, suppose we wish + to find all the weather records that are in the temperature range + of other weather records. So we need to compare the + <structfield>temp_lo</> and <structfield>temp_hi</> columns of + each <classname>weather</classname> row to the + <structfield>temp_lo</structfield> and + <structfield>temp_hi</structfield> columns of all other + <classname>weather</classname> rows. We can do this with the + following query: + +<programlisting> +SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high, + W2.city, W2.temp_lo AS low, W2.temp_hi AS high + FROM weather W1, weather W2 + WHERE W1.temp_lo < W2.temp_lo + AND W1.temp_hi > W2.temp_hi; + + city | low | high | city | low | high +---------------+-----+------+---------------+-----+------ + San Francisco | 43 | 57 | San Francisco | 46 | 50 + Hayward | 37 | 54 | San Francisco | 46 | 50 +(2 rows) +</programlisting> + + Here we have relabeled the weather table as <literal>W1</> and + <literal>W2</> to be able to distinguish the left and right side + of the join. You can also use these kinds of aliases in other + queries to save some typing, e.g.: +<programlisting> +SELECT * + FROM weather w, cities c + WHERE w.city = c.name; +</programlisting> + You will encounter this style of abbreviating quite frequently. 
</para> </sect1> - <sect1 id="query-agg"> - <title>Using Aggregate Functions</title> + + <sect1 id="tutorial-agg"> + <title>Aggregate Functions</title> + + <indexterm zone="tutorial-agg"> + <primary>aggregate</primary> + </indexterm> <para> + <indexterm><primary>average</primary></indexterm> + <indexterm><primary>count</primary></indexterm> + <indexterm><primary>max</primary></indexterm> + <indexterm><primary>min</primary></indexterm> + <indexterm><primary>sum</primary></indexterm> + Like most other relational database products, <productname>PostgreSQL</productname> supports aggregate functions. @@ -388,94 +577,214 @@ DELETE FROM <replaceable>tablename</replaceable>; </para> <para> - It is important to understand the interaction between aggregates and - SQL's <command>WHERE</command> and <command>HAVING</command> clauses. - The fundamental difference between <command>WHERE</command> and - <command>HAVING</command> is this: <command>WHERE</command> selects - input rows before groups and aggregates are computed (thus, it controls - which rows go into the aggregate computation), whereas - <command>HAVING</command> selects group rows after groups and - aggregates are computed. Thus, the - <command>WHERE</command> clause may not contain aggregate functions; - it makes no sense to try to use an aggregate to determine which rows - will be inputs to the aggregates. On the other hand, - <command>HAVING</command> clauses always contain aggregate functions. - (Strictly speaking, you are allowed to write a <command>HAVING</command> - clause that doesn't use aggregates, but it's wasteful; the same condition - could be used more efficiently at the <command>WHERE</command> stage.) 
- </para> - - <para> As an example, we can find the highest low-temperature reading anywhere with - <programlisting> +<programlisting> SELECT max(temp_lo) FROM weather; - </programlisting> +</programlisting> + +<screen> + max +----- + 46 +(1 row) +</screen> + </para> + + <para> + <indexterm><primary>subquery</primary></indexterm> If we want to know what city (or cities) that reading occurred in, we might try - <programlisting> -SELECT city FROM weather WHERE temp_lo = max(temp_lo); - </programlisting> +<programlisting> +SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation> +</programlisting> but this will not work since the aggregate - <function>max</function> can't be used in - <command>WHERE</command>. However, as is often the case the query can be - restated to accomplish the intended result; here by using a - <firstterm>subselect</firstterm>: + <function>max</function> cannot be used in the + <literal>WHERE</literal> clause. However, as is often the case + the query can be restated to accomplish the intended result; here + by using a <firstterm>subquery</firstterm>: - <programlisting> +<programlisting> SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather); - </programlisting> +</programlisting> + +<screen> + city +--------------- + San Francisco +(1 row) +</screen> - This is OK because the sub-select is an independent computation that - computes its own aggregate separately from what's happening in the outer - select. + This is OK because the sub-select is an independent computation + that computes its own aggregate separately from what is happening + in the outer select. </para> <para> - Aggregates are also very useful in combination with - <command>GROUP BY</command> clauses. 
For example, we can get the - maximum low temperature observed in each city with + <indexterm><primary>GROUP BY</primary></indexterm> + <indexterm><primary>HAVING</primary></indexterm> + + Aggregates are also very useful in combination with <literal>GROUP + BY</literal> clauses. For example, we can get the maximum low + temperature observed in each city with - <programlisting> +<programlisting> SELECT city, max(temp_lo) FROM weather GROUP BY city; - </programlisting> +</programlisting> + +<screen> + city | max +---------------+----- + Hayward | 37 + San Francisco | 46 +(2 rows) +</screen> which gives us one output row per city. We can filter these grouped - rows using <command>HAVING</command>: + rows using <literal>HAVING</literal>: - <programlisting> +<programlisting> SELECT city, max(temp_lo) FROM weather GROUP BY city - HAVING min(temp_lo) < 0; - </programlisting> + HAVING max(temp_lo) < 40; +</programlisting> + +<screen> + city | max +---------+----- + Hayward | 37 +(1 row) +</screen> - which gives us the same results for only the cities that have some - below-zero readings. Finally, if we only care about cities whose - names begin with "<literal>P</literal>", we might do + which gives us the same results for only the cities whose + <literal>temp_lo</literal> values are all below 40. Finally, if we only care about cities whose + names begin with <quote><literal>S</literal></quote>, we might do - <programlisting> +<programlisting> SELECT city, max(temp_lo) FROM weather - WHERE city like 'P%' + WHERE city LIKE 'S%' GROUP BY city - HAVING min(temp_lo) < 0; - </programlisting> + HAVING max(temp_lo) < 40; +</programlisting> + </para> - Note that we can apply the city-name restriction in - <command>WHERE</command>, since it needs no aggregate. This is - more efficient than adding the restriction to <command>HAVING</command>, + <para> + It is important to understand the interaction between aggregates and + SQL's <literal>WHERE</literal> and <literal>HAVING</literal> clauses. 
+ The fundamental difference between <literal>WHERE</literal> and + <literal>HAVING</literal> is this: <literal>WHERE</literal> selects + input rows before groups and aggregates are computed (thus, it controls + which rows go into the aggregate computation), whereas + <literal>HAVING</literal> selects group rows after groups and + aggregates are computed. Thus, the + <literal>WHERE</literal> clause must not contain aggregate functions; + it makes no sense to try to use an aggregate to determine which rows + will be inputs to the aggregates. On the other hand, + <literal>HAVING</literal> clauses always contain aggregate functions. + (Strictly speaking, you are allowed to write a <literal>HAVING</literal> + clause that doesn't use aggregates, but it's wasteful; the same condition + could be used more efficiently at the <literal>WHERE</literal> stage.) + </para> + + <para> + Note that we can apply the city name restriction in + <literal>WHERE</literal>, since it needs no aggregate. This is + more efficient than adding the restriction to <literal>HAVING</literal>, because we avoid doing the grouping and aggregate calculations - for all rows that fail the <command>WHERE</command> check. + for all rows that fail the <literal>WHERE</literal> check. + </para> + </sect1> + + + <sect1 id="tutorial-update"> + <title>Updates</title> + + <indexterm zone="tutorial-update"> + <primary>UPDATE</primary> + </indexterm> + + <para> + You can update existing rows using the + <command>UPDATE</command> command. 
+ Suppose you discover the temperature readings are + all off by 2 degrees after November 28. You can correct the + data as follows: + +<programlisting> +UPDATE weather + SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2 + WHERE date > '1994-11-28'; +</programlisting> + </para> + + <para> + Look at the new state of the data: +<programlisting> +SELECT * FROM weather; + + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + San Francisco | 46 | 50 | 0.25 | 1994-11-27 + San Francisco | 41 | 55 | 0 | 1994-11-29 + Hayward | 35 | 52 | | 1994-11-29 +(3 rows) +</programlisting> </para> </sect1> + + <sect1 id="tutorial-delete"> + <title>Deletions</title> + + <indexterm zone="tutorial-delete"> + <primary>DELETE</primary> + </indexterm> + + <para> + Suppose you are no longer interested in the weather of Hayward. + Then you can do the following to delete those rows from the table. + Deletions are performed using the <command>DELETE</command> + command: +<programlisting> +DELETE FROM weather WHERE city = 'Hayward'; +</programlisting> + + All weather records belonging to Hayward are removed. + +<programlisting> +SELECT * FROM weather; +</programlisting> + +<screen> + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + San Francisco | 46 | 50 | 0.25 | 1994-11-27 + San Francisco | 41 | 55 | 0 | 1994-11-29 +(2 rows) +</screen> + </para> + + <para> + One should be wary of queries of the form +<synopsis> +DELETE FROM <replaceable>tablename</replaceable>; +</synopsis> + + Without a qualification, <command>DELETE</command> will simply + remove all rows from the given table, leaving it + empty. The system will not request confirmation before + doing this. + </para> + </sect1> + </chapter> <!-- Keep this comment at the end of the file |