
PostGIS 1.5.1 Manual

VACUUM ANALYZE [table_name [(column_name)]];

-- This is only needed for PostgreSQL 7.4 installations and below
SELECT UPDATE_GEOMETRY_STATS('table_name', 'column_name');

GiST indexes have two advantages over R-Tree indexes in PostgreSQL. Firstly, GiST indexes are "null safe", meaning they can
index columns which include null values. Secondly, GiST indexes support the concept of "lossiness" which is important when
dealing with GIS objects larger than the PostgreSQL 8K page size. Lossiness allows PostgreSQL to store only the "important"
part of an object in an index -- in the case of GIS objects, just the bounding box. GIS objects larger than 8K will cause R-Tree
indexes to fail in the process of being built.
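The following minimal Python sketch makes the idea of a lossy index key concrete. It assumes a geometry is simply a list of (x, y) vertices. It is not how GiST or PostGIS store keys internally, and the helper names `bounding_box` and `index_entry` are hypothetical. However many vertices an object has, the stored key is always four numbers, and a missing (NULL) geometry is represented rather than rejected.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bounding_box(coords: List[Point]) -> BBox:
    """Smallest axis-aligned box containing every vertex."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def index_entry(geom: Optional[List[Point]]) -> Optional[BBox]:
    """Hypothetical lossy index key: keep only the box, drop the vertices.

    None stands in for a SQL NULL geometry, which a null-safe index can
    record instead of failing.
    """
    if geom is None:
        return None
    return bounding_box(geom)


# A 100,000-vertex line is reduced to a four-number key.
big_line = [(float(i), float(i % 7)) for i in range(100_000)]
print(index_entry(big_line))  # (0.0, 0.0, 99999.0, 6.0)
print(index_entry(None))      # None
```

Because only the box survives, a key like this can answer box-level questions, such as whether two boxes overlap, but not exact geometric ones.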

4.6.2 Using Indexes


Ordinarily, indexes invisibly speed up data access: once the index is built, the query planner transparently decides when to use
index information to speed up a query plan. Unfortunately, the PostgreSQL query planner does not optimize the use of GiST
indexes well, so sometimes searches which should use a spatial index instead default to a sequential scan of the whole table.
If you find your spatial indexes are not being used (or your attribute indexes, for that matter), there are a couple of things you
can do:
Firstly, make sure statistics are gathered about the number and distribution of values in a table, to give the query planner
better information when deciding whether to use an index. For PostgreSQL 7.4 installations and below, this is done by running
update_geometry_stats('table_name', 'column_name') (to compute the distribution) and VACUUM ANALYZE [table_name [(column_name)]]
(to compute the number of values). Starting with PostgreSQL 8.0, running VACUUM ANALYZE does both operations. You should
vacuum your databases regularly anyway: many PostgreSQL DBAs run VACUUM as an off-peak cron job.
Secondly, if vacuuming does not work, you can force the planner to use the index information with the SET ENABLE_SEQSCAN=OFF
command. Use this command sparingly, and only on spatially indexed queries: generally speaking, the planner knows better than
you do about when to use normal B-Tree indexes. Once you have run your query, consider setting ENABLE_SEQSCAN back on
(SET ENABLE_SEQSCAN=ON) so that other queries are planned normally.

Note
As of version 0.6, it should not be necessary to force the planner to use the index with ENABLE_SEQSCAN.
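The following Python sketch illustrates why distribution statistics help the planner. It is a simplified, hypothetical model and not the algorithm PostgreSQL or PostGIS actually uses. It bins point locations into an n-by-n grid (a 2-D histogram) and estimates the fraction of rows inside a query box by assuming values are spread uniformly within each cell. With an estimate like this, a planner can see that a small box matches few rows, which makes an index scan worthwhile. Without current statistics it can only guess.

```python
from typing import List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def build_histogram(points: List[Point], extent: BBox, n: int) -> List[List[int]]:
    """Count how many points fall into each cell of an n-by-n grid over extent."""
    xmin, ymin, xmax, ymax = extent
    cells = [[0] * n for _ in range(n)]
    for x, y in points:
        # Clamp so points on the max edge land in the last cell.
        i = min(int((x - xmin) / (xmax - xmin) * n), n - 1)
        j = min(int((y - ymin) / (ymax - ymin) * n), n - 1)
        cells[i][j] += 1
    return cells


def estimate_selectivity(cells: List[List[int]], extent: BBox, box: BBox) -> float:
    """Estimated fraction of rows inside box, assuming uniformity within cells."""
    xmin, ymin, xmax, ymax = extent
    bx0, by0, bx1, by1 = box
    n = len(cells)
    cw, ch = (xmax - xmin) / n, (ymax - ymin) / n
    total = sum(sum(row) for row in cells)
    estimate = 0.0
    for i in range(n):
        for j in range(n):
            cx0, cy0 = xmin + i * cw, ymin + j * ch
            # Overlap of this cell with the query box, as a fraction of cell area.
            ox = max(0.0, min(cx0 + cw, bx1) - max(cx0, bx0))
            oy = max(0.0, min(cy0 + ch, by1) - max(cy0, by0))
            estimate += cells[i][j] * (ox * oy) / (cw * ch)
    return estimate / total if total else 0.0
```

For 10,000 evenly spread points, a box covering a quarter of the extent is estimated at 0.25. If all points are clustered in one corner, a box over the opposite corner is estimated at 0.0, which a distribution-blind guess would miss.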

If you find that the planner is wrong about the cost of sequential versus index scans, try reducing the value of random_page_cost
in postgresql.conf or using SET random_page_cost=#. The default value for the parameter is 4; try setting it to 1 or 2. Decreasing
the value makes the planner more inclined to use index scans.
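A deliberately crude Python model shows why lowering random_page_cost tips the balance. Purely for illustration, it assumes that a sequential scan reads every page at seq_page_cost (1.0 here) and an index scan reads only the matching pages at random_page_cost. PostgreSQL's real cost formulas include many more terms. The function name `choose_scan` and the page counts are hypothetical.

```python
def choose_scan(table_pages: int, matching_pages: int,
                random_page_cost: float = 4.0, seq_page_cost: float = 1.0) -> str:
    """Toy planner: pick whichever scan has the lower estimated page cost."""
    seq_cost = table_pages * seq_page_cost          # read every page in order
    index_cost = matching_pages * random_page_cost  # jump to matching pages only
    return "index" if index_cost < seq_cost else "seq"


print(choose_scan(1000, 300))                      # seq   (1200 vs 1000)
print(choose_scan(1000, 300, random_page_cost=2))  # index (600 vs 1000)
```

With 1000 table pages and 300 randomly placed matching pages, the default value of 4 makes the index scan look more expensive. A value of 2 makes it look cheaper.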

4.7 Complex Queries


The raison d'être of spatial database functionality is performing queries inside the database which would ordinarily require
desktop GIS functionality. Using PostGIS effectively requires knowing what spatial functions are available and ensuring that
appropriate indexes are in place to provide good performance.

4.7.1 Taking Advantage of Indexes


When constructing a query, it is important to remember that only bounding-box-based operators such as && can take advantage of
the GiST spatial index. Functions such as ST_Distance() cannot use the index to optimize their operation. For example, the
following query would be quite slow on a large table:


SELECT the_geom
FROM geom_table
WHERE ST_Distance(the_geom, ST_GeomFromText('POINT(100000 200000)', -1)) < 100;

This query selects all the geometries in geom_table which are within 100 units of the point (100000, 200000). It will be slow
because it calculates the distance between each geometry in the table and our specified point, i.e., one ST_Distance()
calculation for each row in the table. We can avoid this by using the && operator to reduce the number of distance calculations
required:
SELECT the_geom
FROM geom_table
WHERE the_geom && 'BOX3D(99900 199900, 100100 200100)'::box3d
  AND ST_Distance(the_geom, ST_GeomFromText('POINT(100000 200000)', -1)) < 100;

This query selects the same geometries, but it does so more efficiently. Assuming there is a GiST index on the_geom, the query
planner will recognize that it can use the index to reduce the number of rows before calculating the result of the ST_Distance()
function. Notice that the BOX3D geometry used in the && operation is a 200-unit square box centered on the original point; this
is our "query box". The && operator uses the index to quickly reduce the result set to only those geometries whose bounding
boxes overlap the query box. Assuming that the query box is much smaller than the extent of the entire geometry table, this will
drastically reduce the number of distance calculations that need to be done.
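The two-step strategy can be sketched in Python, assuming point geometries stored as (x, y) tuples in a plain list. In PostGIS the && test is answered by the GiST index without visiting every row. This sketch still loops over all rows and only counts how many exact distance calculations each approach performs. The helper names are hypothetical, and `query_box` plays the role of the BOX3D literal above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def query_box(center: Point, radius: float) -> BBox:
    """Square box of side 2 * radius centered on center (the "query box")."""
    x, y = center
    return (x - radius, y - radius, x + radius, y + radius)


def overlaps(p: Point, box: BBox) -> bool:
    """Cheap box test; a point's bounding box is the point itself."""
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def within_naive(points: List[Point], center: Point, radius: float):
    """One exact distance calculation per row, like the first query."""
    hits, calls = [], 0
    for p in points:
        calls += 1
        if math.dist(p, center) < radius:
            hits.append(p)
    return hits, calls


def within_filtered(points: List[Point], center: Point, radius: float):
    """Box test first (the && step); exact distance only for survivors."""
    box = query_box(center, radius)
    hits, calls = [], 0
    for p in points:
        if not overlaps(p, box):
            continue
        calls += 1
        if math.dist(p, center) < radius:
            hits.append(p)
    return hits, calls
```

On a 41 by 41 grid of points spaced 50 units apart around (100000, 200000), both functions return the same 9 points within 100 units. The naive version computes 1681 distances, and the filtered version computes only 25. The box test never excludes a true match, because every point within the radius also lies inside the square that encloses that circle.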

Change in Behavior
As of PostGIS 1.3.0, most of the Geometry Relationship Functions, with the notable exceptions of ST_Disjoint and
ST_Relate, include implicit bounding box overlap operators.
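The following Python sketch is an analogy for what an implicit bounding box test means. It wraps an exact point-in-polygon test (the standard ray-casting method) in a cheap box check, so the expensive test only runs when the boxes overlap. It is not PostGIS source code. The function names are hypothetical, and points lying exactly on the polygon boundary are not handled rigorously.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(poly: List[Point], p: Point) -> bool:
    """Exact (and comparatively expensive) ray-casting test."""
    x, y = p
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_contains_point(poly: List[Point], p: Point) -> bool:
    """Relationship test with a built-in bounding box check, like an implicit &&."""
    xs = [q[0] for q in poly]
    ys = [q[1] for q in poly]
    if not (min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)):
        return False  # boxes do not overlap, so skip the exact test
    return point_in_polygon(poly, p)


# An L-shaped polygon whose bounding box is (0, 0)-(10, 10).
ell = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
print(polygon_contains_point(ell, (2, 7)))   # True
print(polygon_contains_point(ell, (7, 7)))   # False: passes box check, fails exact test
print(polygon_contains_point(ell, (15, 5)))  # False: rejected by the box check alone
```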

4.7.2 Examples of Spatial SQL


The examples in this section will make use of two tables: a table of linear roads and a table of polygonal municipality boundaries.
The table definition for the bc_roads table is:
Column   | Type              | Description
---------+-------------------+--------------------------------
gid      | integer           | Unique ID
name     | character varying | Road Name
the_geom | geometry          | Location Geometry (Linestring)

The table definition for the bc_municipality table is:


Column   | Type              | Description
---------+-------------------+-----------------------------
gid      | integer           | Unique ID
code     | integer           | Unique ID
name     | character varying | City / Town Name
the_geom | geometry          | Location Geometry (Polygon)

1. What is the total length of all roads, expressed in kilometers?


You can answer this question with a very simple piece of SQL:
SELECT sum(ST_Length(the_geom))/1000 AS km_roads FROM bc_roads;

     km_roads
------------------
 70842.1243039643
(1 row)
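The following Python sketch shows the arithmetic behind the query. It assumes each road is a list of (x, y) vertices in a projected coordinate system measured in meters, as the division by 1000 implies. Each line's length is the sum of its straight segment lengths, which is the planar measure ST_Length returns for a geometry. The total is then converted to kilometers. The function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def linestring_length(coords: List[Point]) -> float:
    """Planar length: sum of straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def total_km(roads: Iterable[List[Point]]) -> float:
    """Analogue of sum(ST_Length(the_geom))/1000, assuming units are meters."""
    return sum(linestring_length(r) for r in roads) / 1000


roads = [
    [(0.0, 0.0), (3000.0, 4000.0)],                 # one 5000 m segment
    [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)],  # two 1000 m segments
]
print(total_km(roads))  # 7.0
```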
