GeospatialPython.com: conversion


Monday, May 20, 2013

New __geo_interface__ for PyShp

Christian Ledermann took the initiative to fork pyshp and add the __geo_interface__ convention.

http://twitter.com/GeoJSON

The __geo_interface__ is a community standard riding the current "less is more" entropy wave to get away from heavy data exchange standards, make software compatible, and get some work done.

This standard is very Pythonic and well thought out, which is no surprise because Sean Gillies and Howard Butler are driving forces behind it. The goal is to make it easier to move data among libraries with different specialties, like Shapely and PySAL. It is closely tied to GeoJSON, which is getting a lot of traction and shaking up the industry and community.
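Here's a minimal sketch of the idea, not code from PyShp or from Christian's fork. The `Point` class and the `to_geojson` function are names I made up for illustration. The only part that comes from the convention itself is the `__geo_interface__` attribute, which returns a GeoJSON-like dictionary that any other library can read without knowing anything else about the object.

```python
import json


class Point(object):
    """Hypothetical geometry class, not part of any library."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def __geo_interface__(self):
        # A GeoJSON-like mapping describing this object
        return {"type": "Point", "coordinates": (self.x, self.y)}


def to_geojson(obj):
    """Serialize any object that provides __geo_interface__."""
    return json.dumps(obj.__geo_interface__, sort_keys=True)


pt = Point(-89.5, 32.8)
print(to_geojson(pt))
```

The consumer, `to_geojson`, never checks the object's class. It only asks for the attribute, which is what lets libraries with different specialties hand data to each other.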

Christian's __geo_interface__ implementation for PyShp is here:

https://github.com/cleder/pyshp

He also wrote some ogr2ogr-style conversion samples to show you how to use it here:
https://github.com/cleder/geo_file_conv

I'm 100% behind these ideas and will roll this interface into the main trunk. But there's nothing stopping you from using Christian's fork today.

Enjoy!

Monday, September 5, 2011

Map Projections

A reader pointed out to me recently that the pyshp documentation or wiki should include something about map projections. And he is right. Many programmers working with shapefiles are not necessarily geospatial professionals but have found themselves working with geodata on some project.

It is very difficult to just "scratch the surface" of GIS. You don't have to dig very deep into this field before you uncover some of the eccentricities of geographic data. Map projections are one such feature that is easy to understand at a basic level but has huge implications for geospatial programmers.

Map projections are conceptually straightforward and intuitive. If you try to take any three-dimensional object and flatten it onto a plane, such as your screen or a sheet of paper, the object is distorted. (Remember the orange peel experiment from 7th grade geography?) You can manipulate this distortion to preserve a chosen property such as area, scale, bearing, distance, or shape, but only at the expense of the others.

I won't go into the details of map projections as there are thousands of web pages and online videos devoted to the subject. But there are some things you need to know for dealing with them programmatically. First of all, most geospatial data formats don't even contain any information about map projections. This lack of metadata is mostly a product of geospatial cultural history, with some technical reasons mixed in. Furthermore, while the concept of map projections is easy to grasp, the math to transform a coordinate from one projection to another is quite complex. The end result is that most data libraries don't deal with projections in any way.
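To give a feel for the math, here is a sketch of one of the simplest cases: the spherical Mercator projection used by many web maps. The function names are my own, and the sphere radius is an assumption (the WGS84 semi-major axis in meters). Real-world transformations also have to deal with ellipsoids and datum shifts, which is where the complexity comes from, so treat this as an illustration and not a replacement for a projection library.

```python
import math

# Assumed sphere radius in meters (the WGS84 semi-major axis)
R = 6378137.0


def lonlat_to_mercator(lon, lat):
    """Forward projection: degrees to meters on a spherical Mercator map."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


def mercator_to_lonlat(x, y):
    """Inverse projection: meters back to degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2.0 * math.atan(math.exp(y / R)) - math.pi / 2.0)
    return lon, lat


x, y = lonlat_to_mercator(-89.5, 32.8)
print(x, y)
print(mercator_to_lonlat(x, y))
```

Even in this easy case the y value is a logarithm of a tangent, so distances stretch as you move away from the equator. That stretching is the distortion the orange peel experiment demonstrates.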

But now, thanks to modern software and the Internet making data exchange easier and more common, nearly every data format, both image and vector, has tacked on a metadata format that defines the projection. For shapefiles this is the .prj projection file, which shares the shapefile's base name and adds a .prj extension. In this file, if it exists, you will find a string defining the projection in a format called well-known text or WKT. And here's a gotcha that blew my mind as a programmer a long time ago: if you don't have that projection definition, and you don't know who created the data, there is no reliable way to figure it out. The coordinates in the file are just numbers and offer no clue to the projection. You don't run into this problem much any more, but it used to be quite common because GIS shops typically produced maps and not data. All your coworkers knew the preferred projection for your shop so nobody bothered to create a bunch of metadata. But now, modern GIS software won't even let you load a shapefile into a map without forcing you to choose a projection if it's not already defined. And that's a good thing.
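Checking for that file takes only a few lines of standard-library Python. This is a minimal sketch. The `read_prj` function is a name I made up, it is not part of pyshp, and it assumes the extension is lowercase.

```python
import os


def read_prj(shapefile_path):
    """Return the WKT string from a shapefile's .prj file, or None if missing."""
    base, _ext = os.path.splitext(shapefile_path)
    prj_path = base + ".prj"
    if not os.path.exists(prj_path):
        return None
    with open(prj_path) as f:
        return f.read().strip()


# "Mississippi.shp" is the sample file name used elsewhere on this blog
wkt = read_prj("Mississippi.shp")
if wkt is None:
    print("No .prj file: the projection is unknown")
else:
    print(wkt)
```

Returning None instead of guessing a default keeps the "unknown projection" case visible to whatever code calls this.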

If you do need to deal with projections programmatically you basically have one choice: the PROJ4 library. It is one of the few free libraries, if not the only library period, that comprehensively deals with re-projecting geospatial data. Fortunately it has bindings for just about every language out there and is incorporated into many libraries including OGR. There is a Python project called pyproj which provides Python bindings.

So be aware that projections are not trivial and can often add a lot of complexity to what would otherwise be a simple programming project. And also know that pyshp does nothing to work with map projections. I did an earlier post on how to create a .prj file for a shapefile and why I chose not to include this functionality in the library itself.

Here are some other resources related to map projections.

SpatialReference.org - a clearinghouse for projection definitions

PROJ4 - the #1 map projection library

OGR2OSM - Python script to convert OGR vector formats to the OpenStreetMap format with projection support

PyProj - Python bindings for the PROJ4 library

GDAL - Python bindings to GDAL which contains OGR and PROJ4, allowing you to reproject raster and vector data

Monday, February 28, 2011

Changing a Shapefile's Type

A polygon, line, and point version of the same shapefile.

Sometimes you want to convert a shapefile from one type to another. For example, you may want to convert a line shapefile to a polygon, or a polygon to a point or multipoint shapefile. There are many reasons for this type of operation, ranging from error checking, to special queries, to inconvenient distribution formats. For example, a lot of coastline data is distributed as line data, but you may want to convert it to a polygon to estimate coastal erosion using area comparisons between two different dates.

Performing this type of conversion is very straightforward using the Python Shapefile Library. In fact the conversion is basically a one-off version of the shapefile merge example I wrote about recently. You read in one shapefile and write the features and records out to another of the correct type. There are a couple of pitfalls you need to be wary of though. The first is that the current version (1.0) of the PSL requires you to explicitly set the shape type of each shape if you want to convert it. The second is that if you are converting to a single point shapefile, where each point feature is a record, you must compensate for the imbalance in the dbf records by copying the record from the parent feature for each point. Instead of dealing with this issue you could simply create a multi-point shapefile, where each shape record is allowed to be a collection of points. Which method you choose depends on what you are trying to do with the output. The examples below cover both methods.
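The record bookkeeping for the single point case can be sketched without the library at all. The data and the `flatten_to_points` function below are made up for illustration. The point is only that every vertex becomes its own feature and carries a copy of its parent's record.

```python
# Stand-in data: each feature is (list of points, dbf record).
features = [
    ([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], ["Mississippi", 28]),
]


def flatten_to_points(features):
    """Turn each vertex into its own point feature, copying the parent record."""
    points, records = [], []
    for shape_points, record in features:
        for p in shape_points:
            points.append(p)
            records.append(list(record))  # copy so each point owns its record
    return points, records


points, records = flatten_to_points(features)
# A shapefile needs exactly one dbf record per shape
assert len(points) == len(records)
print(len(points), "points,", len(records), "records")
```

One polygon with three vertices turns into three point features with three identical records, which is why records are usually duplicated in a point conversion.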

The example in this post takes a state boundary polygon file and converts it to a line shapefile, then a multipoint shapefile, then a regular point shapefile. Note the difference between the point shapefile and the line and multipoint examples.

"""
Convert one shapefile type to another 
"""

import shapefile


# Create a line and a multi-point 
# and single point version of
# a polygon shapefile

# The shapefile type we are converting to
newType = shapefile.POLYLINE

# This is the shapefile we are trying
# to convert. In this case it's a
# state boundary polygon file for 
# Mississippi with one polygon and
# one dbf record.
r = shapefile.Reader("Mississippi")

## POLYLINE version
w = shapefile.Writer(newType)
w._shapes.extend(r.shapes())
# You must explicitly set the shapeType of each shape.
# Eventually the library will set them to the same
# as the file shape type automatically.
for s in w.shapes():
  s.shapeType = newType
w.fields = list(r.fields)
w.records.extend(r.records())
w.save("Miss_Line")

## MULTIPOINT version
newType = shapefile.MULTIPOINT

w = shapefile.Writer(newType)
w._shapes.extend(r.shapes())
for s in w.shapes():
  s.shapeType = newType
w.fields = list(r.fields)
w.records.extend(r.records())
w.save("Miss_MPoint")

## POINT version
newType = shapefile.POINT

w = shapefile.Writer(newType)
# For a single point shapefile
# from another type we
# "flatten" each shape
# so each point is a new record.
# This means we must also assign
# each point a record which means
# records are usually duplicated.
for s in r.shapeRecords():
  for p in s.shape.points:
    w.point(*p)
    w.records.append(s.record)  
w.fields = list(r.fields)
w.save("Miss_Point")

You can download the state boundary polygon shapefile used in the example from the GeospatialPython Google Code Project Downloads section. You can download the sample script above from the subversion repository of that same project.

And of course the Python Shapefile Library is here.

Saturday, December 4, 2010

Rasterizing Shapefiles

Converting a shapefile into an image has two common uses. The first is in web mapping servers. All data in the map is fused into an image which is then optionally tiled and cached at different scales. This method is how Google Maps, ESRI ArcGIS Server, and UMN Mapserver all work. UMN Mapserver even includes a command-line utility called "shp2img" which renders its "mapfile" configuration file into an image for quick testing. The second common reason to convert a shapefile into an image is to use it as a mask to clip remotely-sensed imagery. In both cases most geospatial software packages handle these operations for you behind the scenes.

The very simple script below shows you how you can rasterize a shapefile using the Python Shapefile Library (PSL) and the Python Imaging Library (PIL). PIL is a very old and well-developed library originally created to process remote sensing imagery; however, it has absolutely no spatial capability. What it does have is the ability to read and write multiple image formats and handle very large images. It also has an API that lets you easily import and export data to and from other libraries using Python strings and arrays. The PIL ImageDraw module provides an easy way to draw on an image canvas.

The following script reads in a shapefile, grabs the points from the first and only polygon, draws them to an image, and then saves the image as a PNG file with an accompanying .pgw world file to make it a geospatial image. Most modern GIS packages handle PNG images but you could just as easily change the file and worldfile extension to jpg and jgw respectively for even better compatibility. As usual I created minimal variables to keep the code short and as easy to understand as possible.

import shapefile
# Import from the PIL package so the script
# also works with the Pillow fork of PIL
from PIL import Image, ImageDraw

# Read in a shapefile
r = shapefile.Reader("mississippi")
# Geographic x & y distance
xdist = r.bbox[2] - r.bbox[0]
ydist = r.bbox[3] - r.bbox[1]
# Image width & height
iwidth = 400
iheight = 600
xratio = iwidth/xdist
yratio = iheight/ydist
pixels = []
for x,y in r.shapes()[0].points:
  px = int(iwidth - ((r.bbox[2] - x) * xratio))
  py = int((r.bbox[3] - y) * yratio)
  pixels.append((px,py))
img = Image.new("RGB", (iwidth, iheight), "white")
draw = ImageDraw.Draw(img)
draw.polygon(pixels, outline="rgb(203, 196, 190)", 
                fill="rgb(198, 204, 189)")
img.save("mississippi.png")

# Create a world file
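# Lines are: x pixel size, two rotation terms,
# negative y pixel size, then upper-left x and y
wld = open("mississippi.pgw", "w")
wld.write("%s\n" % (xdist/iwidth))
wld.write("0.0\n")
wld.write("0.0\n")
wld.write("-%s\n" % (ydist/iheight))
wld.write("%s\n" % r.bbox[0])
wld.write("%s\n" % r.bbox[3])
# close must be called, not just referenced
wld.close()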

You can download this script here:
http://geospatialpython.googlecode.com/svn/trunk/shp2img.py

You can download the shapefile used here:
http://geospatialpython.googlecode.com/files/Mississippi.zip

Of course you will also need the Python Shapefile Library found here and the latest version of the Python Imaging Library from here.

The image created by this script is featured at the top of this post.
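If you're curious how GIS software uses a world file, here is a minimal sketch that reads the six values back and converts a pixel position to map coordinates. The function names and the sample numbers are made up for illustration. The order of the six lines and the two formulas follow the standard world file layout.

```python
def parse_world_file(text):
    """Parse the six lines of a world file into floats."""
    values = [float(v) for v in text.split()]
    if len(values) != 6:
        raise ValueError("a world file has exactly six values")
    return values


def pixel_to_world(col, row, world):
    """Map a pixel (column, row) to map coordinates."""
    # File order: x pixel size, y rotation, x rotation,
    # y pixel size (negative), upper-left x, upper-left y
    a, d, b, e, c, f = world
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Made-up sample values, not taken from the Mississippi data
sample = "0.01\n0.0\n0.0\n-0.01\n-91.0\n35.0\n"
world = parse_world_file(sample)
print(pixel_to_world(0, 0, world))
print(pixel_to_world(100, 200, world))
```

The y pixel size is negative because image rows count downward from the top while map y coordinates count upward, which is the same flip the script handles when it computes py.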

Using a shapefile as a clipping mask for an image can be done with GDAL. The Python API for GDAL includes integration with the well-known Numerical Python (NumPy) package using a module called "gdalnumeric". Both gdalnumeric and PIL contain "tostring" and "fromstring" methods which allow you to move image data back and forth between the packages. GDAL and NumPy make handling geospatial data as numerical arrays easier, and PIL's API makes creating a polygon clipping mask much easier.

I'll cover using PIL, GDAL, NumPy, and PSL together in a future post. I'll also demonstrate a way to perform the above operation using pure Python.
