Pages

Showing posts with label mapping. Show all posts
Showing posts with label mapping. Show all posts

Monday, May 20, 2013

New __geo_interface__ for PyShp



Christian Ledermann took the initiative to fork pyshp and add the __geo_interface__ convention.
http://twitter.com/GeoJSON


The __geo_interface__ is a community standard riding the current "less is more" entropy wave to get away from heavy data exchange standards, make software compatible, and get some work done.

This standard is very pythonic and well thought out which is no surprise because Sean Gillies and Howard Butler are a driving forces behind it.  The goal is to make moving data around among libraries with different specialties, like Shapely and PySAL, easier.  It is closely tied to GeoJSON which is getting a lot of traction and shaking up the industry and community.

Christian's  __geo_interface__ implementation for PyShp is here:

https://github.com/cleder/pyshp

He also wrote some ogr2ogr-style conversion samples to show you how to use it here:
https://github.com/cleder/geo_file_conv

I'm 100% behind these ideas and will roll this interface into the main trunk.  But there's nothing stopping you from using Christian's fork today.

Enjoy!

Monday, September 5, 2011

Map Projections

A reader pointed out to me recently that that the pyshp documetnatin or wiki should include something about map projections.  And he is right.   Many programmers working with shapefiles are not necessarily geospatial professionals but have found themselves working with geodata on some project.

It is very difficult to just "scratch the surface" of GIS.  You don't have to dig very deep into this field before you uncover some of the eccentricities of geographic data. Map projections are one such feature that is easy to understand at a basic level but has huge implications for geospatial programmers.

Map projections are conceptually straight-forward and intuitive.  If you try to take any three-dimensional object and flatten it onto a plane, such as your screen or a sheet of paper, the object is distorted.  (Remember the orange peel experiment from 7th grade geography?) You can manipulate this distortion to preserve common properties such as area, scale, bearing, distance, shape, etc. 

I won't go into the details of map projections as there are thousands of web pages and online videos devoted to the subject.  But there are some things you need to know for dealing with them programmatically.  First of all, most geospatial data formats don't even contain any information about map projections.  This lack of metadata is really mostly just geospatial cultural history with some technical reasons.  And furthermore, while the concept of map projections is easy to grasp, the math to transform a coordinate from one projection to another is quite complex.  The end result is most data libraries don't deal with projections in any way.

But now, thanks to modern software and the Internet making data exchange easier and more common, nearly every data format, both images and vector, have tacked on a metadata format that defines the projection.  For shapefiles this is the .prj projection file which follows the naming convention .prj   In this file, if it exists, you will find a string defining the projection in a format called well-known text or WKT.  And here's a gotch that blew my mind as a programmer a long time ago: if you don't have that projection definition, and you don't know who created the data - there is no way you are ever going to figure it out.  The coordinates in the file are just numbers and offer no clue to the projection.  You don't run into this problem much any more but it used to be quite common because GIS shops typically produced maps and not data.  All your coworkers knew the preferred projection for your shop so nobody bothered to create a bunch of metadata.  But now, modern GIS software won't even let you load a shapefile into a map without forcing you to choose a projection if it's not already defined.  And that's a good thing.

If you do need to deal with projections programmatically you basically have one choice: the PROJ4 library.  It is one of the few free libraries, if not the only library period, that comprehensively deals with re-projecting goespatial data.  Fortunately it has bindings for just about every language out there and is incorporated into many libraries including OGR.  There is a Python project called pyproj which provides python bindings.

So be aware that projections are not trivial and can often add a lot of complexity to what would otherwise be a simple programming project.  And also know that pyshp does nothing to work with map projections.  I did an earlier post on how to create a .prj file for a shapefile and why I chose not to include this functionality in the library itself.

Here are some other resources related to map projections.

SpatialReference.org - a clearning house for projection definitions

PROJ4 - the #1 map projection library

OGR2OSM - Python script to convert OGR vector formats to the Open Street Map format with projection support

PyProj - Python bindings for Proj4 library

GDAL - Python bindings to GDAL which contains OGR and PROJ4 allowing you to reporject raster and vector data

Wednesday, August 24, 2011

Smoothed Best Estimate of Trajectory (SBET) Format

A buddy of mine in the LiDAR industry asked me if we had any software which could handle the Applanix SBET binary format.  I wasn't familiar with it but it turns out on Google the only parsers you can find just happend to be in Python. 

I'm still not 100% sure what these datasets do so if you know feel free to post a comment.

So if you need to work with this format here are two options:

http://arsf-dan.nerc.ac.uk/trac/attachment/wiki/Processing/SyntheticDataset/sbet_handler.py

http://schwehr.blogspot.com/2010/12/best-possible-python-file-reading.html

And if you're into LiDAR you are probably already aware of PyLAS for the LAS format;

http://code.google.com/p/pylas/

Thursday, December 2, 2010

Dot Density Maps with Python and OGR

If you use Python for GIS sooner or later you'll use GDAL for manipulating raster data and its vector cousin OGR for working with vector data. OGR has a Python API for most of the methods in the C++ library and even provides some basic geometry analysis. And most importantly it can read/write and therefore convert data in a variety of vector file and database formats.

OGR provides a fast way to create dot density maps.  A dot density map represents statistical information about an area as mathematically distributed points. Areas with higher values have a higher concentration of points. This is one of my favorite types of maps because it is a great example of GIS - visualizing geographic data in a way that is instantly comprehensible.

I'm using OGR in this example because it can read and write shapefiles. But unlike the Python Shapefile Library it can also perform basic geometry operations needed for this sample. Most GIS programs would display the population information on some type of memory layer instead of actually outputting a shapefile for the density layer as demonstrated here.  But we're going to keep things simple for this example and just create a shapefile.

Assuming you have Python installed, here are some basic gdal/ogr installation instructions.
1. Go to http://trac.osgeo.org/gdal/wiki/DownloadingGdalBinaries and download the gdal binary for your platform
2. Extract the directory to your hard drive
3. Add the "bin" directory within the gdal folder to your system shell path
4. Set the path to the "data" directory in the gdal folder to an environment variable called "GDAL_DATA"
5. Install the appropriate python module for your Python version and platform from here: http://pypi.python.org/pypi/GDAL/1.6.0#downloads

If you want to follow along with the example below you can download the source shapefile:
http://pyshp.googlecode.com/files/GIS_CensusTract.zip

The end result of this demo is pictured above with both the input census block and output dot density shapefiles. 

The following code will read in the source shapefile, calculate the number of points needed to represent the population density evenly, and then create the point shapefile:

from osgeo import ogr
import random
# Open shapefile, get OGR "layer", grab 1st feature
source = ogr.Open("GIS_CensusTract_poly.shp")
county = source.GetLayer("GIS_CensusTract_poly")
feature = county.GetNextFeature()
# Set up the output shapefile and layer
driver = ogr.GetDriverByName('ESRI Shapefile')
output = driver.CreateDataSource("PopDensity.shp")
dots = output.CreateLayer("PopDensity", geom_type=ogr.wkbPoint)
while feature is not None:
  field_index = feature.GetFieldIndex("POPULAT11")
  population = int(feature.GetField(field_index))
  # 1 dot = 100 people
  density = population / 100
  # Track dots created
  count = 0   
  while count < density:
    geometry = feature.GetGeometryRef()
    minx, maxx, miny, maxy = geometry.GetEnvelope()
    x = random.uniform(minx,maxx)
    y = random.uniform(miny,maxy)
    f = ogr.Feature(feature_def=dots.GetLayerDefn())
    wkt = "POINT(%f %f)" % (x,y)
    point = ogr.CreateGeometryFromWkt(wkt)
    # Don't use the random point unless it's inside the polygon.
    # It should be close as it's in the bounding box
    if feature.GetGeometryRef().Contains(point):
        f.SetGeometryDirectly(point)
        dots.CreateFeature(f)
        count += 1
    # Destroy C object.
    f.Destroy()
  feature = county.GetNextFeature()
source.Destroy()
output.Destroy()    

There is no error handling in this sample so if you run it multiple times you need delete the output dot density shapefile.

Note that this type of rendering only works when you have one polygon representing each data value. For example you couldn't do this operation with a world country boundary shapefile because islands like Hawaii associated with a country would force an inaccurate representation. For that type of map you need to use a choropleth map.

Also note that when you use OGR for shapefile editing you must specify a "layer" after opening a file. This extra step is necessary because OGR handles dozens of formats, some of which are layered vector formats such as DWG using the same API. Also because OGR is a wrapped C library you have to adjust to explicitly destroying objects and extreme camel casing on method calls usually not found in Python.

OGR and the raster equivalent GDAL are two very powerful libraries which dominate the open source geospatial world. They are also included in several well-known commercial packages thanks to the commercial-friendly MIT license.

Sunday, November 28, 2010

Introducing the Python Shapefile Library

Over Thanksgiving I finally got around to releasing the Python Shapefile Library. It is a single file of pure Python with no dependencies. It reads and writes shp, shx, and dbf for all 15 types of shapefiles in a pythonic way. You can find it with documentation here in the CheeseShop or search for "pyshp" on Google Code.

This library simply reads and writes shapefiles with no support for geometry calculations or the other eight or nine other supporting and undocumented shapefile formats including indexes and projection files which have been added since the specification was published in 1998.

Here's a basic example of writing a polygon shapefile:
import shapefile
w = shapefile.Writer(shapefile.POLYGON)
w.poly(parts=[[[1,5],[5,5],[5,1],[3,3],[1,1]]])
w.field('FIRST_FLD','C','40')
w.field('SECOND_FLD','C','40')
w.record('First','Polygon')
w.save('shapefiles/test/polygon')
There are plenty of other examples in the documentation.

The library consists of a Reader class, a Writer class, and an Editor class which simplifies making changes to an existing shapefile by giving you one object to work with so you don't have to juggle the Reader and the Writer objects yourself.

Beyond the docstring tests and some unit tests I tried PSL out in Jython with no issues. It's been awhile since I've run the tests. I want to try out Jython again as well as the other Python implementations which have a "struct" and some form of "os" module. I don't expect any issues with IronPython.

My company sells industrial-strength, native shapefile libraries for Java and Visual Basic which I was not involved in developing. I wrote this simple library to fully learn the shapefile specification for my own curiosity and to lead to some improvements in our commercial libraries. I learned quite a bit and we plan to release some very interesting features to our JShapefile and VBShapefile libraries in 2011 which will solve some major annoyances faced by developers who work with the shapefile format on a regular basis. More on that later...

PSL is not the only way to write shapefiles with Python however as far as I know it is the only complete pure Python library. Every other option is a Python wrapper around a C or C++ library (not that there's anything wrong with that) or partially-developed in Python only. I like having a pure Python, dependency-free, no-setup choice even if it's much slower than a highly-optimized, C-based module. Here's why:
  1. C-based modules can't follow your code everywhere - at least not easily (ex. Google App Engine and other web hosts, many embedded platforms, Python on different runtimes such as Jython and IronPython)
  2. Unless the developer really goes out of his or her way, C-based geospatial libraries wrapped in Python have kludgy-feeling methods and return opaque objects. There are notable exceptions to this rule but they are few and far between.
  3. Speed is the #1 reason developers cite as a reason to create C-based Python modules. In the geospatial domain the complexity of the data formats and spatial calculations makes wrapping libraries the easier choice. But most developers use Python because of the speed of development and ease of maintenance rather than program execution. In the rapidly-growing geospatial technology world new ideas are coming out every day. Rapid application development is key. The more easy-to-use, easy-to-change libraries the better.
Here are some other Python shapefile tools.

ShpUtils - Zack Johnson's pure-Python shapefile reader.

Shapelib - The original C-based shapefile library with Python bindings.

Pyshape - an alternative shapelib wrapper

OGR - General vector read/write library from shapelib creator Frank Warmerdam

Shapefile - a pure-Python read/write module under development

Tuesday, February 17, 2009

Zippity Doo Dah

Back in mid 2003 NVision had just a handful of geospatial programmers and we were relatively unknown outside of the NASA Stennis Space Center incubator which housed our tiny office. So I was surprised when a stranger who had googled us called asking about a adding a zipcode store finder to his website. "Simple enough," I thought.

I quickly did the calculations in my head for labor and data preparation of an HTML iframe containing a form with an ArcIMS map. I put the stranger on hold and ran my estimate by my boss. It sounded reasonable to him. So I took the stranger off hold and said, "Yes sir, we can do that for about $2,000." He repeated the figure back to me in disgust, "Two-THOUsand dollars? Are you kidding me?" He hung up. I was caught off guard. Didn't this guy realize what he was asking for? Geocoding stores, running those points against polygons, returning a distance to the would-be customer - and all on the Internet of all places. This was before GoogleMaps mind you.

Of course the point of all this is to show how far we've come in 6 years. Now zipcode calculators are a nifty hack people blog about using their cell phone on the way to the airport. And there are plenty for Python.

Here's several Python zipcode calculators in order of complexity.

This one uses a database built in MySQL:;
http://jehiah.cz/archive/spatial-proximity-searching-using-latlongs

This one provides a python interface to geonames.org:
http://www.zindep.com/news/interface-python-pour-geonames/?searchterm=geonames

Here's one using SQLObject to store the data
http://www.zachary.com/s/blog/2005/01/12/python_zipcode_geo-programming


And just to show how common this concept has become - it's now a casual entry in the Python Cookbook alongside other bits of trivial python code:
http://code.activestate.com/recipes/393241/


Granted some of these scripts are distance calculators but I have to concede the stranger who called me in 2003 was absolutely right but maybe a little ahead of his time. It was really more of a $200 problem than $2000.

Thursday, February 12, 2009

Mapnik - Maybe the best Python Mapping Platform Yet

The vast majority of geospatial libraries are written in either C or C++ for two reasons: 1) Speed and 2) Many of these libraries have began development when C/C++ were the languages du jour.

Over the years Python bindings have appeared for many of these libraries to make them more easier to use. These bindings are extremely helpful as so much of geospatial work involves one-off data conversions or quick map production. Over the last few years several Python libraries have emerged developed from the ground up for Python and these are downright fun!

One of the most notable examples is Mapnik started by Artem Pavlenko. Mapnik caught my eye when it first started because it promised to have a great Python API. Mapnik also had a sharp focus on attractive maps using anti-aliasing and some relatively new graphics libraries. When I checked in on Mapnik recently I was really impressed.


View Larger Map



Mapnik fills all the checkboxes for a great python tool and it's a great geospatial library as well. It has pre-compiled binaries for python 2.5 so there's no need to go back and forth to the mailing list just to get it to compile before you see if you like it or not. Most opensource libraries have troubles keeping documentation current with the rapid pace of development. Mapnik addresses this problem by heavy use of example/test scripts and a busy wiki with lots of examples from users and developers.

Another interesting feature of Mapnik is its straightforward xml dialiect for styling maps allowing you to cleanly separate map appearance from programming logic. And not to forget one of the core values of the library the rendering is truly great. There are several output options including svg and pdf using either the Agg or Cairo graphics libraries. The developers openly compare the rendering quality to GoogleMaps and rightfully so. Mapnik goes beyond antialiasing and rouned line joints to address even more interesting rendering challenges. One striking example of the rendering quality is the new "BuildingSymbolizer" which creates a nifty "pseudo 3D building effect" on polygons.

Mapnik was designed from the start for both web and desktop use. OpenStreetMap uses the library to render it's map. Go to OpenStreetMap.org and zoom into London to see it in action. From its trendy Django website and Trac Wiki to its slick rendering, xml map descriptors, and ready-to-run pythonic API Mapnik is a modern geospatial python library that will certainly add users to the geospatial python community.