Saturday, June 25, 2011

Cleanup on the Map Aisle

Spills happen. A sticky honey jar in the grocery store. A glass of orange juice at breakfast. Oil spills in the Gulf of Mexico. Node spills in OpenStreetMap. Some can be cleaned up with a mop and some water; others require a more technical solution. What is a "node spill," you ask? It's when nodes that have no tags and aren't part of any way get uploaded to OSM. These tagless, unconnected nodes add no useful information to the map and are just dead weight in the database. Where do they come from? There are at least two common sources. One is editor bugs and simple user error; these spills are generally small, tens or maybe hundreds of nodes. The bigger problem is imports.
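To make the definition concrete, here is a minimal Python sketch. It assumes the data is available as an OSM XML extract containing both the nodes and the ways that use them. The function name `find_node_spill` and the sample data are hypothetical, made up for illustration. The function collects every node id referenced by a way's `<nd ref="...">` element, then reports the nodes that have no `<tag>` children and are not in that set.

```python
import xml.etree.ElementTree as ET


def find_node_spill(osm_xml: str) -> list[str]:
    """Return ids of nodes that have no tags and are not used by any way.

    Only ways present in the same extract are considered, so a node whose
    way is missing from the extract would be reported by mistake.
    """
    root = ET.fromstring(osm_xml)
    # Every node id that some way in the extract uses as a vertex.
    used_by_ways = {nd.get("ref") for way in root.iter("way") for nd in way.iter("nd")}
    # Relation membership is deliberately not checked here.
    return [node.get("id") for node in root.iter("node")
            if node.find("tag") is None and node.get("id") not in used_by_ways]


sample = """<osm version="0.6">
  <node id="1" lat="38.0" lon="-97.0"/>
  <node id="2" lat="38.1" lon="-97.1"/>
  <node id="3" lat="38.2" lon="-97.2"><tag k="amenity" v="fuel"/></node>
  <node id="4" lat="38.3" lon="-97.3"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
</osm>"""
print(find_node_spill(sample))  # ['4']
```

Nodes 1 and 2 belong to the way and node 3 carries a tag, so only node 4 counts as spilled. Because the sketch ignores relations, an untagged node that is a relation member would still be flagged, which is one reason to check by hand before deleting anything.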

The topic of imports deserves its own post. For now I'll just say that badly performed imports and insufficient checking afterwards can leave behind tens of thousands of empty nodes. The basic problem is that nodes get uploaded first and don't become part of a way until the way itself is uploaded later. So if something goes wrong with the way upload, the nodes are left orphaned.

I had come across a few of these nodes before, but what really caught my attention was a failed NHD (National Hydrography Dataset) import in Oklahoma. It happened to poke a little way up into Kansas, where I noticed it next to a state highway I was editing. After a lengthy thread on the talk-us mailing list, I eventually found a good way to detect these nodes. Since then I have made my way across the US and some of Canada, deleting these useless nodes from the database. As detailed in this message, the process goes something like this:

  • Use the XAPI to perform a query of the form /node[not(way)][bbox=a,b,c,d]
  • Open the result of this query in JOSM and apply a filter to hide all nodes with tags
  • Perform some checks to make sure there really is no useful data
  • Delete nodes and upload
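The first two steps can also be sketched in Python. This is a hedged illustration, not the tool used in the post. `xapi_orphan_query` and `tagless_nodes` are hypothetical helper names, the base URL is a placeholder rather than a real XAPI server, and the bbox order is assumed to follow XAPI's left,bottom,right,top (west, south, east, north) convention. The second function mimics the JOSM filter by keeping only the nodes that have no tags. The manual checks in step three are deliberately not automated.

```python
import xml.etree.ElementTree as ET


def xapi_orphan_query(base_url: str, west: float, south: float,
                      east: float, north: float) -> str:
    """Build a query of the form /node[not(way)][bbox=...] (hypothetical helper).

    The bbox order is assumed to be left,bottom,right,top. The brackets are
    left unencoded, as written in the post; some HTTP clients may need them escaped.
    """
    return (f"{base_url.rstrip('/')}/node[not(way)]"
            f"[bbox={west},{south},{east},{north}]")


def tagless_nodes(xapi_response: str) -> list[str]:
    """Return ids of nodes with no <tag> children, like the JOSM filter step."""
    root = ET.fromstring(xapi_response)
    return [node.get("id") for node in root.iter("node")
            if node.find("tag") is None]


# Illustrative bounding box and a made-up response; neither comes from the post.
url = xapi_orphan_query("https://xapi.example.org/api/0.6/", -98.0, 37.0, -97.0, 38.0)
print(url)
# https://xapi.example.org/api/0.6/node[not(way)][bbox=-98.0,37.0,-97.0,38.0]

response = """<osm version="0.6">
  <node id="5" lat="37.5" lon="-97.5"/>
  <node id="6" lat="37.6" lon="-97.6"><tag k="place" v="hamlet"/></node>
</osm>"""
print(tagless_nodes(response))  # ['5']
```

Splitting the query from the filter mirrors the workflow above: the server answers "not in a way," and the client decides "has no tags," leaving a small list to inspect before anything is deleted.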