Sunday, March 11, 2012

Remapping using TIGER 2011

The license change mapping process I talked about in my previous post works well where there is still some clean data to work from and you only have to redo what a decliner did, such as splitting ways or adding nodes to refine the geometry. But what about when an entire area is on the chopping block? Here is the OSM Inspector view of Glendale, CA:



The area I worked on was just south of here and was actually even worse, although I didn't get a "before" screenshot: less yellow, more red. Here is what it looked like on Simon Poole's "badmap" rendering. Pretty much the entire road network is going to get nuked:


Most of this data is from a single, very prolific contributor who is not likely to agree to the new terms. It is also highly unlikely that anyone in the area is going to be able to redo all of his work before April 1st. So I reached for a bigger paintbrush: TIGER 2011.

Using TIGER data in OSM is no simple matter. For one thing, the Census Bureau uses abbreviations rather heavily and OSM is strongly anti-abbreviation, for good reason. Unfortunately you can't just throw a regular expression at it and expand every "Ave" to "Avenue" and every "N" to "North" - what if there is a street named "N Street" which runs parallel to "O Street"? Then again, there may be a legitimate "North Street" somewhere. The TIGER data does indicate which prefixes and suffixes are present on a given road, so the full name can be reconstructed with some effort, but using this information is a little cumbersome. And of course accuracy is always an issue, but that's kind of a given with this data.
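To make the "N Street" problem concrete, here is a minimal Python sketch of expanding only the prefix and suffix parts while leaving the base name alone. The field names (PREDIRABRV, PRETYPABRV, NAME, SUFTYPABRV, SUFDIRABRV) are my assumption about how the name parts are laid out, and the abbreviation tables are tiny illustrative subsets, so treat this as an outline of the idea rather than something ready to run over a whole county.

```python
# Illustrative subsets only; a real table would need many more entries.
DIRECTIONS = {
    "N": "North", "S": "South", "E": "East", "W": "West",
    "NE": "Northeast", "NW": "Northwest", "SE": "Southeast", "SW": "Southwest",
}
TYPES = {
    "Ave": "Avenue", "St": "Street", "Blvd": "Boulevard", "Dr": "Drive",
    "Rd": "Road", "Ln": "Lane", "Ct": "Court", "Pl": "Place",
}


def _expand(value, table):
    """Expand one abbreviated part; unknown values pass through unchanged."""
    value = (value or "").strip()
    return table.get(value, value)


def expand_name(attrs):
    """Rebuild a full street name from separate name-part fields.

    Only the direction and type fields are expanded. The base name is
    never touched, so a street whose base name is "N" stays "N".
    Field names here are assumed, not taken from the original post.
    """
    parts = [
        _expand(attrs.get("PREDIRABRV"), DIRECTIONS),
        _expand(attrs.get("PRETYPABRV"), TYPES),
        (attrs.get("NAME") or "").strip(),
        _expand(attrs.get("SUFTYPABRV"), TYPES),
        _expand(attrs.get("SUFDIRABRV"), DIRECTIONS),
    ]
    return " ".join(p for p in parts if p)
```

With this approach, a record with base name "N" and suffix type "St" comes out as "N Street", while a record with prefix direction "N", base name "Main" and suffix type "Ave" comes out as "North Main Avenue".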

First I downloaded the shapefile for Los Angeles County. Then I had to get the "Feature Names Relationship File" and use QGIS to join the two on the LINEARID field. I ended up with a shapefile with all the name information present. The complication here is that roads sometimes have multiple entries in the relationship file, and I'm not sure what QGIS does in this case. Since I was only working on a small area, I reviewed the data and made sure it all looked sane. A whole county's worth of data can be hard to work with in JOSM, so I clipped out just the area I wanted to work on and exported that as a new shapefile in QGIS.

The next step was to convert this data to .osm format. I used a new version of the ogr2osm Python script written by Andrew Guertin to do this. Translating shapefile attributes to OSM tags requires a translation file. I cooked one up based on some previous traffic on the talk-us mailing list about mapping TIGER road types to OSM highway=* tags. For the name, I did some more looking at the data and came up with a bare minimum implementation for the few hundred highways I had to work with. I don't expect this to work well in other places without modification. But here is what I came up with. Patches welcome :)
https://github.com/ToeBee/ogr2osm-translations/blob/master/tiger2011.py
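For anyone who hasn't seen a translation file before, the general shape is roughly the following. This is not the contents of the file linked above; it is a stripped-down sketch that assumes ogr2osm calls a filterTags function with each feature's attributes, and the MTFCC-to-highway choices in the table are my own guesses for illustration, not an agreed mapping. Name handling is left out entirely.

```python
# Hypothetical mapping from TIGER MTFCC feature class codes to OSM highway
# values. The choices below are illustrative guesses, not a consensus mapping.
HIGHWAY_BY_MTFCC = {
    "S1100": "motorway",       # primary road (limited access)
    "S1200": "secondary",      # secondary road
    "S1400": "residential",    # local neighborhood road
    "S1500": "track",          # vehicular trail (4WD)
    "S1630": "motorway_link",  # ramp
    "S1640": "service",        # service drive
    "S1730": "service",        # alley
    "S1740": "service",        # private road for service vehicles
}


def filterTags(attrs):
    """Turn one feature's shapefile attributes into a dict of OSM tags.

    Features with a missing or unrecognized MTFCC get no tags at all, so
    they stand out in the editor and can be classified by hand.
    """
    if not attrs:
        return {}
    tags = {}
    highway = HIGHWAY_BY_MTFCC.get((attrs.get("MTFCC") or "").strip())
    if highway:
        tags["highway"] = highway
    return tags
```

Leaving unknown classes untagged is a deliberate choice here: it is easier to spot and fix an untagged way in JOSM than to notice one that was silently given the wrong classification.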

And that was the easy part.

Now to integrate this data into OSM. First I downloaded the entire area in JOSM, selected everything and ran the LicenseChange plugin against it. Of course the whole screen lit up in red. Then I opened the .osm file generated by ogr2osm. I also changed my JOSM settings to render the "inactive" layer in a bright color so I could easily see where things were missing.

Then it was just a matter of spending a few hours deleting ways in the area, switching to the TIGER layer, copying the missing roads and then pasting them into the "live" layer of OSM data. Then of course I had to find all the places where the old and the new data intersected and join them back up to maintain correct topology. The JOSM validator can help with this, although I'm sure some mistakes made it in. The larger an area you can do at once, the less fiddling you have to do with the topology, since all the data you copy/paste from the TIGER layer will (for the most part) have correct internal topology. But bigger areas are harder to keep track of as you go. I usually tried to find relatively self-contained areas with as few connections as possible to the rest of the world. This spot on the map is a little unusual in that it seems to be in a hilly area and isn't your average grid pattern of streets, so there were actually isolated networks that I could deal with in chunks.

Of course I always had to keep an eye out for TIGER strangeness: fictional roads and that kind of thing. 99% of what I touched was just residential roads. Where I did touch tertiary/secondary ones, I did some additional comparison since TIGER's road classifications don't always match up with reality.

This whole process got more complicated wherever there was actually something I didn't have to replace. I always prefer local mapper contributions over TIGER data, so I had to make sure I didn't steamroll over what little clean data I did manage to find.

So at the end of the day (literally) the area now looks like this on badmap. Obviously I was only concentrating on the road network so the parks are still there.


And even from a lower zoom, there is a noticeable hole in OSMI:


Definitely an improvement but there is still much, MUCH more work to do :(

The changeset I did this in can be seen here:
http://www.openstreetmap.org/browse/changeset/10937678

1 comment:

  1. I have been using the TIGER 2011 data for NW Tucson. Residential streets are pretty good, but there is not much granularity with other roads. I look forward to trying out your translation file to see if it helps me.
