Sunday, November 21, 2010

Nationwide TIGER Map

Update: You should of course still read this post because it is awesome, but there is a new post with an updated version of the map here.

I presented most of this material this weekend at WhereCamp5280 and promised to put it online soon. I have a few more details and maybe some raw data that I will put in another post once I get back home after Thanksgiving, but for now I'll throw up what I have.

Well after some sweat, blood and tears (mostly on the part of my hard drives) I was eventually successful in extracting the data I needed to make a map of the entire US! If you just want the map feel free to skip down but I am going to document my trials and tribulations a little.

First, some things that did not work:

Importing the whole planet file into an apidb using osmosis. I actually did this before I made the original Kansas map, but it is part of the workflow I went through to get to the final map. The initial response I got on IRC was that it would take maybe 200 GB and a few days, maybe a week. I thought it would be fun to poke around in the database a bit beyond just this map, so I decided to try it. I believe the time/space estimate may be accurate for importing to a pgsql database for rendering; however, I wanted a full apidb. The result:
Yes, that is 3 weeks. I kept adding disk space to the volume, hoping each time that it was almost done. The good news is that ext4's online resizing works great! But after having my heart broken one too many times I went hunting for more information, which I probably should have done first. On the mailing list it was pointed out that the size of the main OSM apidb (The Source) is 1.4 TB. While that does include historic data that isn't in the planet file, I realized that I might be only halfway done, so I decided that I needed a different approach.

I tried trimming out a rough bounding box for the US which worked. Then I tried filtering out only nodes used in ways since I am only interested in the imported TIGER ways. I was hoping that a lot of the overhead in creating the apidb had to do with having to hit the node table every time a way was inserted due to foreign key constraints and such. I don't have a pretty graph for this attempt but apparently osmosis builds a data structure in memory for every node before filtering out the unused ones. Or something... The end result is that a computer with 16 GB of memory in it was still not enough to finish the job and failed within minutes with an "Out of Memory" exception from osmosis.
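For what it's worth, the "only nodes used in ways" filter can in principle be done in two streaming passes over an OSM XML extract: one to collect the node IDs that ways refer to, and one to keep only those nodes. The sketch below is just that, a sketch. It is not what osmosis does internally, the function names are made up for illustration, and it reads from an in-memory byte string to stay self-contained. For a country-sized extract the ID set itself can still get large.

```python
import io
import xml.etree.ElementTree as ET


def referenced_node_ids(osm_xml):
    """Pass 1: collect the ID of every node referenced by a way."""
    used = set()
    for _, elem in ET.iterparse(io.BytesIO(osm_xml), events=("end",)):
        if elem.tag == "way":
            for nd in elem.findall("nd"):
                used.add(int(nd.get("ref")))
        if elem.tag in ("node", "way"):
            elem.clear()  # drop attributes and children we no longer need
    return used


def used_nodes(osm_xml, used):
    """Pass 2: keep (id, lat, lon) only for nodes whose ID is in `used`."""
    kept = []
    for _, elem in ET.iterparse(io.BytesIO(osm_xml), events=("end",)):
        if elem.tag == "node":
            node_id = int(elem.get("id"))
            if node_id in used:
                kept.append((node_id, float(elem.get("lat")), float(elem.get("lon"))))
            elem.clear()
    return kept


# Tiny hypothetical extract: node 2 is not part of any way.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="38.97" lon="-95.24"/>
  <node id="2" lat="38.98" lon="-95.25"/>
  <node id="3" lat="38.99" lon="-95.26"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""

ids = referenced_node_ids(SAMPLE)
nodes = used_nodes(SAMPLE, ids)
```

The point of the two passes is that only the set of IDs has to live in memory, not every node object.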

My last-ditch effort was just to import the whole US. I didn't use a fancy polygon to filter, just a straight bounding box that included Alaska and Hawaii, so I'm sure I got some extra data (like a good chunk of Canada), but this ended up working... after 80 hours and about 200 GB.
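To see why a plain bounding box drags in a good chunk of Canada, here is a minimal sketch of the test involved. The box coordinates are rough guesses chosen for illustration, not the values actually used for the import, and `BBox` is a made-up helper. It also ignores the few Aleutian islands that sit on the other side of the 180° meridian.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned lat/lon box (hypothetical helper for illustration)."""
    south: float
    north: float
    west: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Assumed rough box: far enough south for Hawaii, far enough north for Alaska.
US_BOX = BBox(south=18.0, north=72.0, west=-180.0, east=-66.0)

places = {
    "Honolulu": (21.31, -157.86),
    "Anchorage": (61.22, -149.90),
    "Toronto": (43.65, -79.38),   # not in the US, but inside the box
    "London": (51.51, -0.13),
}

inside = {name: US_BOX.contains(lat, lon) for name, (lat, lon) in places.items()}
```

Anything between the latitudes of Hawaii and northern Alaska gets swept up, which is where the extra Canadian data comes from.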

Success!

Click for a bigger version to see those small counties!
The first thing you might notice is that there are some counties with no data. This is because the TIGER ways in those counties, for one reason or another, do not have tiger:county tags. Some were imported by a user other than DaveHansenTiger and they chose not to import all the TIGER attributes. Some were imported by him but just didn't have the tags. It was suggested at WhereCamp that these may have been the first counties imported and that he may have still been fine-tuning the import process. This seems like a plausible theory.
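The per-county bookkeeping behind a map like this can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual queries used: it assumes each way is a dict with `tags` and `version` keys, it treats any way with a version above 1 as "modified", and the county strings in the sample are invented. Ways without a tiger:county tag cannot be assigned to a county at all, which is what leaves the blank spots.

```python
from collections import defaultdict


def modified_share_by_county(ways):
    """Return ({county: share of modified ways}, count of ways with no county tag)."""
    totals = defaultdict(int)
    modified = defaultdict(int)
    untagged = 0
    for way in ways:
        county = way["tags"].get("tiger:county")
        if county is None:
            untagged += 1  # cannot be placed in any county
            continue
        totals[county] += 1
        if way["version"] > 1:  # assumption: version > 1 means edited after import
            modified[county] += 1
    shares = {county: modified[county] / totals[county] for county in totals}
    return shares, untagged


# Made-up sample data for illustration only.
sample_ways = [
    {"version": 1, "tags": {"tiger:county": "Douglas, KS"}},
    {"version": 3, "tags": {"tiger:county": "Douglas, KS"}},
    {"version": 1, "tags": {"tiger:county": "Story, IA"}},
    {"version": 2, "tags": {"highway": "residential"}},  # no tiger:county tag
]

shares, untagged = modified_share_by_county(sample_ways)
```

A county that appears only among the untagged ways never shows up in `shares`, so it would render as "no data" rather than as zero percent modified.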

Other than the missing data, the map makes sense for the most part. The west and northeast coasts are more edited than the center of the country. Large cities are easy to pick out. But there are a few individual counties that do stand out here and there. I'm guessing there are more people like me who have taken an interest in their local area out in the middle of nowhere and have made noticeable contributions to the map. The level of activity in Iowa does seem a little odd. I should single out Iowa and take a look at the data to see if I missed a bot or something. But who knows - it might just be a really motivated Iowan. 

Anyway, there you have it. As I said above, there will be another post with some details and data at a later time. Right now I am tired and need to get to bed after spending the day gathering GPS traces on the slopes of Winter Park with Steve, Hurricane, Josh and Richard. It was all for the map - I can assure you that no "fun" was had by anyone!

Well maybe a little.

1 comment:

  1. One suggestion I will make, in the name of visualization, is to use a separate color for 0 modified ways. Right now the map mixes 4 percent changed in with that, and it seems like 0 is a special "nobrainer" category.
