Looking for U.S. population by zip dataset

Started 2 years ago by karmakaze / 7 posts

  1. I tried searching on census.gov and data.gov, but couldn't find the data in a single table for download.

    Actually, I'm not really interested in the raw data itself. What I want is a sample set of 10,000 "users" spread across the U.S. by zip code, roughly in proportion to population, for some performance testing.

    I realize real users most likely won't follow that distribution, but it's sure better than assigning a random zip between 00000 and 99999 :)

  2. I may be wrong, but zip codes exist to make postal delivery easier, so they aren't always a good unit for population statistics.

    I think population figures are reported by county instead.

  3. I think you're right, vtstarin. Zip codes don't really have specific boundaries, so you can't really get population per zip. For example, a zip code might only be used for a single post office.

  4. So, would you guys suggest using county then? Is there a data set of all U.S. counties by population, or would I need to compile that by state?

    As I said, I'm really just looking for a way to generate 10,000 mock users spread somewhat realistically throughout the US, so if there's already a good sample I could use instead that would work too.

    Actually, to get more specific, I'd love to have a mock network of 10,000 'Facebook' users, each with a 'location' and a friend graph. Both 'Facebook' and 'location' are in quotes since I don't need real data, just something that approximates it well enough to test performance.

    Basically I'm trying to do a smaller scale version of this: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3/

    But with the added wrinkle of location. So I want to generate a network of 10,000 users, each with a location and a friends list, and then use that mock data to test the performance of a couple of different implementations.

  5. Yeah, I'd go with population by county, which you can get from the U.S. Census Bureau:

    http://www.census.gov/popest/counties/counties.html

  6. Awesome, thanks!

    Know of any free geocoding services or datasets that let you query nearby counties? To make my friends network a little more accurate, I'm thinking of picking each friend by taking 0-10 hops on a random walk, where each hop goes to a county within, say, 50 miles. If I start with the 'people' who have the fewest friends in the graph, then by the time I get to people with a significant number of friends, they should already have friends assigned in scattered locations around the country. I think this is somewhat realistic: if you only have a couple of friends, chances are they live near you, and if you have more, there's a higher likelihood that your friends are spread across the country.

  7. This is probably no longer an open question, but a good place to go for zipcode info is http://mcdc2.missouri.edu/websas/geocorr2k.html.

    For the 2000 census, they actually turned zip codes into a SPATIAL geography, ZIP Code Tabulation Areas (ZCTAs), with defined borders, a set population, etc. Prior to that, zip codes changed quite a bit and did not have definite locational boundaries. ZCTAs mostly match zip codes, except that the mapping between the two is sometimes many-to-many (n:n). Still, for purposes like this, I think you'd be fine using them.

    There is also a 1991 zip population dataset available from the same site. However, making a time series of consistent zip code areas is a very tricky endeavor that requires quite a few assumptions.
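
The approach discussed in this thread (place users by county population, then pick friends with a short random walk between nearby counties) could be sketched roughly as follows. This is only a minimal illustration, not anyone's actual implementation: the `COUNTIES` table holds made-up placeholder rows rather than Census figures, and every function name, the haversine distance check, and the way friend counts are drawn are assumptions added for the example.

```python
import math
import random

# Hypothetical toy table: (name, population, latitude, longitude).
# Placeholder values only; swap in rows from a real county population file.
COUNTIES = [
    ("county_a", 900_000, 40.0, -75.0),
    ("county_b", 300_000, 40.3, -75.2),
    ("county_c", 50_000, 40.5, -74.8),
    ("county_d", 600_000, 34.0, -118.0),
    ("county_e", 150_000, 34.3, -118.3),
]

def miles_between(a, b):
    """Great-circle (haversine) distance in miles between two county rows."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[2], a[3], b[2], b[3]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))

def nearby_index(counties, max_miles=50.0):
    """Map each county index to the other county indexes within max_miles."""
    return {
        i: [j for j, other in enumerate(counties)
            if j != i and miles_between(c, other) <= max_miles]
        for i, c in enumerate(counties)
    }

def make_users(counties, n_users, rng):
    """Give each user a county index, weighted by county population."""
    weights = [c[1] for c in counties]
    return rng.choices(range(len(counties)), weights=weights, k=n_users)

def random_walk(start, neighbors, max_hops, rng):
    """Take 0..max_hops hops between nearby counties; return the end county."""
    county = start
    for _ in range(rng.randint(0, max_hops)):
        if not neighbors[county]:
            break  # isolated county: nowhere to hop
        county = rng.choice(neighbors[county])
    return county

def make_friends(user_county, neighbors, max_new_friends, rng, max_hops=10):
    """Build an undirected friend graph, visiting low-target users first.

    Each user attempts a random number of picks (0..max_new_friends).
    Final friend counts can differ from that number, because duplicate
    and self picks are dropped and other users add links back.
    """
    by_county = {}
    for user, county in enumerate(user_county):
        by_county.setdefault(county, []).append(user)
    target = [rng.randint(0, max_new_friends) for _ in user_county]
    friends = [set() for _ in user_county]
    for user in sorted(range(len(user_county)), key=lambda u: target[u]):
        for _ in range(target[user]):
            county = random_walk(user_county[user], neighbors, max_hops, rng)
            candidates = by_county.get(county, [])
            if not candidates:
                continue  # nobody was placed in that county
            friend = rng.choice(candidates)
            if friend != user:
                friends[user].add(friend)
                friends[friend].add(user)
    return friends

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so test runs are repeatable
    users = make_users(COUNTIES, 10_000, rng)
    friends = make_friends(users, nearby_index(COUNTIES), 10, rng)
    print(len(users), "users,", sum(len(f) for f in friends) // 2, "friendships")
```

In the toy table the two clusters of counties are far more than 50 miles apart, so a walk never crosses between them. With a real county list, chains of neighboring counties would let a 10-hop walk drift much further from home, which is the effect described in post 6.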

