FlowingData Forums » Statistics and Data

US housing crisis data

Started 2 years ago by hadley / 2 posts

  1. The US housing crisis has undermined the world economy in far-reaching and poorly understood ways. Although there is a lot of speculation about the causes and effects of the housing crisis, most hypotheses are not backed up by data. We hope to promote well-informed policy and discussion, and to aid exploration and analysis, by creating an accessible and reproducible repository of data and analysis.

    Data related to the housing crisis exists in large (up to 10 GB), independent, and often messy data sets. The variety and inconsistency of the data create an obstacle for analysis, and this summer we have been working to provide views of this data that are consistent, concise and complete. To ensure that all manipulation is transparent, both data cleaning and analysis have been carried out with the open source statistical software R. Both code and data are freely licensed and available on GitHub. To date we have cleaned and organised 13 data sets related to the housing crisis, and by keeping the code transparent and reproducible, we hope to inspire others to contribute their data and ideas.

    This research project is a collaboration between Rice University undergrads, graduate students, and Hadley Wickham, an Assistant Professor of Statistics. It is funded by the NSF's Vertically Integrated Grants for Research and Education in Mathematical Sciences (VIGRE) program, NSF grant DMS-0739420.

  2. This is a great idea. It's sad to think about how much data cleaning code I've thrown out and wasted.

