Blog: David Loshin

Open Source Data

We hear a lot about open source software and its potential benefits to the marketplace. How about the concept of open source data? The idea is to create a repository of data that is readily available, can be configured for business benefit, and is collectively supported by a development community.

One place to start is with public data, such as what is available from the US Census Bureau.

Every 10 years, the US Census Bureau conducts a census and, as part of that process, collects a huge amount of demographic data at a very granular geographic level. The Bureau then spends the next 5+ years analyzing the data and preparing it for release, while at the same time preparing for the next decennial census.

The problem is that, by the time the decennial data is released, it sometimes no longer accurately reflects an area's demographics. For example, consider how rapidly real estate prices have risen in the past 5 years, yet 2005 home prices are not captured in Census 2000 data. Similarly, the Tiger/Line data, which contains information about street addresses, is occasionally updated, but new streets and subdivisions are constantly being built, so the Census data set likely has omissions.

There are many other public domain, public records, or generally available data sets that are of great interest to the BI community. So here is the challenge: Tell me how you feel about a project to take a publicly available data set and create an "Open Source" approach to maintaining and presenting that data. One example might be taking the Census decennial data and formulating it into a relational data structure mapped across the geographic Tiger/Line data. Pose your ideas as comments to this entry...
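To make the relational idea a bit more concrete, here is a minimal sketch of what such a structure might look like, using Python's built-in sqlite3 module. Everything in it is an assumption for the sake of discussion: the table names (geo_area, demographic_fact), the columns, the area keys, and the sample rows are hypothetical placeholders, not actual Census or Tiger/Line layouts or figures.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical geographic table; a real project would derive
        -- its keys from the geographic files rather than invent them.
        CREATE TABLE geo_area (
            area_id   TEXT PRIMARY KEY,
            area_name TEXT NOT NULL
        );
        -- Hypothetical fact table: one row per area, year, and measure.
        CREATE TABLE demographic_fact (
            area_id     TEXT NOT NULL REFERENCES geo_area(area_id),
            census_year INTEGER NOT NULL,
            measure     TEXT NOT NULL,
            value       REAL NOT NULL,
            PRIMARY KEY (area_id, census_year, measure)
        );
    """)
    # Placeholder rows only; these are not real Census values.
    conn.executemany(
        "INSERT INTO geo_area VALUES (?, ?)",
        [("A1", "Example Area One"), ("A2", "Example Area Two")],
    )
    conn.executemany(
        "INSERT INTO demographic_fact VALUES (?, ?, ?, ?)",
        [("A1", 2000, "population", 100.0),
         ("A2", 2000, "population", 250.0)],
    )
    conn.commit()
    return conn


def measure_by_area(conn, measure, census_year):
    """Join demographic facts to their geographic areas for one measure."""
    rows = conn.execute(
        """
        SELECT g.area_name, d.value
        FROM demographic_fact AS d
        JOIN geo_area AS g ON g.area_id = d.area_id
        WHERE d.measure = ? AND d.census_year = ?
        ORDER BY g.area_name
        """,
        (measure, census_year),
    )
    return rows.fetchall()
```

Keeping the measures in a long, narrow fact table keyed by area and year is just one possible design choice; it would let contributors add newer figures for an area without changing the schema.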

  Posted by David Loshin on August 14, 2005 1:21 PM
