Posted October 12, 2009 7:58 AM
I am currently at the DataFlux IDEAS conference and have sat through two sessions in which the speakers discussed how the amount of data is exploding, with the implication that we need more effective data governance to manage the flood. While I cannot disagree with the sentiment, I would suggest that buried within both speakers' messages lies the real challenge:
Instead of attempting to address new data challenges with traditional approaches, we need to reinvent data governance in the context of the changing ways that people are using that data.
The more data there is, the more difficult it is to filter out the signal from the noise. How does one distinguish between the requirements for overseeing signal and those for noise? More to follow...
Before my current career in data management, I had a previous life as a software developer, designing and implementing Fortran and C compilers for massively parallel processing (MPP) computers. Although I have been working on data quality and BI for the past 13 years or so, I still have a great interest in the high-performance computing space. Recently I have had the opportunity to indulge that interest by learning about MapReduce, the distributed/parallel programming model that Google has championed, and its relationship to analytical database management systems.
There are some similarities and some differences, but ultimately the two paradigms are complementary when it comes to supporting end-user business needs. If you are interested in the thought processes, check out this analysis paper, funded by Vertica, which compares and contrasts the two high-performance approaches.