Posted October 12, 2009 7:58 AM
Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!
David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at firstname.lastname@example.org or at (301) 754-6350.
Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!
I am currently at the DataFlux IDEAS conference, and I have sat through two sessions in which the speakers discussed how the amount of data is exploding, with the implication that we need more effective data governance to manage this flood. While I cannot disagree with the sentiment, I'd suggest that buried within both speakers' messages lies the real challenge:
Instead of attempting to address new data challenges with traditional approaches, we need to reinvent data governance in the context of the changing ways that people are using that data.
The more data there is, the more difficult it is to filter out the signal from the noise. How does one distinguish between the requirements for overseeing signal and those for noise? More to follow...
Before my current career in data management, I had a previous life as a software developer, designing and implementing Fortran and C compilers for massively parallel processing (MPP) computers. Although I have been working on data quality and BI for the past 13 years or so, I still have a great interest in the high performance computing space. Recently I have had the opportunity to indulge that interest by learning about the distributed/parallel programming model that Google has championed, called MapReduce, and its relationship to the use of analytical database management systems.
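For readers who haven't encountered MapReduce, the core idea can be sketched in a few lines of Python. This is not Google's implementation, just a toy single-process word-count illustration of the three conceptual phases: a map function emits (key, value) pairs, the framework groups ("shuffles") them by key, and a reduce function aggregates each group. In a real MapReduce system these phases run in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate the grouped values for one key (here, a sum)."""
    return (key, sum(values))

# Two hypothetical input documents, standing in for a distributed file set.
documents = ["data quality matters", "data governance and data quality"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts["data"] == 3, counts["quality"] == 2
```

The appeal of the model is that the map and reduce functions are the only pieces the programmer writes; the framework handles partitioning, scheduling, and fault tolerance across the cluster.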
There are some similarities, some differences, and ultimately the two paradigms are complementary when it comes to supporting end-user business needs. If you are interested in the thought process, check out this analysis paper, funded by Vertica, which compares and contrasts the two high-performance approaches.