However, I'd like to take the opportunity to focus our minds once again on a more fundamental question: how is IT going to manage data quality and reliability in a rapidly expanding data environment, both in terms of data volumes and places to store the data? I'm currently describing a logical enterprise architecture, Business Integrated Insight (BI2), that focuses on this.
So, for me, what the acquisition emphasizes, like that of Sybase by SAP, is that specialized databases, with their sophisticated features and functions, are rapidly entering the mainstream of database usage. Their ability to handle large data volumes with vast improvements in query performance has become increasingly valuable in a wide range of industries that want to analyze enormous quantities of very detailed data at relatively low cost. How to do this? Vendors of these systems typically have a simple answer: copy all the required data into our machine and away you go!
My concern is that IT ends up with yet another copy of the corporate data, and a very large one at that, which must be kept current in meaning, structure and content on an ongoing basis. Any slippage in maintaining one or more of these characteristics leads inevitably to data quality problems and eventually to erroneous decisions. Such issues typically emerge unexpectedly, in time-constrained or high-risk situations, and lead to expensive and highly visible firefighting by IT. Unfortunately, such occurrences are already common in BI environments, although they typically relate to unmanaged spreadsheets or relatively small data marts. With these specialized databases, we have just increased the size of the problem by a couple of orders of magnitude.
So, am I suggesting that you shouldn't be using these specialized databases? Would I recommend that you stand in front of a speeding freight train? Clearly not!
There are two ways that these problems will be addressed. One falls upon customer IT departments, while the other comes back to the database industry and the vendors, whether acquiring or acquired. These paths will need to be followed in parallel.
IT departments need to define and adopt stringent "data copy minimization" policies. The purist in me would like to say "elimination" rather than "minimization". However, that's clearly impossible. Minimization of data copies, in the real world, requires IT to evaluate the risks of yet another copy of data, the possibility of using an existing set of data for the new requirement and, if a new copy of the data is absolutely needed, whether existing analytic solutions could be migrated to this new copy of data and the existing data copies eliminated.
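For readers who prefer to see the policy as a procedure, here is a minimal sketch of the checks described above, written in Python. Everything in it is an assumption made for illustration: the `DataCopy` record, the `evaluate_copy_request` function and the idea of describing each copy by the subject areas it holds are hypothetical, and a real assessment would also weigh the risk factors, which are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataCopy:
    """Hypothetical record describing one existing copy of corporate data."""
    name: str
    subjects: Set[str]                                   # subject areas held in this copy
    consumers: List[str] = field(default_factory=list)   # analytic solutions reading it


def evaluate_copy_request(required: Set[str], existing: List[DataCopy]) -> Dict:
    """Apply a simple 'data copy minimization' check to a new requirement."""
    # First check: can an existing set of data already serve the new requirement?
    for copy in existing:
        if required <= copy.subjects:
            return {"decision": "reuse", "use": copy.name, "retire": []}

    # Otherwise a new copy is needed. Look for existing copies whose content
    # the new copy would fully cover: their consumers are candidates for
    # migration, after which the old copies could be eliminated.
    retire = [c.name for c in existing if c.subjects <= required]
    return {"decision": "create", "use": None, "retire": retire}


# Illustrative inventory; the names are invented for this example.
inventory = [
    DataCopy("sales_mart", {"sales"}, ["weekly_report"]),
    DataCopy("warehouse", {"sales", "customers", "inventory"}),
]

reuse_case = evaluate_copy_request({"sales", "customers"}, inventory)
create_case = evaluate_copy_request({"sales", "clickstream"}, inventory)
```

In this sketch the first request is satisfied by the existing warehouse, so no new copy is made. The second genuinely needs new data, and the result flags the small sales mart as a candidate for retirement once its consumers have moved to the new copy.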
Meanwhile, it is incumbent upon the database industry to take a step back and look at the broader picture of data management needs in the context of emerging technologies and the explosive growth in data volumes. The basic question that needs to be asked is: how can the enormous power and speed of these emerging technologies be crafted into solutions that support divergent data use cases equally well on a single copy of data? And, if not on a single copy, how can multiple copies of data be kept completely consistent, invisibly, within the database technology itself?
Tough questions, perhaps, but ones that the acquirers in this industry, with their deep pockets, need to invest in answering. As the database market re-converges, the vendors that solve this architectural conundrum will become the market leaders in highly consistent, pervasive and minimally duplicated data, freeing IT to focus on solving real business needs rather than managing data quality. Wouldn't that be wonderful?
Posted July 7, 2010 1:18 PM