Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today covers the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.


September 2010 Archives

Speeding up database performance for analytic work has been all the rage recently. Most of the new players in the field tout a combination of hardware and software advances to achieve improvements of 10-100 times or more in query speeds. Netezza's approach has been more hardware-oriented than most; their major innovation is the FPGA (field-programmable gate array) that sits between the disk and processor in each Snippet Blade (basically, an MPP node). The FPGA is coded with a number of Fast Engines, two of which, in particular, drive performance: the Compress engine, which compresses and decompresses data to and from the disk, and the Project and Restrict engine, which removes unneeded data from the stream coming off the disk. Netezza say that data volumes through the rest of the system can be reduced by as much as 95% in this manner.
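To make the Project and Restrict idea concrete, here is a minimal sketch in Python. It is a software analogy only (the real work happens in FPGA hardware before the CPU ever sees the data), and the table and column names are invented purely for illustration.

```python
# Software analogy, not Netezza's actual FPGA firmware: trim the stream
# before it reaches the CPU by dropping columns the query doesn't reference
# (project) and rows that fail its predicate (restrict).

from typing import Callable, Iterable, Iterator

Row = dict

def project_and_restrict(rows: Iterable[Row],
                         needed_columns: list[str],
                         predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Stream rows 'off disk', emitting only qualifying rows and columns."""
    for row in rows:
        if predicate(row):                                   # restrict: row filter
            yield {col: row[col] for col in needed_columns}  # project: column trim

if __name__ == "__main__":
    # A wide table (100 filler columns) of which the query needs only two.
    table = [
        {f"col{i}": 0 for i in range(100)} | {"region": r, "amount": a}
        for r, a in [("EMEA", 10), ("APAC", 990), ("EMEA", 250)]
    ]
    wanted = list(project_and_restrict(
        table, ["region", "amount"], lambda row: row["amount"] > 100))
    print(wanted)  # only the rows and columns the query actually touches
```

The point is only the shape of the saving: most of the bytes never travel past the filter, which is the kind of reduction Netezza is claiming, achieved far more efficiently in silicon.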

So, the FPGA is the magic ingredient. Combine that with the re-architecting of Netezza in TwinFin, released last August, which layered the disk access more effectively and moved to Intel-based CPUs on IBM BladeCenter technology, and you can see why Daniel Abadi came to the very prescient conclusion a month ago that IBM would be a likely suitor to acquire Netezza.

It seems likely that the short-term intent of the acquisition is to boost IBM's presence in the appliance market, competing especially with Oracle Exadata, not to mention EMC Greenplum and Teradata. Of more interest are the medium- and longer-term directions for the combined product line and for data warehousing in general. Curt Monash has already given his well-judged thoughts on the product implications, to which I'd like to add a few of my own.

My thoughts relate to the broad parallel between FPGA programming and microcode. You could argue that Netezza's FPGA is basically a microcoded accelerator for analytic access to data on commodity hard drives. IBM has long favored microcoded dedicated components and accelerators in its systems, dating all the way back to the System/360, so its way of thinking and Netezza's approach align nicely. The questions, of course, are how transparently it could be done underneath DB2 and whether DB2 for Linux, UNIX and Windows would embrace the use of accelerators as DB2 for z/OS has. The possible application of this approach under the Informix database shouldn't be forgotten either.

The interesting thing here is that the Netezza Fast Engine approach is inherently extensible. The MPP node passes information to the FPGA about the characteristics of the query, allowing it to perform appropriate preprocessing on the data streaming to or from the disk. In theory, at least, there is no reason why such preprocessing couldn't be applied in situations beyond analytics. Using contextual metadata to qualify OLTP records? Preprocessing content to mine implicit metadata? Encryption and decryption? It all lines up well with my contention that we are seeing the beginning of a convergence between the currently separate worlds of operational, informational and collaborative information.
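To illustrate why I think the approach generalizes, here is a purely conceptual sketch, in no way Netezza's actual FPGA interface: a chain of pluggable "engines" is chosen per query and applied to the stream in order. The engine names, the registry and the toy log data are all mine.

```python
# Conceptual sketch only: pluggable preprocessing engines selected per query
# and applied to a data stream. Nothing here reflects the real FPGA interface.

from typing import Callable, Iterator

Engine = Callable[[Iterator[str]], Iterator[str]]

def decompress(stream: Iterator[str]) -> Iterator[str]:
    # stand-in for the Compress engine's read path
    return (line.strip() for line in stream)

def restrict_errors(stream: Iterator[str]) -> Iterator[str]:
    # stand-in for a Project and Restrict style row filter
    return (line for line in stream if "ERROR" in line)

def decrypt(stream: Iterator[str]) -> Iterator[str]:
    # placeholder for the encryption/decryption idea mentioned above
    return (line.replace("#", "") for line in stream)

ENGINES: dict[str, Engine] = {
    "decompress": decompress,
    "decrypt": decrypt,
    "restrict": restrict_errors,
}

def preprocess(stream: Iterator[str], plan: list[str]) -> Iterator[str]:
    """Apply the engines named in the query 'plan', in order."""
    for name in plan:
        stream = ENGINES[name](stream)
    return stream

if __name__ == "__main__":
    raw = iter(["  #ERROR disk 7  ", "  INFO ok  ", "  #ERROR fan 2  "])
    print(list(preprocess(raw, ["decompress", "decrypt", "restrict"])))
```

Swap in a different plan and the same machinery serves an OLTP qualification step or a content-mining pass; that is the extensibility argument in miniature.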

But what does this acquisition suggest for data warehousing in general? Well, despite my long history with and high regard for IBM, I do fear that this acquisition is part of a trend that is reducing innovation in the industry. The explosion of start-ups in BI over the past few years has resulted in a wonderful blossoming of new ideas, in stark contrast to the previous ten years, when traditional relational databases were the answer, whatever the question. Big companies find it very difficult to nurture innovation, and their acquisitions often end up killing the spark that made the start-up worth acquiring in the first place. IBM is by no means the worst in this regard, but I do hope that the inventions and innovations that characterized Netezza continue to live and thrive in Big Blue... for the good of the data warehousing industry.


Posted September 22, 2010 3:02 PM
Simon Arkell and Jamie MacLennan briefed me over the past couple of days on their new cloud-based, self-service, predictive analytics software, Predixion Insight, launched on 13 September. Closely integrated with Microsoft Excel and PowerPivot, Predixion Insight offers business analysts a powerful and compelling set of predictive analytic functions in an environment that is familiar to almost every business person today.

To a first approximation, predictive analytics is a modern outgrowth of data mining. Both areas have traditionally been associated with large (often MPP) machines, complex data preparation and manipulation processes, and PhDs in statistics. What Predixion Software has done is to move the entire process within the reach of people who have little of any of the three. The heavy lifting is done in the cloud, and at a licensing cost of $99 per user per month, the required computing power and statistical algorithms are made readily available to most businesses. While some knowledge of statistics and data manipulation is still needed, the use of the familiar Excel paradigm makes the whole process less threatening. Predixion divides the tasks over two tabs on the Excel ribbon: Insight Analytics, aimed at those creating analyses, and Insight Now, which gathers the tasks related to running existing parameterized analyses.
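As an aside, that split is easy to picture as a generic pattern; the sketch below is mine, not Predixion's API, and the churn rule is a deliberately trivial stand-in for a real mined model. One person parameterizes an analysis once, and a colleague simply reruns it against new data.

```python
# Generic "create once, run many times" pattern; names and logic are
# illustrative only, not Predixion Insight's actual interface.

from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass
class ParameterizedAnalysis:
    name: str
    parameters: dict
    run: Callable[[list[float], dict], float]

def churn_score(history: list[float], params: dict) -> float:
    # Hypothetical rule: flag customers whose recent spend falls below a
    # set fraction of their long-run average spend.
    recent = mean(history[-params["recent_months"]:])
    return 1.0 if recent < params["threshold"] * mean(history) else 0.0

# The "create" side: an analyst defines and parameterizes the analysis once.
churn = ParameterizedAnalysis(
    name="simple churn flag",
    parameters={"recent_months": 3, "threshold": 0.5},
    run=churn_score,
)

# The "run" side: a business user reruns it on fresh data, no modeling needed.
print(churn.run([100, 110, 90, 20, 15, 10], churn.parameters))  # prints 1.0
```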

The bottom line is that Predixion brings to predictive analytics what spreadsheets brought to planning, and given its close integration with the spreadsheet behemoth, the case is very compelling. In competition with the traditional data mining and predictive analytics tools, Predixion Insight has the potential to be a very disruptive influence in the market.

The other potential for disruption is in data management and quality programs in businesses. Of course, this is neither new nor unique to Predixion, but looking at what it can do, and how easily it does it, brings the question right to the front of my mind. The problem has been around since the 1980s, when spreadsheets first became popular, and has grown ever since. Self-service BI exacerbates the issue still further. It is sometimes argued that managing and controlling the data sources, usually through a data warehouse, can address these issues. But the sad truth is that once the data is out of the centrally managed environment, the cat is out of the bag. Spreadsheets enable users to manipulate data as they need (or want) with no easy way to audit or track what they've done. Errors, and indeed fraud, cannot be easily detected. And they are readily propagated through the organization as users share spreadsheets and build upon each other's work. Self-service predictive analytics ups the ante even further.

Statistics are often lumped together with lies and damned lies, and with good reason. Not only are they easily misused, but they are often misunderstood or misapplied. The dangers inherent in making predictive analytics available to a wider audience in the business should not be underestimated by IT or by the audit functions of large businesses, particularly those in the financial industry. It could be argued that the recent financial meltdown was caused in part by an overreliance on mathematical and statistical models, and those were often in the hands of people with PhDs. What is the danger of giving the same tools to marketing people and middle managers? And, as we know, Sarbanes-Oxley carries possible jail penalties for C-level executives who sign off on misunderstood financials!

My advice to Predixion Software is to build a lot more mandatory tracking and auditability functionality into their product. And there is a huge upside to doing this as well: it provides the basis for real social networking and collaboration in BI, and for ensuring that the true sources of business innovation and insight are fully integrated into the core information provision infrastructure of the company, as I've written about here and here. A rough sketch of the kind of tracking I mean follows below. By the way, the same advice goes to Microsoft, if they're listening!
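To be clear about what I mean by mandatory tracking, here is a minimal sketch of an audit wrapper around an analysis run. The function, file and record fields are illustrative only, not a feature of any existing product.

```python
# Minimal, illustrative audit wrapper: every analysis run leaves a record of
# who ran what, with which parameters, against which data. Not a product API.

import getpass
import hashlib
import json
from datetime import datetime, timezone

AUDIT_LOG = "analysis_audit.jsonl"  # hypothetical append-only log file

def audited_run(analysis_name: str, parameters: dict, data: list, runner):
    """Run an analysis and append an audit record describing the run."""
    result = runner(data, parameters)
    record = {
        "analysis": analysis_name,
        "user": getpass.getuser(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "parameters": parameters,
        "input_fingerprint": hashlib.sha256(
            json.dumps(data, sort_keys=True, default=str).encode()).hexdigest(),
        "result_summary": str(result)[:200],
    }
    with open(AUDIT_LOG, "a") as log:
        log.write(json.dumps(record) + "\n")
    return result
```

With a record like that written on every run, the questions an auditor cares about, who ran what, with which parameters, against which data, stop being unanswerable once the analysis leaves the warehouse.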

Posted September 17, 2010 4:59 AM