We use cookies and other similar technologies (Cookies) to enhance your experience and to provide you with relevant content and ads. By using our website, you are agreeing to the use of Cookies. You can change your settings at any time. Cookie Policy.

Blog: Barry Devlin Subscribe to this blog's RSS feed!

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author >

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today covers the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

For some time now, when it comes to big data, my mantra has been "big data is simply all data".  IBM's April 3 announcement served admirably to reinforce that point of view. Was it a big data announcement, a DB2 announcement, or a hardware announcement?  The short answer is "yes", to all the above and more.

Weaving together a number of threads, Big Blue created a credible storyline that can be summarized in three key thoughts: larger, faster and simpler.  As many of you may know, I worked for IBM until early 2008, so my views on this announcement are informed by my knowledge of how the company works or, perhaps, used to work.  Last Wednesday, I came away impressed.  Here were a number of diverse, individual product developments that conform to a single theme across different lines and businesses.

Take BLU acceleration as a case in point.  The headline, of course, is that DB2 LUW (on Linux, Unix and Windows) 10.5 introduces a hybrid architecture.  Data can be stored in columnar tables with extensive compression, making use of in-memory storage and taking further advantage of parallel and vector processing techniques available on modern processors.  The result is an up to 25% improvement in analytic and reporting performance (and considerably more in specific queries) and up to 90% data compression.  In addition, the elimination of indexes and aggregates simplifies considerably the need for manual tuning and maintenance of the database.  This is a direction that has long been shown by small, newer vendors such as ParAccel and Vertica (now part of HP), so it is hardly a surprise.  IBM can claim a technically superior implementation, but more impressive is the successful retrofitting into the existing product base.  And the re-use of the technology in the separate Informix TimeSeries code base to enhance analytics and reporting there too, as well as the promise that it will be extended to other data workloads in the future.  It seems the product development organization is really pulling together across different product lines.  That's no mean feat within IBM.

Another hint at the strength of the development team was the quiet announcement of a technology preview of JSON support in DB2 at the same time as the availability of 10.5.  JSON is one of the darlings of the NoSQL movement that provides significant agility to support unpredictable and changing data needs.  See my May 2012 white paper "Business Intelligence--NoSQL... No Problem" for more details.  As in its support for other NoSQL technologies, such as XML and RDF graph databases, IBM has chosen to incorporate support for JSON into DB2.  There are pros and cons to this approach.  Performance and scalability may not match a pure JSON database, but the ability to take advantage of the ACID and RAS characteristics of an existing, full-feature database like DB2 makes it a good choice where business continuity is a strong requirement.  IBM clearly recognizes that the world of data is no longer all SQL, but that for certain types of non-relational data, the difference is sufficiently small that they can be handled as an adjunct to the relational model through a "subservient" engine, allowing easier joining of NoSQL and SQL data types.  This is a vital consideration for machine-generated data, one of three information domains I've defined in a recent white paper, "The Big Data Zoo--Taming the Beasts".

The announcement didn't ignore the little yellow elephant, either.  The PureData System family has been expanded with the PureData System for Hadoop, with built-in analytics acceleration and archiving, and provides significantly simpler and faster deployment of projects requiring the MapReduce environment.  And InfoSphere BigInsights 2.1 offers the Big SQL interface to Hadoop, an alternative file system, GPFS-FPO, with enhanced security and no single point of failure, as well as high availability.

While the announcement clearly targeted Big Data--at the Speed of Business, the underlying message, as seen above, is much broader.  This view is of an emerging information ecosystem that must be considered from a fully holistic viewpoint.  A key role, and perhaps even the primary role, for BigInsights / Hadoop is in exploratory analytics, where innovative, what-if thinking is given free rein.  But the useful insights gained here must eventually be transferred to production (and back) in a reliable, secure, managed environment--typically a relational database.  This environment must also operate at speed, with large data volumes and with ease of management and use.  These are characteristics that are clearly emphasized in this announcement.  They are also key components of the integrated information platform I described in the Data Zoo white paper already mentioned.  Missing still are some of the integration-oriented aspects such as the comprehensive, cross-platform metadata management, data integration and virtualization required to tie it all together.  IBM has more to do to deliver on the full breadth of this vision, but this announcement is a big step in the right direction.

Posted April 8, 2013 9:14 AM
Permalink | No Comments |

Leave a comment


Search this blog
Categories ›
Archives ›
Recent Entries ›