Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today lies in the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry's latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

February 2010 Archives

Having worked with Lyzasoft CEO Scott Davis and produced a white paper on Collaborative Analytics in the first half of 2009, I was not surprised that version 2.0 of Lyza places a major emphasis on the same area. What did surprise me, however, was how far they have advanced the concepts and implementation in such a short timeframe!

Successful collaboration between decision makers requires an environment that facilitates a free-flowing but well-managed conversation about ongoing analyses as they evolve from initial ideas to full-fledged solutions to business problems. Consider a common scenario. The first analyst gathers data she considers relevant and creates an initial set of assumptions, data manipulations and results. She shares this via e-mail with her peers for confirmation, and she receives suggestions for improvement, some of which she incorporates in a new version. Her manager reviews the work personally and makes further suggestions; a new version emerges. She also shares the intermediate solution with a second department, where another analyst creates a further solution based on the original. Meanwhile, the first analyst finds an error in her logic buried deep in cell Sheet3!AB102...

We all know the problems with multiple unmanaged copies, rework, silently propagated errors and so on in the usual spreadsheet- and e-mail-based business analysis environment. Lyza and Lyza Commons together address these issues by creating a comprehensive tracking and auditing mechanism for every step of an analysis and providing an integrated environment for sharing and discussing work among collaborators. Integral metadata links all copies derived from an initial analysis. Twitter-like conversations (called Blurbs) about an analysis are linked to the referenced object, creating a comprehensive context for the conversation and the underlying analysis. The folks at Lyzasoft have also come up with a security concept for sharing analyses, called Mesh Trust, that should make sense in most enterprise collaboration environments.
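To make the lineage idea concrete, here is a minimal sketch in Python (purely illustrative, and emphatically not Lyzasoft's actual data model or API) of how derived copies of an analysis, and the short conversations attached to them, might be linked so that every version can be traced back to its origin.

```python
# Illustrative only: a toy lineage model, not Lyzasoft's implementation.
from dataclasses import dataclass, field
from typing import List, Optional
import uuid


@dataclass
class Analysis:
    """One version of an analysis, linked to the version it was derived from."""
    title: str
    author: str
    parent_id: Optional[str] = None  # link to the analysis this was copied from
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    blurbs: List[str] = field(default_factory=list)  # short comments on this version


class Commons:
    """A registry that records every version and can reconstruct its derivation chain."""

    def __init__(self):
        self.analyses = {}

    def publish(self, analysis):
        self.analyses[analysis.id] = analysis
        return analysis

    def derive(self, parent, title, author):
        # The new version's metadata points back at its parent.
        return self.publish(Analysis(title=title, author=author, parent_id=parent.id))

    def lineage(self, analysis):
        # Walk the parent links back to the original analysis.
        chain = [analysis]
        while chain[-1].parent_id is not None:
            chain.append(self.analyses[chain[-1].parent_id])
        return chain


commons = Commons()
v1 = commons.publish(Analysis("Q1 margin analysis", "analyst_a"))
v2 = commons.derive(v1, "Q1 margin analysis (manager review)", "manager")
v2.blurbs.append("Check the allocation logic in step 3 before sharing further.")

for version in commons.lineage(v2):
    print(version.title, "(derived)" if version.parent_id else "(original)")
```

The point of the sketch is simply that once every derived copy carries a pointer to its parent, the question of who copied what from whom answers itself, which is exactly the property that unmanaged spreadsheet copies lack.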

My bottom line? Lyza and Lyza Commons 2.0 provide a seamless blend of analytic function, managed and controlled access to information resources, and enterprise-adapted social networking around analytic results and their provenance. This is precisely the type of function needed by businesses that want to regain control of spreadmarts that have run amok. This is the right conceptual foundation for real, meaningful business insight and innovation going forward.

Posted February 25, 2010 2:58 PM
Permalink | No Comments |
As mentioned in my last post, ParAccel had a really interesting announcement coming out this week. I was referring to their partnership with Fusion-io to add SSD technology to the ParAccel Analytic Appliance for even faster query performance. ParAccel are not alone in their use of SSD; Teradata's 4555 and Oracle's Exadata 2 also include the technology. For me, it's not even about faster query results for users. It's about the implications for the entire Data Warehouse architecture.

Over the past couple of years, we've seen dramatic improvements in database performance due to hardware and software advances such as in-memory databases, columnar storage, massively parallel processing, compression and so on, as described in my white paper from April 2009. SSD, in one sense, is just another piece of accelerating technology. However, add it to the existing list, and you begin to see the possibility of revisiting old assumptions about what is possible within a single database. Here are a few ideas to play with:

  • Do you still need that Data Mart?  With so much faster performance, maybe the queries you now run in the Mart could run directly on the EDW.  Reducing data duplication has enormous benefits, partly in storage volumes, but principally in reducing the maintenance of ETL feeds to the Marts.
  • Where to do operational BI?  It was once considered necessary to install a separate ODS to support closer to real-time access to consolidated atomic data.  But with such a fast database, couldn't you just trickle-feed the data and do it all in the Warehouse itself?  One less copy of data and one less set of ETL can't be all bad!
  • ETL or ELT?  Extract, transform and load has been the traditional way of loading a Warehouse, with a special engine to do the transform step.  With a faster and more powerful database engine, you have the option to try extract, load and transform instead, letting the Warehouse database do the transform work, as sketched below.
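As a simple illustration of the ELT pattern mentioned in the last bullet, the sketch below uses Python with SQLite standing in for the warehouse engine (the table and column names are invented for the example): the raw extracted rows are loaded untransformed, and the database's own SQL engine then performs the transform step.

```python
# Illustrative ELT sketch: SQLite stands in for the warehouse engine;
# table and column names are invented for the example.
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Extract: rows pulled from a source system (hard-coded here for brevity).
extracted_rows = [
    ("2010-02-01", "WIDGET-1", "1,200.50"),
    ("2010-02-01", "widget-2", "300.00"),
]

# Load: land the data untransformed in a staging table inside the warehouse.
warehouse.execute(
    "CREATE TABLE stg_sales (sale_date TEXT, product_code TEXT, amount TEXT)"
)
warehouse.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", extracted_rows)

# Transform: let the warehouse's own SQL engine do the cleansing and typing.
warehouse.execute("""
    CREATE TABLE fact_sales AS
    SELECT sale_date,
           UPPER(product_code)                    AS product_code,
           CAST(REPLACE(amount, ',', '') AS REAL) AS amount
    FROM stg_sales
""")

for row in warehouse.execute("SELECT * FROM fact_sales"):
    print(row)
```

Swap the SQLite connection for your warehouse's own interface and the pattern is the same: the heavy lifting of the transform moves from a separate ETL engine into the database itself.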
Although ParAccel, like all the smaller vendors, is focusing more on selling to the "bigger, faster, more complex analytics applications" market at present, I'm pretty sure that the work ParAccel is doing under the covers on query optimization, workload management, and loading and updating features will pave the way for a sea change in how we do data warehousing in the next few years.


Posted February 17, 2010 2:34 PM
Permalink | No Comments |
Kim Stanick and Rick Glick of ParAccel were at the Boulder BI Brain Trust (BBBT) last Friday. They have an exciting announcement coming soon, and much of what was discussed was under NDA, so I can't give details here. But about half-way through their presentation, they threw up a slide saying simply "EDW: What's not working?"

Well, that's a negative question! And, anyway, I believe most of us have some good ideas about what's not working--from project scoping and delivery issues to problems of complexity of feeds and bottlenecks in timely data availability. So, let me re-frame the question: "Where next for EDW?"

I wrote a BI Thought Leader for ParAccel last April called "Analytic Databases in the World of the Data Warehouse" that began to address that question, and as the world of BI has evolved since, I want to revisit that question briefly. Back then, I wrote:

"Specialized analytic databases using [advanced] technologies ... now offer significantly improved performance for typical BI applications, enable previously impossible analyses and often lower cost implementation. They also have the potential to challenge the current physically layered Data Warehouse architecture. This paper ... argues that analytical databases may enable a move to a simpler non-layered architecture with significant benefits in terms of lower costs of implementation, maintenance, and use."

In brief, it's our old friend, the paradigm shift, enabled by a dramatic shift in the price-performance characteristics of data warehouses driven by a new generation of technology. The possibility I saw then was a return to a physically simpler, more singular implementation of the EDW. And indeed that may still be a first step.

My thinking has evolved further since then, and I'm really beginning to envisage a much larger problem space that we need to address--how to integrate the entire enterprise information set: operational, informational and collaborative. I call that Business Integrated Insight (BI2), described in a more recent white paper. The discussion at the BBBT last Friday, led by a number of physical database technology experts, gave rise to some new insights into how BI2 could be physically instantiated.

Virtualization at every level of the environment--servers, applications, data and, particularly, databases--closely linked with advances in the technology (as opposed to the hype) of cloud computing, is widely discussed today as a way to reduce IT capital and operating costs, consolidate infrastructure, simplify resource management and so on. However, database virtualization offers new possibilities in the physical implementation of an enterprise data architecture that spans all data types and processing needs. Chief among these are flexibility of implementation, adaptability, mediated access to and use of data across multiple database types, significant reductions in data duplication and the gradual construction of overarching models that describe the entire business information resource. I'm sure there's much more to be said on this topic, but I'd love to hear the views of some experts in the field.
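In the meantime, to make the idea of mediated access a little more concrete, here is a minimal, entirely hypothetical sketch in Python (no vendor's product; both "backends" are simply in-memory SQLite stores so the example is self-contained) of a thin virtualization layer that routes each request to whichever underlying store physically holds the data, so the consumer never needs to know where it lives.

```python
# Hypothetical sketch of a thin data-virtualization layer. Each "backend"
# could be a different database type; here both are in-memory SQLite stores.
import sqlite3


class VirtualDataLayer:
    """Routes each query to the backend that physically owns the requested table."""

    def __init__(self):
        self.backends = {}  # table name -> connection that holds that table

    def register(self, table, connection):
        self.backends[table] = connection

    def query(self, table, sql):
        # The consumer asks for data by name, not by physical location.
        return self.backends[table].execute(sql).fetchall()


# Two physically separate stores: one operational, one informational.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
operational.execute("INSERT INTO orders VALUES (1, 'open')")

informational = sqlite3.connect(":memory:")
informational.execute("CREATE TABLE sales_history (year INTEGER, revenue REAL)")
informational.execute("INSERT INTO sales_history VALUES (2009, 1250000.0)")

layer = VirtualDataLayer()
layer.register("orders", operational)
layer.register("sales_history", informational)

print(layer.query("orders", "SELECT * FROM orders"))
print(layer.query("sales_history", "SELECT * FROM sales_history"))
```

A real data virtualization layer would, of course, also have to handle cross-source joins, query pushdown and security, but the routing idea at its core is no more mysterious than this.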


Posted February 9, 2010 6:52 AM
Permalink | No Comments |