Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today covers the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

October 2012 Archives

I was the analyst on The Briefing Room this week with NuoDB's CEO, Barry Morris. The product itself is extremely interesting, both in its concept and its technology, and it will formally launch in the next month or so after a long series of betas. More about that a little later...

But first, I need to vent! For some time now, I've been taking an interest in NoSQL because of its positioning in the Big Data space. I've always had a real problem with the term - whether it means Not SQL or Not Only SQL - because defining anything by what it's not is logical nonsense. Even the definitive nosql-database.org falls into the trap, listing 122+ examples from "Possibly the oldest NoSQL DB: Adabas" to the wonderfully-named StupidDB. A simple glance at the list of categories used to classify the entries shows the issue: NoSQL is a catch-all for a potentially endless list of products and tools. The mere fact that they don't use SQL as an access language is insufficient as a definition.

[Figure: NewSQL Ecosystem]

But my irritation now extends to "NewSQL", a term I went a-Googling when I saw that NuoDB is sometimes put in this category. This picture from Matthew Aslett of 451 Research's presentation was interesting if somewhat disappointing: another gathering of tools with a mixed and overlapping set of characteristics, most of which relate to their storage and underlying processing approaches rather than to anything new about SQL, which is, of course, at heart a programming language. So why invent the term NewSQL when the aim is to keep the same syntax? The term totally misses the real innovation that's going on.

This innovation at the physical storage level has been happening for a number of years now. Columnar storage on disk, from companies such as Vertica and ParAccel, was the first innovative concept to challenge traditional RDBMS approaches in the mid-2000s. That's not forgetting Sybase IQ from the mid-1990s, which was, of course, column-oriented but didn't catch the market as the analytic database vendors did later. With cheaper memory and 64-bit addressing, the move is now underway towards using main memory as the physical storage medium and disk as a fallback. SAP HANA champions this approach at the high end, while various BI tools, such as QlikView and MicroStrategy, hold the lower end. And don't forget that the world's most unloved (by IT, at least) BI tool, Excel, has always been in-memory!
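To make the row-versus-column distinction concrete, here is a minimal, purely illustrative Python sketch (my own toy example, not any vendor's implementation) showing why a columnar layout suits analytic scans that touch only one attribute:

```python
# Illustrative only: the same small "sales" table held row-wise and column-wise.
# A columnar layout lets an analytic query read just the attribute it needs;
# a row layout keeps each record together, which suits transactional access.

rows = [
    {"order_id": 1, "customer": "Acme", "amount": 120.0},
    {"order_id": 2, "customer": "Blue Sky", "amount": 75.5},
    {"order_id": 3, "customer": "Acme", "amount": 210.0},
]

def total_amount_row_store(rows):
    # Touches every column of every row to answer a one-column question.
    return sum(r["amount"] for r in rows)

columns = {
    "order_id": [1, 2, 3],
    "customer": ["Acme", "Blue Sky", "Acme"],
    "amount":   [120.0, 75.5, 210.0],
}

def total_amount_column_store(columns):
    # Reads only the "amount" column; the others are never visited.
    return sum(columns["amount"])

assert total_amount_row_store(rows) == total_amount_column_store(columns)
```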

The other aspect of innovation relates to parallel processing. Massively parallel processing (MPP) relational databases have been around for many years in the scientific arena and in commercial data warehousing from Teradata (1980s) and IBM DB2 Parallel Edition (1990s). These powerful, if proprietary, platforms are usually forgotten (or ignored) when NoSQL vendors lament the inability of traditional RDBMSs to scale out to multiple processors, blithely citing comparisons of their products to MySQL, probably more popular for its price than its technical prowess. Relational databases do indeed run across multiple processors, and they must evolve to do so more easily and efficiently as increases in processing power now come mainly from increasing the number of cores per processor. Which finally brings me back to NuoDB.
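As a minimal sketch of the scale-out principle those MPP platforms solved long ago - partition the data, aggregate each partition on its own core, then combine the partial results - here is an illustrative example using Python's standard multiprocessing module (a toy, not any vendor's engine):

```python
# Illustrative MPP-style aggregation: split the data into partitions, let each
# worker process aggregate its own partition, then merge the partial results.
from multiprocessing import Pool

def partial_sum(partition):
    return sum(partition)

if __name__ == "__main__":
    amounts = list(range(1_000_000))   # stand-in for a fact-table column
    n_workers = 4
    chunk = len(amounts) // n_workers
    partitions = [amounts[i * chunk:(i + 1) * chunk] for i in range(n_workers - 1)]
    partitions.append(amounts[(n_workers - 1) * chunk:])   # last one takes the remainder

    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, partitions)        # one partition per core

    assert sum(partials) == sum(amounts)                    # same answer, computed in parallel
```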

NuoDB takes a highly innovative, object-oriented, transaction/messaging-system approach to the underlying database processing, eliminating the concept of a single control process responsible for all aspects of database integrity and organization. Invented by Jim Starkey, an éminence grise of the database industry, the approach is described as elastically scalable - cashing in on the cloud and big data. It also touts emergent behavior, a concept central to the theory of complex systems. Together with an in-memory model for data storage, NuoDB appears very well positioned to take advantage of the two key technological advances of recent years mentioned already: extensive memory and multi-core processors. And all of this behind a traditional SQL interface to maximize use of existing, widespread skills in the database industry. What more could you ask?

However, it seems there's an added twist. Apparently, SQL is just a personality the database presents, and it is the focus of the initial release. Morris also claims that NuoDB is able to behave as a document, object or graph database, personalities slated for later releases in 2013 and beyond. Whether this emerges remains to be seen. Interestingly, however, when saving to disk, NuoDB stores data in key-value format.
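That last detail is worth a moment's thought. NuoDB's actual on-disk format isn't described here, but as a purely hypothetical sketch of the general idea, a relational "personality" can be layered over a simple key-value store by mapping a (table, primary key) pair to a serialized row:

```python
# Hypothetical illustration only -- not NuoDB's real storage format.
# A relational front end can sit on a key-value back end by keying each
# serialized row on (table name, primary key).
import json

kv_store = {}   # stand-in for a disk-backed key-value engine

def insert(table, pk, row):
    kv_store[(table, pk)] = json.dumps(row)

def select(table, pk):
    raw = kv_store.get((table, pk))
    return json.loads(raw) if raw is not None else None

insert("customer", 42, {"name": "Acme", "country": "IE"})
print(select("customer", 42))   # {'name': 'Acme', 'country': 'IE'}
```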

I'll be discussing big data, NoSQL and NewSQL at speaking engagements in Europe in November: the IRM DW&BI Conference in London (5-7 Nov) and Big Data Deutschland in Frankfurt (20-21 Nov). I look forward to meeting you there!


Posted October 25, 2012 9:45 AM
Permalink | No Comments |
[Figure: Integrated Information Platform]

Slowly but surely, big data is becoming mainstream. Of course, if you listened only to the hype from analysts and vendors, you might think this was already the case. I suspect it's more like teenage sex: more talked about than actually happening. But it seems we're about to move into the roaring twenties.

I had the pleasure to be invited as the external expert speaker at IBM's PureData launch in Boston this week.  In a theatrical, dry-ice moment, IBM rolled out one of their new PureData machines between the previously available PureFlex and PureApplication models.  However, for me, the launch carried a much more complex and, indeed, subtle message than "here's our new, bright and shiny hardware".  Rather, it played on a set of messages that is gradually moving big data from a specialized and largely standalone concept to an all-embracing, new ecosystem that includes all data and the multifarious ways business needs to use it.

Despite long-running laments to the contrary, IT has had it easy when it comes to data management and governance. Before you flame me, please read at least the rest of this paragraph. Since the earliest days of general-purpose business computing in the 1960s, we've worked with a highly modeled and carefully designed representation of reality. Basically, we've taken the messy, incoherent record of what really happens in the real world and hammered it into relational (and previously popular hierarchical or network) databases. To do so, we've worked with highly simplified models of the world. These simplifications range from grossly wrong (all addresses must include a 5-digit zip code--yes, there are still a few websites that enforce that rule) to obviously naive (multiple purchases by a customer correlate to high loyalty) as well as highly useful for managing and running a business (there exists a single version of the truth for all data). The value of useful simplifications can be seen in the creation of elegant architectures that enable business and IT to converse constructively about how to build systems the business can use. They also reduce the complexity of the data systems; one size fits all. The danger lies in the longer-term rigidity such simplifications can cause.
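As a trivial, entirely hypothetical illustration of how such a simplification hardens into code, consider a postal-code check built on the "every address has a 5-digit zip" model:

```python
# Hypothetical example of a simplifying model baked into code: it validates
# US-style zip codes and quietly rejects perfectly valid addresses from
# everywhere else in the world.
import re

def is_valid_postal_code(code):
    # The "all addresses have a 5-digit zip" simplification.
    return bool(re.fullmatch(r"\d{5}", code))

print(is_valid_postal_code("10001"))      # True  - a New York zip code
print(is_valid_postal_code("SW1A 1AA"))   # False - a real London postcode, rejected by the model
```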

The data warehouse architecture of the 1980s, to which I was a major contributor, of course, was based largely on the above single-version-of-the-truth simplification. There's little doubt it has served us well. But big data and other trends are forcing us to look again at the underlying assumptions. And find them lacking. IBM (and it's not alone in this) has recognized that there exist different business usage patterns of data, which lead to different technology sweet spots. The fundamental precept is not new, of course. The division of computing into operational, informational and collaborative is closely related. What's new is that the usage patterns are non-exclusive and overlapping, and they need to co-exist in any business of reasonable size and complexity. I can identify four major business patterns: (1) mainstream daily processing, (2) core business monitoring and reporting, (3) real-time operational excellence and (4) data-informed planning and prediction. And there are surely more. This week, IBM announced three differently configured models: (1) PureData System for Transactions, (2) PureData System for Analytics and (3) PureData System for Operational Analytics, each based on existing business use patterns and implementation expertise. Details can be found here. I imagine we will see further models in the future.

All of this leads to a new architectural picture of the world of data--an integrated information platform, where we deliberately move from a layered paradigm to one of interconnected pillars of information, linked via integration, metadata and virtualization. A more complete explanation can be found in my white paper, "The Big Data Zoo--Taming the Beasts: The need for an integrated platform for enterprise information". As always, feedback is very welcome--questions, compliments and criticisms.
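To give one purely illustrative flavour of the virtualization idea (hypothetical stores and field names of my own, not a reference design), a federated view can present data from two separately managed pillars as a single answer, assembled at query time rather than copied into one layer:

```python
# Hypothetical sketch of virtualization across information "pillars": a unified
# view joins records from two independently managed stores on demand, without
# physically consolidating them.
warehouse_pillar = {                      # structured, carefully modeled data
    101: {"customer": "Acme", "lifetime_value": 54000},
}
big_data_pillar = {                       # lightly modeled event data
    101: [{"event": "page_view", "ts": "2012-10-12T08:01"},
          {"event": "download",  "ts": "2012-10-12T08:03"}],
}

def unified_customer_view(customer_id):
    # Neither pillar is moved or restructured; the join happens at access time.
    profile = warehouse_pillar.get(customer_id, {})
    events = big_data_pillar.get(customer_id, [])
    return dict(customer_id=customer_id, recent_events=events, **profile)

print(unified_customer_view(101))
```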

Posted October 12, 2012 8:10 AM
Permalink | No Comments |
The alacrity with which analysts, vendors, customers and even the popular press have jumped on the big data bandwagon over the past year or two has been little short of amazing. Perhaps it was just boredom with ten years of the "relational is the answer; now, what's the question" refrain? Or maybe the bottom line was an explosion of new business possibilities that emerged in different areas that all had one basic thing in common: a base of new data... as opposed to a new database?

I've commented on a number of occasions that the software technology on which big data is based is rather primitive.  After all, Hadoop and its associated zoo are little more than a framework and a set of software utilities to simplify writing and managing parallel-processing batch applications.  Compare this to the long-standing prevalence of real-time transaction processing in the database world, relational or otherwise.  NoSQL databases perhaps offer more novelty of thinking, especially where there has been innovation around the concept of key-value stores.  At some fundamental level, big data has been less about "volume, velocity and variety"--marketing terms in many ways--and more about simple economics.  The economics of cheap, commodity storage and processors combined with open sourcing of software development.
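To show why I say "framework" rather than "database": the Hadoop programming model boils down to writing a map function and a reduce function, and letting the framework worry about partitioning, scheduling and shuffling across machines. Here is a minimal, Hadoop-free Python sketch of the same word-count pattern, purely for illustration:

```python
# Minimal sketch of the MapReduce batch pattern that Hadoop parallelizes.
# The developer supplies map() and reduce(); the framework normally handles
# distribution and shuffling. Here everything runs in a single process.
from collections import defaultdict

def map_phase(line):
    # Emit an intermediate (key, value) pair for every word.
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    # Combine all values seen for one key.
    return word, sum(counts)

lines = ["big data is big", "data about data"]

# Shuffle: group intermediate pairs by key before reducing.
grouped = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        grouped[word].append(count)

result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)   # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```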

But the big bandwagon has been rolling, and many of us, myself included, have perhaps been too focused on the size and speed of the wagon and paid too little attention to the oxen pulling it. Oxen? Actually, I'm referring to the major web denizens, such as Google, Facebook and their ilk. What alerted me was a recent Wired magazine article, "Google Spans Entire Planet With GPS-Powered Database", and a trail of links therein, particularly "Google's Dremel Makes Big Data Look Small". Both articles, published in the past two months, make fascinating reading, but the bottom line is that Google and, to a lesser extent, Facebook are upgrading their big data environments to be faster and more responsive. Unsurprisingly, Google is moving from a batch-oriented paradigm to, wait for it, a database system that preserves update consistency. Google has been on this journey for three years now and has published research papers as far back as 2010. Get ready for a new set of buzzwords: Dremel, Caffeine, Pregel and Spanner from Google and Prism from Facebook.

So what does this mean for the rest of us? In the widespread adoption of the current version of big data technology, the driver has not been so much big data itself as the commoditization of storage and processing power that has emerged. Database vendors have reacted by embracing Hadoop as a complementary data source or store to their engines. The open sourcing of Dremel, if it happens, would signal, I believe, a much more significant change in the database market. Readers familiar with "The Innovator's Dilemma" by Clayton Christensen, first published in 1997, will probably recognize what would ensue as disruptive innovation, described as "innovation that creates a new market by applying a different set of values, which ultimately (and unexpectedly) overtakes an existing market". To possibly overstretch the bandwagon analogy, it seems that the bandleader has switched horses; the parade is changing its route.

These developments add a whole new set of future considerations for vendors and implementers of big data solutions, and I'll be exploring them further in speaking engagements in Europe in November: the IRM DW&BI Conference in London (5-7 Nov) and Big Data Deutschland in Frankfurt (20-21 Nov).  I hope to meet at least a few of you there!

"Big wheel keep on turning / Proud Mary keep on burning / And we're rolling, rolling / Rolling on the river" Creedence Clearwater Revival, 1969

Posted October 2, 2012 3:54 AM
Permalink | No Comments |