
Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author >

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today covers the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry's latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

Recently in Big data Category

As we begin a new year, we are promised a move from a focus on the meaning and technology of big data to the useful and worthwhile business applications it may offer.  A timely move indeed.  Hopefully, we'll begin to hear less about analyzing Twitter streams to optimize advertising spend and more about applications with the potential to improve people's lives or the environment.  And even more hopefully, people may begin to consider the risks they run when revealing or gathering personal data on our deeply interconnected Web.

With all of the synchronicity that is the Internet, I came across two articles from the New York Times published last week.  The first, by Peter Jaret on January 14, describes how patient records, transcribed and digitized from scrawled (why do they write so poorly?) doctors' notes, anonymized and stored on the Web, can be statistically mined to discover previously unknown side-effects of and interactions between prescribed drugs.  Clearly useful and valuable work.  The second article, three days later by Gina Kolata, revealed how easily a genetics researcher was able to identify five individuals and their extended families by combining publicly available information from the anonymized 1000 Genomes Project database, a commercial genealogy Web site and Google.  Kolata quotes Amy L. McGuire, a lawyer and ethicist at Baylor College of Medicine in Houston:  "To have the illusion you can fully protect privacy or make data anonymous is no longer a sustainable position".  The underlying genetic data is used in medical research to good effect, of course, but what are the possible consequences for those individuals thus identified as insurance companies, governments or other interested parties make potentially negative assessments based on their once private genomes?

Such occurrences--and there are many of them--should be deeply disturbing to those of us involved in the business of big data and analytics.  Here are doctors, scientists and lawyers--with training in logic, ethics and law--who see the power of analytics to improve the human condition, but who seem to gloss over the wider privacy and security implications of making personal information widely available on the Web.  After all, the limits of data anonymization on the web were being discussed openly as long ago as May 2011 by Pete Warden on the O'Reilly Radar blog.  And as far back as 1997, Prof. Latanya Sweeney, now Director of the Data Privacy Lab at Harvard, could show that the combination of gender, ZIP code and birthdate was unique for 87% of the U.S. population.  
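Sweeney's result is easy to demonstrate in miniature: given any table of records, one can count what fraction of (gender, ZIP code, birthdate) combinations pin down exactly one person. Here is a minimal Python sketch; the sample records are entirely invented for illustration, not drawn from any real dataset:

```python
from collections import Counter

# Hypothetical records of (gender, ZIP code, birthdate) -- invented data.
people = [
    ("F", "02138", "1955-07-12"),
    ("M", "02138", "1955-07-12"),
    ("F", "02138", "1955-07-12"),  # shares its combination with the first
    ("M", "10001", "1980-01-01"),
    ("F", "94305", "1972-03-30"),
]

def unique_fraction(records):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)

print(unique_fraction(people))  # → 0.6 (3 of the 5 combinations occur exactly once)
```

In this toy table, 60% of the individuals are uniquely re-identifiable from three "anonymous" attributes; Sweeney's finding is that for real U.S. census data the figure reaches 87%.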

Eben Moglen, professor of law and legal history at Columbia University and Chairman of the Software Freedom Law Center, warned at re:publica Berlin in May 2012 that "media that spies on and data-mines the public is destroying freedom of thought and only this generation, the last to grow up remembering the 'old way', is positioned to save this, humanity's most precious freedom".  With media and medicine, government and retail, telecommunications and finance all gathering hoards of information about us, each for their own allegedly good purpose, the reality is now that the abuse of big data (as opposed to its use) is not only possible, but proceeding apace, even in largely democratic, Western states.

So, given that big data anonymity is "no longer a sustainable position", it should be clear that the analytics possible on today's high-powered computers is a double-edged sword; it serves us poorly to focus on only one razor-sharp edge.  As we evaluate and build the useful and worthwhile business analytics applications of this coming year, let us step back at least occasionally to contemplate whether the profits to be earned or the discoveries to be made are worth the price of human freedom.

Posted January 23, 2013 9:06 AM
Permalink | No Comments |
As you may be aware, the world (or civilization, at least) is due to end in a couple of weeks, as the Mayan calendar counts to the last day of this "Sun".  For those of you living beneath a stone, the date / time is 21st December at sunset in the Yucatan... depending on whom you choose to believe.

Big data, conversely, has been heralded by some as the harbinger of a bright, shiny and new world where all things will be possible using the vast quantities of data that are becoming available on the Internet.  Many contend that the transformation has already begun.  We will discuss the more mundane truth of how business is using big data in a joint EMA / 9sight webinar "Big Data Comes of Age" on Thursday, 13th December, 11am PST / 2pm EST / 7pm GMT.

The truth is somewhere in between... as always.  And as year-end approaches, it might be a good time to ponder just where big data is leading us as people and as a society.

There's little doubt that big data--in all its meanings and incarnations--is effecting major changes in advertising and marketing.  Much of what we see in this area is about increasing the efficiency of targeting and conversion.  As Google tracks our searches, Facebook and Twitter our shared opinions, mobile Apps our movements and sellers our purchases, the message we hear is that businesses want to understand us and our needs more clearly, serve us better and ensure that we are increasingly delighted.  However, the reality in the vast majority of cases is that businesses are driven simply by the financial bottom line, on a quarterly or even monthly basis as BI reports are produced and earnings statements released.  Unfortunately, in my opinion, big data is most widely used as the next spin of the "sell more at higher profit" story, or to put it bluntly, driving consumption.

And yet, the other areas of application of big data offer insights about some of the biggest challenges to humanity, such as climate change, energy efficiency, health quality, economic and financial management, and more.  The increasing quantities of data being gathered or available for collection and analysis in all of these areas offer us the opportunity to make a real difference in the lives of humanity, and to avert the catastrophes of which most scientists and philosophers already warn.  That even one of the most data-driven of companies, PricewaterhouseCoopers, warns of impending global catastrophe due to the increased rate and scale of warming--as much as 6°C--is surely a sign that the writing is on the wall.  To quote their report: "The only way to avoid the pessimistic scenarios will be radical transformations in the ways the global economy currently functions: rapid uptake of renewable energy, sharp falls in fossil fuel use or massive deployment of carbon capture and storage, removal of industrial emissions and halting deforestation... business-as-usual is not an option."

The PwC report does not, unfortunately, make the explicit link between ever increasing consumption of energy and raw materials on which the global economy currently functions and the seeming impossibility of reducing carbon emissions at the required rate to avert the worst possible scenarios.  But big data analysis across both sides of this simple equation could show how to tackle the problem. Big data is about bringing data from widely disparate areas together and discovering new possibilities.  How to consume less but improve living standards.  How to prevent the type of financial behavior that paralyzes international economies.  Of course, all this assumes the business and political will to do so.

Those of us who understand big data technology and promote its use must surely begin to advocate the more responsible and sustainable uses of this powerful technology.  A New Year resolution, perhaps...?

Posted December 11, 2012 5:12 AM
Permalink | No Comments |
2012 begins to wind down.  Yes, I know it's still only mid-November, but I find it hard to avoid thinking of year-end when the retail industry has been pushing Christmas for weeks already.  I've been preparing for my keynote at Big Data Deutschland in Frankfurt (20-21 Nov) next week, so it seemed appropriate to share some thinking on where big data is at now.  Also, I've been deeply involved in analyzing the results of the EMA / 9sight big data survey which has just been published.  My bottom line?  Big data is dead!

Of course, I don't mean that literally.  What I'm really trying to do is to get the attention of the marketing folks who have been using and abusing the term, particularly during 2012.  Two very clear results emerge from the big data survey when it comes to real customer projects carrying the moniker big data.  

First, the industry has been besotted by size.  Carefully avoiding all vaguely salacious phrases, the fact is that size is so relative that calling data big or small is more about bragging or shaming than any measure of real use.  Our survey showed that 60% of respondents were managing less than 100TB of data in total in their organizations, while only 5% stretched beyond a petabyte.  Not all of this data was part of their big data projects; on average, only some 30% was included there.  This strongly suggests that so-called big data technology is being widely used for something other than processing excessively large data volumes.

Second, it's not all about exotic types of data either.  Yes, some 45% of the data sources fall under the category of human-sourced information, which includes social media sources.  But, just over 30% is process-mediated data -- transactional data gathered and created in traditional operational and informational applications.  For a more detailed explanation of these data domains, as I call them, please see my recent White Paper "The Big Data Zoo - Taming the Beasts, The need for an integrated platform for enterprise information".  So, big data projects are addressing a substantial proportion of the data we've known and loved for many years.

You can hear more of the survey results on the EMA / 9sight webinar on Thursday, 13 December, 11 a.m. PST / 2 p.m. EST.

What is actually becoming important as we look towards 2013 is what businesses are really doing with data at the moment that is different from what they've traditionally done.  I believe there are two distinct trends.  One is, of course, business analytics.  This is simply an evolution of traditional BI, with more of an emphasis on exploration (or mining) and less on reporting and dashboards.  The second is more interesting and, potentially, game changing.  This involves the re-integration of operational action taking and informational decision making in customer-facing applications that automatically modify their behavior in real-time in response to rapidly changing market or personal circumstances.

All this says to me that big data as a technological category is becoming an increasingly meaningless name.  Big data is essentially all data.  Is there any chance that the marketing folks can hear me?

Posted November 13, 2012 11:11 AM
Permalink | No Comments |
I was the analyst on The Briefing Room this week with NuoDB's CEO, Barry Morris.  The product itself is extremely interesting, both in its concept and technology; and will formally launch in the next month or so after a long series of betas.  More about that a little later...

But first, I need to vent!  For some time now, I've been taking an interest in NoSQL because of its positioning in the Big Data space.  I've always had a real problem with the term - whether it means Not SQL or Not Only SQL - because defining anything by what it's not is logical nonsense.  Even the definitive nosql-database.org falls into the trap, listing 122+ examples from "Possibly the oldest NoSQL DB: Adabas" to the wonderfully-named StupidDB.  A simple glance at the list of categories used to classify the entries shows the issue: NoSQL is a catch-all for a potentially endless list of products and tools.  The mere fact that they don't use SQL as an access language is insufficient as a definition.

But, my irritation now extends to "NewSQL", a term I went a-Googling when I saw that NuoDB is sometimes put in this category.  A chart from a presentation by Matthew Aslett of 451 Research was interesting if somewhat disappointing: another gathering of tools with a mixed and overlapping set of characteristics, most of which relate to their storage and underlying processing approaches, rather than anything new about SQL, which is, of course, at heart a programming language.  So why invent the term NewSQL when the aim is to keep the same syntax?  The term totally misses the real innovation that's going on.

This innovation at the physical storage level has been happening for a number of years now.  Columnar storage on disk, from companies such as Vertica and ParAccel, was the first innovative concept to challenge traditional RDBMS approaches in the mid-2000s.  That's not to forget Sybase IQ from the mid-1990s, which was, of course, column-oriented, but didn't catch the market as the analytic database vendors did later.  With cheaper memory and 64-bit addressing, the move is underway towards using main memory as the physical storage medium and disk as a fallback.  SAP HANA champions this approach at the high end, while various BI tools, such as QlikView and MicroStrategy, hold the lower end.  And don't forget that the world's most unloved (by IT, at least) BI tool, Excel, has always been in-memory!
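The appeal of column orientation for analytics is easy to illustrate: an aggregate over one attribute scans only that attribute's values, rather than dragging every field of every row through memory. A toy sketch in Python (the table and its contents are invented for illustration; real columnar engines add compression, vectorization and much more):

```python
# Row-oriented layout: each record stored together, as on a traditional RDBMS page.
rows = [
    {"id": 1, "region": "EU", "revenue": 120.0},
    {"id": 2, "region": "US", "revenue": 340.0},
    {"id": 3, "region": "EU", "revenue": 75.0},
]

# Column-oriented layout: each attribute stored as one contiguous array.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "revenue": [120.0, 340.0, 75.0],
}

# Summing revenue row-wise must visit every record and skip past the
# fields it doesn't need...
total_rowwise = sum(r["revenue"] for r in rows)

# ...while the columnar layout scans one dense array -- the access pattern
# that also compresses well and sits naturally in main memory.
total_columnar = sum(columns["revenue"])

assert total_rowwise == total_columnar == 535.0
```

The same query answer comes out either way; the difference is how much irrelevant data the storage engine must touch to produce it, which is precisely where the analytic databases found their edge.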

The other aspect of innovation relates to parallel processing.  Massively parallel processing (MPP) relational databases have been around for many years in the scientific arena and in commercial data warehousing from Teradata (1980s) and IBM DB2 Parallel Edition (1990s).  These powerful, if proprietary, platforms are usually forgotten (or ignored) when NoSQL vendors lament the inability of traditional RDBMSs to scale-out to multiple processors, blithely citing comparisons of their products to MySQL, probably more popular for its price than its technical prowess. Relational databases do indeed run across multiple processors, and must evolve to do so more easily and efficiently as increases in processing power are now coming mainly from increasing the number of cores in processors.  Which finally brings me back to NuoDB.

NuoDB takes a highly innovative, object-oriented, transaction/messaging-system approach to the underlying database processing, eliminating the concept of a single control process responsible for all aspects of database integrity and organization.  Invented by Jim Starkey, an éminence grise of the database industry, the approach is described as elastically scalable - cashing in on the cloud and big data.  It also touts emergent behavior, a concept central to the theory of complex systems.  Together with an in-memory model for data storage, NuoDB appears very well positioned to take advantage of the two key technological advances of recent years mentioned already: extensive memory and multi-core processors.  And all of this sits behind a traditional SQL interface to maximize use of existing, widespread skills in the database industry.  What more could you ask?

However, it seems there's an added twist.  Apparently, SQL is just a personality the database presents; and is the focus of the initial release.  Morris also claims that NuoDB is able to behave as a document, object or graph database, personalities slated for later releases in 2013 and beyond.  Whether this emerges remains to be seen.  Interestingly, however, when saving to disk, NuoDB stores data in key-value format.
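To make that last point concrete: a key-value store holds nothing but opaque pairs, so a relational row must be flattened before it can be persisted. The following is a generic sketch of one common approach (primary key becomes the key, remaining attributes the serialized value); it is purely illustrative and does not describe NuoDB's actual on-disk format, which has not been published in detail:

```python
import json

def row_to_kv(table, pk, row):
    """Flatten a relational row into a (key, value) pair.

    The key concatenates the table name and primary-key value; the value
    serializes the remaining attributes.  A purely hypothetical scheme.
    """
    key = f"{table}/{row[pk]}"
    value = json.dumps({k: v for k, v in row.items() if k != pk})
    return key, value

key, value = row_to_kv("customer", "id", {"id": 42, "name": "Ada", "city": "Dublin"})
print(key)    # → customer/42
```

A storage engine built this way can present different "personalities" on top: the SQL layer reassembles rows from the pairs, while a document or graph interface could interpret the same serialized values differently.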

I'll be speaking about big data, NoSQL and NewSQL at engagements in Europe in November: the IRM DW&BI Conference in London (5-7 Nov) and Big Data Deutschland in Frankfurt (20-21 Nov).  I look forward to meeting you there!

Posted October 25, 2012 9:45 AM
Permalink | No Comments |
Slowly but surely, big data is becoming mainstream.  Of course, if you listened only to the hype from analysts and vendors, you might think this was already the case.  I suspect it's more like teenage sex, more talked about than actually happening.  But it seems we're about to move into the roaring twenties.

I had the pleasure to be invited as the external expert speaker at IBM's PureData launch in Boston this week.  In a theatrical, dry-ice moment, IBM rolled out one of their new PureData machines between the previously available PureFlex and PureApplication models.  However, for me, the launch carried a much more complex and, indeed, subtle message than "here's our new, bright and shiny hardware".  Rather, it played on a set of messages that is gradually moving big data from a specialized and largely standalone concept to an all-embracing, new ecosystem that includes all data and the multifarious ways business needs to use it.

Despite long-running laments to the contrary, IT has had it easy when it comes to data management and governance.  Before you flame me, please read at least the rest of this paragraph.  Since the earliest days of general-purpose business computing in the 1960s, we've worked with a highly modeled and carefully designed representation of reality.  Basically, we've taken the messy, incoherent record of what really happens in the real world and hammered it into relational (and previously popular hierarchical or network) databases.  To do so, we've worked with highly simplified models of the world.  These simplifications range from grossly wrong (all addresses must include a 5-digit zip-code--yes, there are still a few websites that enforce that rule) to obviously naive (multiple purchases by a customer correlate to high loyalty) as well as highly useful to managing and running a business (there exists a single version of the truth for all data).  The value of useful simplifications can be seen in the creation of elegant architectures that enable business and IT to converse constructively about how to build systems the business can use.  They also reduce the complexity of the data systems; one size fits all.  The danger lies in the longer-term rigidity such simplifications can cause.

The data warehouse architecture of the 1980s, to which I was a major contributor, of course, was based largely on the above single-version-of-the-truth simplification.  There's little doubt it has served us well.  But big data and other trends are forcing us to look again at the underlying assumptions.  And find them lacking.  IBM (and it's not alone in this) has recognized that there exist different business usage patterns of data, which lead to different technology sweet spots.  The fundamental precept is not new, of course.  The division of computing into operational, informational and collaborative is closely related.  What is new is that the usage patterns are non-exclusive and overlapping; and they need to co-exist in any business of reasonable size and complexity.  I can identify four major business patterns: (1) mainstream daily processing, (2) core business monitoring and reporting, (3) real-time operational excellence and (4) data-informed planning and prediction.  And there are surely more.  This week, IBM announced three differently configured models: (1) PureData System for Transactions, (2) for Analytics and (3) for Operational Analytics, each based on existing business use patterns and implementation expertise.  Details can be found here.  I imagine we will see further models in the future.

All of this leads to a new architectural picture of the world of data--an integrated information platform, where we deliberately move from a layered paradigm to one of interconnected pillars of information, linked via integration, metadata and virtualization.  A more complete explanation can be found in my white paper, "The Big Data Zoo--Taming the Beasts: The need for an integrated platform for enterprise information".  As always, feedback is very welcome--questions, compliments and criticisms.

Posted October 12, 2012 8:10 AM
Permalink | No Comments |

