Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation published by Addison-Wesley in 1997.

Over the past few years, Barry has extended his interest to cover the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT.

Barry has worked in the IT industry for more than 25 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

Business intelligence (BI) has long been associated with relational databases and the SQL language.  From the earliest days of data warehousing, the qualities of the relational model have been highly valued in the quest for data consistency and quality.  In addition, it was assumed that business users are comfortable with tables of information.  This has been proven true, especially by spreadsheets, much to IT's chagrin.  Tables are also the lingua franca of BI tools, and simple Select / Where queries are familiar to many users.  But, whatever the rationale, the association of BI and SQL is deeply embedded in the minds of most practitioners.  So the question arises: what about NoSQL, and how does it relate to BI?  Can it be of use in data warehousing?

Good questions.  But first, you need to know which flavor of NoSQL you're talking about.  For brevity, I'll focus on only one of the five or so varieties: document-oriented data stores.  (If you are interested in the others, the bigger picture--and a trip to Rome--I recommend my two-day seminar there on 11-12 June!)  As I discovered about a year ago in a fascinating conversation with Max Schireson, president of 10gen / MongoDB, in this context a document has nothing to do with e-mail contents or Word documents; it refers to a particular data structure in which each record consists of an arbitrary set of fields, each a name/value pair, structured in JSON (JavaScript Object Notation) or a similar format.  For more details, refer to my white paper.  So, let me release you from your suspense now.  Can this be of use in BI?  The short answer is yes.  But to fully grasp the extent, I'd like to introduce you to two MongoDB customers and how they are easing into BI using NoSQL.
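To make the idea of schema-free records concrete, here is a minimal sketch in Python of two records in the same collection that do not share the same fields, along with a simple Select / Where-style filter over them. The field names and values are hypothetical, and a plain Python list stands in for a real document store; this is not MongoDB's own API.

```python
import json

# Hypothetical records: each is a set of name/value pairs, and the two
# records do not share the same fields (there is no predefined schema).
raw = """
[
  {"_id": 1, "author": "alice", "channel": "blog", "topic": "cameras",
   "followers": 12000},
  {"_id": 2, "author": "bob", "channel": "twitter", "topic": "cameras",
   "followers": 48000, "retweets": 310, "hashtags": ["#dslr", "#review"]}
]
"""
documents = json.loads(raw)


def select_where(docs, fields, predicate):
    """Return the requested fields of every document matching predicate.

    Fields missing from a document come back as None, much as a sparse
    column would in a table.
    """
    return [{f: d.get(f) for f in fields} for d in docs if predicate(d)]


rows = select_where(
    documents,
    fields=["author", "followers", "retweets"],
    predicate=lambda d: d.get("topic") == "cameras" and d.get("followers", 0) > 20000,
)
print(rows)
```

In MongoDB's own query language the same filter is itself written as a JSON-like document, for example `{"topic": "cameras", "followers": {"$gt": 20000}}`, which is part of why tabular BI tools need an adaptation layer before they can query such stores.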

I spoke to David Chancogne, CTO of Traackr, a web business measuring the influence of people who blog, tweet and otherwise shape the impression the general public forms of brands, products and more on the web.  The goal is to help marketers and advertising agencies track and target such influencers more effectively.  Traackr has built a MongoDB database of the contents of blogs, tweets, etc. and gives its customers reports and analyses of the top influencers in their areas of interest.  Is this BI?  In its broadest sense, yes.  The scope is very specific and the queries predefined, but this is still BI at its most basic.  Did Chancogne think of it as BI?  Actually, no; it's simply his business to provide analytics to his customers.  Probing a little deeper, I discovered that Traackr is continually trying to optimize its algorithm for rating influence.  They do this by extracting data from their database and playing with it in--wait for it--Excel!  More BI, but, like many a start-up business before them, they chose Excel more for its familiarity and ease of use.  Generic BI tools that run against a JSON data store, such as Pentaho's NoSQL solution and Nucleon Software's BI Studio, are beginning to appear, allowing generic querying of the data without extracting it to Excel.

A conversation with Julian Browne led to further interesting insights.  Browne leads the implementation of Priority Moments (a location-aware customer loyalty program that offers discounts at affiliated retailers) at O2, the second-largest provider of mobile/cell phone services in the UK, with more than 20 million customers.  MongoDB was chosen as the platform for this service largely to deal with the complexity and variability of the product catalog.  The challenge is that a bewildering variety of product sets can be offered to different customers, and that variety changes constantly at the whim of marketing.  The absence of a predefined schema, a key characteristic of document-oriented data stores, was a compelling argument for the technology choice.  But what of BI?  Customer loyalty programs are prime BI territory, of course, and in this case tracking the uptake of offers is vital.  As with Traackr, initial BI was provided through hand-crafted Java programming, although there is growing interest in using the emerging BI tools.  Of more interest, however, is the experimental use of a specific feature of the database that allows a query to be left open: as records arrive in the database, they automatically appear in the result, which can be routed to a live HTML5 graph(1) giving real-time feedback on program activity.
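The post does not name the feature, but in MongoDB this kind of open query is commonly built with a tailable cursor on a capped collection. The sketch below only simulates the idea in plain Python so that it runs without a database: a size-capped, append-only log plays the role of the capped collection, and a query object remembers how far it has read, so each poll returns only newly arrived matching records. All class names, field names and values are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Doc = Dict[str, object]


class CappedLog:
    """Hypothetical in-memory stand-in for a capped (size-limited, append-only) collection."""

    def __init__(self, max_docs: int) -> None:
        # The oldest entries are discarded automatically once max_docs is reached.
        self._entries: Deque[Tuple[int, Doc]] = deque(maxlen=max_docs)
        self._next_seq = 0  # insertion order; never reused

    def insert(self, doc: Doc) -> None:
        self._entries.append((self._next_seq, doc))
        self._next_seq += 1

    def entries_from(self, seq: int) -> List[Tuple[int, Doc]]:
        """Entries with sequence number >= seq that are still held in the log."""
        return [(s, d) for s, d in self._entries if s >= seq]


class OpenQuery:
    """A query left open: each poll() returns only matches that arrived since the last poll."""

    def __init__(self, log: CappedLog, predicate: Callable[[Doc], bool]) -> None:
        self._log = log
        self._predicate = predicate
        self._position = 0  # next sequence number this query has not yet seen

    def poll(self) -> List[Doc]:
        fresh = self._log.entries_from(self._position)
        if fresh:
            self._position = fresh[-1][0] + 1
        return [d for _, d in fresh if self._predicate(d)]


# Usage: watch redemptions of loyalty offers as they arrive.
log = CappedLog(max_docs=10_000)
redemptions = OpenQuery(log, lambda d: d.get("event") == "offer_redeemed")

log.insert({"event": "offer_viewed", "offer": "coffee-2for1"})
log.insert({"event": "offer_redeemed", "offer": "coffee-2for1", "store": "Leeds"})
first = redemptions.poll()   # the one redemption so far
second = redemptions.poll()  # nothing new since the last poll
log.insert({"event": "offer_redeemed", "offer": "cinema-half-price"})
third = redemptions.poll()   # only the newly arrived redemption
```

A live dashboard would call `poll()` on a timer and push each batch to the chart. A real tailable cursor can instead wait on the server for new data, which spares the client from polling an empty result.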

How would we summarize the situation regarding BI for document-oriented NoSQL databases?  What we see is a fairly recent database technology whose query facilities are being used for basic, predefined BI.  As might be expected, more generic tooling for building queries is appearing.  The type of BI supported is focused, application-specific querying and reporting--the type associated with data marts in traditional BI.  This is exactly what we saw in the emergence of BI against relational databases.  Note that some of the querying is being performed against the live operational sources.  Again, we see the similarity with early reporting approaches, and with it similar concerns about the performance impact on operations.  MongoDB addresses this through the creation of eventually consistent replicas.  Nonetheless, the demand for real-time BI continues to grow, and certain classes of operational analytics will need such real-time or near real-time access.
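The trade-off behind reporting against replicas can be illustrated with a toy simulation. The Python below is not MongoDB's replication protocol; it is a minimal model assuming asynchronous replication, in which operational writes go to a primary and reach a replica only when the replica next syncs. A report run against the replica places no load on the primary but may see slightly stale data until the replica catches up. (In MongoDB itself, sending reads to a secondary is controlled by the read preference setting.) Class, function and field names are hypothetical.

```python
from typing import Dict, List, Tuple


class Primary:
    """Handles operational writes and records each one in an operation log."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.oplog: List[Tuple[str, int]] = []

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.oplog.append((key, value))


class Replica:
    """Applies the primary's log asynchronously, so it may lag behind."""

    def __init__(self, primary: Primary) -> None:
        self._primary = primary
        self._applied = 0  # how many oplog entries have been copied so far
        self.data: Dict[str, int] = {}

    def sync(self) -> None:
        for key, value in self._primary.oplog[self._applied:]:
            self.data[key] = value
        self._applied = len(self._primary.oplog)


def uptake_report(store: Dict[str, int]) -> int:
    """A reporting query: total redemptions across all offers."""
    return sum(store.values())


primary = Primary()
replica = Replica(primary)

primary.write("coffee-2for1", 120)
replica.sync()
primary.write("cinema-half-price", 45)  # written, but not yet replicated

stale = uptake_report(replica.data)     # the replica lags the primary here
replica.sync()
current = uptake_report(replica.data)   # the replica has caught up
```

This is the same compromise early data warehousing made: accept some latency in exchange for isolating analytical load from operational systems. The more real-time the BI requirement, the smaller the lag that can be tolerated.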

Where NoSQL does not play a role in BI is also important.  Enterprise data warehouses (EDW), with their focus on creating consistent, integrated, historical stores of core business information are set to remain squarely in the relational database world.  But, where operational needs drive the choice of a NoSQL document-oriented data store, it is clear that BI can flourish in this environment too.  See my latest white paper, "Business Intelligence--NoSQL... No Problem", for further details.


(1)  For background on this approach, see Hummingbird and D3 (Data-Driven Documents).

Posted May 17, 2012 3:37 AM
Donald Farmer, now of QlikTech, suggested to the Boulder BI Brain Trust (BBBT) last week that what we in "BI" do is better described as decision support than as business intelligence.  The comment was greeted by a flurry of tweets and grunts of agreement.  It's an observation I've also made, and for similar reasons.  In essence, BI tools support decision making; to attribute intelligence--business or otherwise--to software seems somewhat presumptuous.  And yet, there is a further problem with the term business intelligence.  It implies a level of rationality in decision making that is beyond the reality most of us encounter.  This implication is carried even further as various analysts and vendors begin to talk about business analytics as if it were the ultimate solution to all business decision-making needs.

There are facts, we are told.  And if we have all the facts and we apply comprehensive analytics, we will discover the past, understand the present and predict the future.  We are told this is the scientific method; the truth is in the numbers.  Is this a valid way of interpreting the way the world works?  I would argue that it is so far from reality that we are in danger of creating a fantasy world worthy of Tolkien.

History, it is said, is named thus because it is "his story".  History, they say, is written by the victors.  The implications are far reaching.  Yes, there are indeed facts, but it's the stories we weave around the facts that really matter.  To quote Liz Greene(1): "Mehmet the Conqueror invaded Constantinople in 1453.  That is an historical fact.  But depending on which history book we read, Mehmet was either a redeemer or a cruel tyrant, a warrior for the True Faith or a vile heretic."  In terms of the story we tell ourselves about this incident and its value as a guide for the future, which part of the quote is more relevant: the historical fact or its interpretation?  And if you haven't yet looked at the endnote for the source of this quote, do so now.  And be brutally honest with yourself.  Do your beliefs about astrology affect the weight you attach to the quote?  And when I tell you that Dr. Liz Greene is also a fully trained and qualified Jungian psychoanalyst, how does that cause you to re-evaluate your judgement?

Our current obsession with analytics is dangerous.  It's based on a number of simplifications, misconceptions and downright errors.  It is a simplification that business is an entirely rational, fact-driven process.  It is a misconception that given sufficient data you can predict the future.  It is a downright error to assume that in the future, business can be entirely (or even largely) driven by business analytics.

Does that mean we should abandon analytics?  Of course not.  There are facts to gather that have so far remained undetected.  These facts can influence our interpretations.  If they are indeed relevant to the story at hand.  And if we allow them to do so.  And if our business users can avoid statistical errors such as confusing correlation with causality.  There are already many examples of significant benefits gained by businesses that adopt analytics.

The questions of relevance and abuse of statistics are ones of good analytic practice and education of users.  I have no doubt that, as we move beyond the hype phase, these issues will be addressed.  The issue of interpretation is much more difficult to tackle, because it is at the heart of how we imagine our decision makers behave.  Our focus on intelligence--rational and logical--obscures two other key aspects of decision making: intent and intuition.  Intent we ignore and intuition we dismiss.  All decision making includes the intent of the decision maker.  That intention drives everything from what data is gathered, through how it is evaluated, all the way to the final choice of action.  How many decisions are post-justified by careful data selection and evaluation?  If a decision maker is motivated by personal gain (and they do exist, you know), won't analytics be enlisted to support that goal?  And regarding intuition, it is evident that not all decision contexts are wholly driven by measurable or predictable metrics.  Low prices may be important, but so too are ambience, history, ethos and personal relationships when customers choose where to shop.  Data measures for the latter are hard to define and capture.  The intuition of an experienced manager is needed in such circumstances.  According to a story in the New York Times, Target's decision to focus marketing of maternity products on women in the early stages of pregnancy was based on sound analytics, but how prospective customers would react should have been intuitively obvious.

The bottom line is that we focus exclusively on big data and analytics at our peril.  We need to move beyond traditional concepts of business intelligence and decision support.  I see our goal as supporting full-spectrum business insight.

(1) Greene, L., "Apollo's Chariot - The Meaning of the Astrological Sun", CPA Press, (2001)


Posted May 2, 2012 6:56 AM
At Teradata Universe 2012 in Dublin, an impressive line-up of speakers, from Tim Berners-Lee to customers doing real data warehouse implementations, got me thinking beyond the normal boundaries about our assumptions regarding the real role and value of data, both traditional and big.  A few observations follow, but first...

As an ex-pat Irishman, I have to say that the new Convention Centre Dublin is a wonderful venue for events with up to a couple of thousand attendees.  The main auditorium is a superb space and there's lots of room for expo and breakouts.  And the facilities and staff are first rate.  Well done!  My only regret is that the area around the Centre, especially towards the Port, remains blighted by vacant sites and unfinished blocks - the legacy of Ireland's boom and bust - but not much can be done about that for now.

Much of the main-tent focus at this year's event was on the future of information, with big data featuring... well... large in the presentations of speakers such as Erik Brynjolfsson, Professor and Director of the MIT Center for Digital Business, and Sir Tim Berners-Lee, inventor of the World Wide Web.  Michio Kaku, Professor of Theoretical Physics at City College of New York, also addressed the central role of data in every aspect of our future.  The tone of these presentations is best described as expansive and optimistic: given better and more data and technology, the future of business and of humankind in general is rosy.  Personally, I believe this expectation is rather unlikely to be met.

While I am a long-time supporter of the need for and value of good and extensive information in business, I am less sanguine about the purposes for which such information is used and the extent to which decision making benefits.  In business generally, business intelligence is used almost exclusively in support of a narrowly focused drive for bottom-line profit.  At the risk of being labeled a Communist, I remain unconvinced that this is always a good thing.

This niggling doubt is best expressed through an example: the use of data warehousing in retail, something that has been going on for over 25 years.  BI can be very effective in optimizing the supply chain from manufacturer all the way to customer, supporting the intent of the business to reduce cost.  When that focus is pursued as the sole strategy, it can have highly undesirable effects, by driving local suppliers out of business, reducing a community's disposable income and creating an unbreakable downward economic spiral.  As a BI community we can say that BI is not responsible, and at the level of cause and effect, that's true.  But, at a deeper level, we cannot ignore the side effects of the tools and techniques we invent and promote, any more than cigarette manufacturers can avoid responsibility for the impact of passive smoking.

Getting back to big data, the problem is that as we focus on, and get excited about, a technique such as statistical analysis of social behavior to predict marketing trends for a brand, we simultaneously narrow our focus and lose sight of potentially interesting or important information that lies outside that data.  Big data encourages us to analyze, somewhat obsessively and in ever greater depth, the minutiae of life.  Why?  Often to drive profit for some business.  The optimistic view I mentioned earlier imagines that we will use this data to solve medical issues, world hunger, climate change, and more.  I don't have data to confirm this, but I would guess that profit-driven big data analytics outnumbers altruistic uses by more than 10 to 1.  And which of these two categories of information has the greater impact on the medium- and long-term survival of humanity?  The last speaker, Deb Roy, CEO of Bluefin Labs, showed us just how much analysis can be done to link social network activity to TV shows and advertising, all to decide where to spend millions of advertising dollars.  There are two ways of looking at this: (1) everybody needs to do this type of processing in order to compete, or (2) we need to examine the underlying model of doing business that drives such net-non-productive activity.  I invite you to share your views on this.

At a more mundane and practical level, speakers from current Teradata customers focused on a very different area: creating consistent and integrated enterprise data warehouses for very traditional transactional business data.  Unsurprisingly, the majority of enterprises are still struggling with the old issues that have driven data warehouse development for the past 30 years.  I have no doubt that this will continue for most businesses for many more years.

But, while this continues, we need to start thinking about the more philosophical issues that the conference brought up for me.



Posted April 25, 2012 4:18 AM