Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation published by Addison-Wesley in 1997.

Over the past few years, Barry has extended his interest to cover the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT.

Barry has worked in the IT industry for more than 25 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

Recently in Big data Category

Donald Farmer, now of Qliktech, offered to the Boulder BI Brain Trust (BBBT) last week that what we in "BI" do is better described as decision support rather than business intelligence.  The comment was greeted by a flurry of Tweets and Grunts of agreement.  It's an observation I've also made, and for similar reasons.  In essence, BI tools support decision making; to attribute intelligence--business or otherwise--to software seems somewhat presumptuous.  And yet, there is a further problem with the term business intelligence.  It implies a level of rationality in decision making that is beyond the reality most of us encounter.  This implication is carried even further as various analysts and vendors begin to talk about business analytics as if it will be the ultimate solution to all business decision-making needs.

There are facts, we are told.  And if we have all the facts and we apply comprehensive analytics, we will discover the past, understand the present and predict the future.  We are told this is the scientific method; the truth is in the numbers.  Is this a valid way of interpreting the way the world works?  I would argue that it is so far from reality that we are in danger of creating a fantasy world worthy of Tolkien.

History, it is said, is named thus because it is "his story".  History, they say, is written by the victors.  The implications are far reaching.  Yes, there are indeed facts, but it's the stories we weave around the facts that are what really matter.  To quote Liz Greene(1): "Mehmet the Conqueror invaded Constantinople in 1453.  That is an historical fact.  But depending on which history book we read, Mehmet was either a redeemer or a cruel tyrant, a warrior for the True Faith or a vile heretic."  In terms of the story we tell ourselves about this incident and its value as a guide for the future, which part of the quote is more relevant - the historical fact or its interpretation?  And if you haven't yet looked at the endnote for the source of this quote, do so now.  And be brutally honest with yourself.  Do your beliefs about astrology affect the weight you attach to the quote?  And when I tell you that Dr. Liz Greene is also a fully trained and qualified Jungian psychoanalyst, how does that cause you to re-evaluate your judgement?

Our current obsession with analytics is dangerous.  It's based on a number of simplifications, misconceptions and downright errors.  It is a simplification that business is an entirely rational, fact-driven process.  It is a misconception that given sufficient data you can predict the future.  It is a downright error to assume that in the future, business can be entirely (or even largely) driven by business analytics.

Does that mean we should abandon analytics?  Of course not.  There are facts to gather that have so far remained undetected.  These facts can influence our interpretations.  If they are indeed relevant to the story at hand.  And if we allow them to do so.  And if our business users can avoid statistical errors such as confusing correlation with causality.  There are many examples already of significant benefits gained by businesses that adopt analytics.
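The statistical error mentioned above, reading a cause into two measures that merely move together, is easy to reproduce.  What follows is a minimal Python sketch, not a real analysis: the data is invented, and the names (temperature, ice_cream_sales, sunburn_cases) are hypothetical.  Both series are generated from a shared driver and neither influences the other, yet they appear strongly related until the shared driver is taken out.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What remains of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Invented data: temperature drives both series; neither drives the other.
temperature = [rng.uniform(0, 10) for _ in range(2000)]
ice_cream_sales = [t + rng.gauss(0, 1) for t in temperature]
sunburn_cases = [t + rng.gauss(0, 1) for t in temperature]

# Correlation of the two series as observed.
raw = pearson(ice_cream_sales, sunburn_cases)

# Correlation of what is left once the temperature effect is removed from each.
adjusted = pearson(residuals(ice_cream_sales, temperature),
                   residuals(sunburn_cases, temperature))

print(f"raw correlation:            {raw:.2f}")
print(f"after removing temperature: {adjusted:.2f}")
```

The sketch works only because the hidden driver is known and measured.  In practice, deciding which drivers to look for is itself an act of interpretation.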

The questions of relevance and abuse of statistics are ones of good analytic practice and education of users.  I have no doubt that, as we move beyond the hype phase, these issues will be addressed.  The issue of interpretation is much more difficult to tackle.  Because it is at the heart of how we imagine our decision makers behave.  Our focus on intelligence--rational and logical--obscures two other key aspects of decision making:  intent and intuition.  Intent we ignore and intuition we dismiss.  All decision making includes the intent of the decision maker.  That intention drives everything from what data is gathered, through how it is evaluated, all the way to the final choice of action.  How many decisions are post-justified by careful data selection and evaluation?  If a decision maker is motivated by personal gain (and they do exist, you know), won't analytics be enlisted to support that goal?  And regarding intuition, it is evident that not all decision contexts are wholly driven by measurable or predictable metrics.  Low prices may be important, but so too are ambience, history, ethos and personal relationships when customers choose where to shop.  Data measures for the latter are hard to define and capture.  The intuition of an experienced manager is needed in such circumstances.  Target's decision to focus marketing of maternity products to women in the early stages of pregnancy was based on sound analytics according to a story in the New York Times, but the reaction of prospective customers was intuitively obvious.

The bottom line is that we focus exclusively on big data and analytics at our peril.  We need to move beyond traditional concepts of business intelligence and decision support.  I see our goal as supporting full-spectrum business insight.

(1) Greene, L., "Apollo's Chariot - The Meaning of the Astrological Sun", CPA Press, (2001)


Posted May 2, 2012 6:56 AM
At Teradata Universe 2012 in Dublin, an impressive line-up of speakers from Tim Berners-Lee to customers doing real data warehouse implementations got me thinking beyond the normal boundaries about our assumptions about the real role and value of data - both traditional and big.  A few observations follow, but first...

As an ex-pat Irishman, I have to say that the new Convention Centre Dublin is a wonderful venue for events with up to a couple of thousand attendees.  The main auditorium is a superb space and there's lots of room for expo and breakouts.  And the facilities and staff are first rate.  Well done!  My only regret is that the area around the Centre, especially towards the Port, remains blighted by vacant sites and unfinished blocks - the legacy of Ireland's boom and bust - but not much can be done about that for now.

Much of the main tent focus at this year's event was on the future of information, with big data featuring... well... large in the presentations of speakers such as Erik Brynjolfsson, Professor and Director of the MIT Center for Digital Business and Sir Tim Berners-Lee, inventor of the World Wide Web.  Michio Kaku, Professor of Theoretical Physics at City College of New York, also addressed the theme of the central role of data in every aspect of our future.  The tone of these presentations is best described as expansive and optimistic - given better and more data and technology, the future of business and humankind in general is rosy.  This is an expectation that I, personally, believe to be of somewhat low probability.

While I am a long-time supporter of the need for and value of good and extensive information in business, my experience of the purposes for which such information is used and the extent to which decision making benefits is less sanguine.  In general business, business intelligence is used almost exclusively in support of a narrowly-focused drive for bottom-line profit.  At the risk of being labeled a Communist, I remain unconvinced that this is always a good thing.

This niggling doubt is best expressed through an example - the use of data warehousing in retail, something that has been going on for over 25 years.  BI can be very effective in optimizing the supply chain from manufacturer all the way to customer, supporting the intent of the business to reduce cost.  When that focus is pursued as a sole strategy, it can have highly undesirable effects, through driving local suppliers out of business, reducing a community's disposable income and creating an unbreakable downward economic spiral.  As a BI community we can say that BI is not responsible, and on the level of cause and effect, that's true.  But, at a deeper level, we cannot ignore the side effects of the tools and techniques we invent and promote, any more than cigarette manufacturers can avoid responsibility for the impact of passive smoking.

Getting back to big data, the problem is that as we focus on, and get excited about, a technique such as statistical analysis of social behavior to predict marketing trends for a brand, for example, we simultaneously narrow our focus on potentially interesting or important information that is external to that data.  Big data encourages us to somewhat obsessively analyze in ever greater depth the minutiae of life.  Why?  Often to drive profit for some business.  The optimistic view I mentioned earlier imagines that we will use this data to solve medical issues, world hunger, climate change, and more.  I don't have data to confirm this, but I guess that the proportion of profit-driven big data analytics vs. altruistic is greater than 10 to 1.  And which of these two categories of information has the higher impact on the medium- and long-term survival of humanity?  The last speaker, Deb Roy, CEO of Bluefin Labs, showed us just how much analysis can be done to link social network activity to TV shows and advertising.  All to decide where to spend millions of advertising dollars.  There are two ways of looking at this: (1) everybody needs to do this type of processing in order to compete, or (2) we need to examine our underlying model of doing business that drives such net-non-productive activity.  I would invite you to share your views on this.

At a more mundane and practical level, speakers from current Teradata customers focused on a very different area - creating consistent and integrated enterprise data warehouses for very traditional transaction business data.  Unsurprisingly, the majority of enterprises are still struggling with the old issues that drove data warehouse development for the past 30 years.  I have no doubt that this will continue for most businesses for many more years.

But, while this continues, we need to start thinking about the more philosophical issues that the conference brought up for me.



Posted April 25, 2012 4:18 AM
"Social networks already know who you know", "recommendation engines get much smarter", "early detection mitigates catastrophes".  Three of ten ways big data is creating the science fiction future.  These types of headlines appeal to the geek optimists in many of us.  We think that mitigating a catastrophe is certainly a good thing.  That smarter recommendations to whom we should connect and what we might be interested in buying could probably save us time, that most precious of commodities.  Most of us have grown up with a belief system that science and, by extension, technology and computers, are a sine qua non in today's world.  In truth, the world we live in today could not exist without them.

But, at what cost?

Three further headlines from the same blog: "surveillance gets really Orwellian", "doctors can make sense of your genome--and so can insurers", "dating sites that can predict when you're lying".  Perhaps these items give pause for thought.  Security cameras lurk in every corridor and public place.  And, as of last August, the NYPD has been monitoring Facebook and Twitter.  Even in our bedrooms, smart phones can be turned on remotely to monitor our most intimate indiscretions.  It's open season on our actions and communications.  Our genomes are fast becoming public property, ostensibly for our better health management; but, clearly, for better risk management--read profit--for insurance companies.  Even our thinking is being analyzed.

We're fast reaching 1984 some 30 years later than George Orwell imagined.  At least in our ability to monitor the actions, communications, genetic makeup and thoughts of an ever-increasing swathe of humanity.  As BI experts and data scientists, we celebrate our ability to gather and analyze ever more data with ever more sophistication and ever deeper granularity.  For marketeers, Utopia is a segment of one whose buying behavior is predictable with certainty.  As traders on the commodities or currency markets, our algorithms gamble on the Brownian motion of microscopic movements in prices.  For insurers, statistical averaging of risk across populations gives way to cherry picking the low-risk individuals for discounted premiums.

Am I overly pessimistic or even paranoid in imagining that big data brings risks at least as large as the benefits it promises?  Are the petabytes and exabytes of information we're gathering, storing and analyzing open to misuse?  We celebrate the role of social networking in pro-democracy movements around the world, imagining that tweets and texts are unassailable weapons for freedom, forgetting that the networks that carry them are run by big businesses whose bottom line is profit.  We reveal the secrets of our lives in dribs and drabs, in recordable phone conversations and even through the GPS tracking of our smart phones, oblivious that the technology exists to piece all the clues together, Sherlock Holmes-like, given sufficient time and money.

In my last post, I challenged us to take a step back and apply human insight to the results of big data analysis rather than take the results from statistical analyses at face value, to question the sources and play with other possible explanations before jumping to conclusions.  Now, knowing how fallible your own interpretation of big data may be, please give some consideration to the possibility that others, particularly those in positions of power, such as governments and businesses, can accidentally or deliberately misinterpret or misuse the big data resource.

But what can we do as an industry?  As individual analysts, consultants, data administrators and more?  At the very least, we can revisit the privacy and security controls we build into our systems.  Take a look at "Why you can't really anonymize your data" by Pete Warden and begin pressing the industry and academia to search for new solutions.  Look again at your business processes and evaluate if and how the use of big data subverts the intentions or ethics of how you work.  And, finally, reread George Orwell's "1984".


Posted February 17, 2012 2:47 AM
Now, I may be accused of getting up on my soap box in this first post of 2012, but... a few recent articles on the topic of big data / predictive analytics have really got me thinking.  Well, worrying, to be more precise.  My worry is that there seems to be a growing belief in the somehow magical properties of big data and a corresponding deification of those on the leading edge of working with big data and predictive analytics.  What's going on?

The first article I came across was "So, What's Your Algorithm?" by Dennis Berman in the Wall Street Journal.  He wrote on January 4th, "We are ruined by our own biases. When making decisions, we see what we want, ignore probabilities, and minimize risks that uproot our hopes.  What's worse, 'we are often confident even when we are wrong,' writes Daniel Kahneman, in his masterful new book on psychology and economics called 'Thinking, Fast and Slow.'  An objective observer, he writes, 'is more likely to detect our errors than we are.'"

I've read no more than the first couple of chapters of Kahneman's book (courtesy of Amazon Kindle samples), so I don't know what he concludes as a solution to the problem posed above--that we are deceived by our own inner brain processes.  However, my intuitive reaction to Berman's solution was visceral: how can he possibly suggest that the objective observer advocated by Kahneman could be provided by analytics over big data sets?  In truth, the error Berman makes is blatantly obvious in the title of the article... it always is somebody's algorithm.

The point is not that analytics and big data are useless.  Far from it.  They can most certainly detect far more subtle patterns in far larger and statistically more significant data sets than most or even all human minds can.  But, the question of what is a significant pattern and, more importantly, what it might mean remains the preserve of human insight.  (I use the term "insight" here to mean a balanced judgment combining both rationality and intuition.)  So, the role of such systems as objective observer for the detection and possible elimination of human error is, to me, both incorrect and objectionable.  It merely elevates the writer of the algorithm to the status of omniscient god.  And not only omniscient, but also often invisible.

Which brings me to the second article that got me thinking... rather negatively, it so happens.  "This Is Generation Flux: Meet The Pioneers Of The New (And Chaotic) Frontier Of Business" by Robert Safian was published by Fast Company magazine on January 9th.  The breathless tone, the quirky black and white photos and the personal success stories all contribute to a sense (for me, anyway) of awe in which we are asked to perceive these people.  The premise that the new frontier of business is chaotic is worthy of deep consideration and, in my opinion, is quite likely to be true.  But, the treatment is, as Scott Davis of Lyzasoft opined, "more Madison Avenue than Harvard Business Review".  It is quite clear that each of the pioneers here has made significant contributions to the use of big data and analytics in a rapidly changing business world.  However, the converging personal views of seven pioneering people--presumably chosen for their common views on the topic--hardly constitute a well-founded, thought-out theoretical rationale for concluding that big data and predictive analytics are the only, or even a suitable, solution for managing chaos in business.

As big data peaks on the hype curve this year (or has it done so already?), it will be vital that we in the Business Intelligence world step back and balance the unbridled enthusiasm and optimism of the above two articles with a large dollop of cold, hard realism based on our many years' experience of trying to garner value from "big data".  (Since its birth, BI has always been on the edge of data bigger than could be comfortably handled by the technology of the time.)  So, here are three questions you might consider asking the next big data pioneer who is preaching about their latest discovery:  What is the provenance of the data you used--its sources, how it was collected/generated, privacy and usage conditions?  Can you explain in layman's terms the algorithm you used (recall that a key cause of the 2008 financial crash was apparently that none of the executives understood the trading algorithms)?  Can you give me two alternative explanations that might also fit the data values observed?

Big data and predictive analytics should be causing us to think about new possibilities and old explanations.  They should be challenging us to exercise our own insight.  Unfortunately, it appears that they may be tempting some of us to do the exact opposite: trust the computer output or the data science gurus more than we trust ourselves.  "Caveat decernor", to coin a phrase in something akin to pig Latin--let the decision maker beware!

See also: "What is the Importance and Value of Big Data? Part 2 of Big Data: Giant Wave or Huge Precipice?"

Posted January 16, 2012 8:28 AM
"Seven Faces of Data - Rethinking data's basic characteristics" - new White Paper by Dr. Barry Devlin.

We live in a time when data volumes are growing faster than Moore's Law and the variety of structures and sources has expanded far beyond those that IT has experience of managing.  It is simultaneously an era when our businesses and our daily lives have become intimately dependent on such data being trustworthy, consistent, timely and correct.  And yet, our thinking about and tools for managing data quality in the broadest sense of the word remain rooted in a traditional understanding of what data is and how it works.  It is surely time for some new thinking.

A fascinating discussion with Dan Graham of Teradata over a couple of beers in February last at Strata in Santa Clara ended up in a picture of something called a "Data Equalizer" drawn on a napkin.  As often happens after a few beers, one thing led to another...

The napkin picture led me to take a look at the characteristics of data in the light of the rapid, ongoing change in the volumes, varieties and velocity we're seeing in the context of Big Data.  A survey of data-centric sources of information revealed almost thirty data characteristics considered interesting by different experts.  Such a list is too cumbersome to use and I narrowed it down based on two criteria.  First was the practical usefulness of the characteristic: how does the trait help IT make decisions on how to store, manage and use such data?  What can users expect of this data based on its traits?  Second, can the trait actually be measured?

The outcome was seven fundamental traits of data structure, composition and use that enable IT professionals to examine existing and new data sources and respond to the opportunities and challenges posed by new business demands and novel technological advances.  These traits can help answer fundamental questions about how and where data should be stored and how it should be protected.  And they suggest how it can be securely made available to business users in a timely manner.

So what is the "Data Equalizer"?  It's a tool that graphically portrays the overall tone and character of a dataset, so that IT professionals can quickly evaluate the data management needs of a specific set of data.  More generally, it clarifies how technologies such as relational databases and Hadoop, for example, can be positioned relative to one another and how the data warehouse is likely to evolve as the central integrating hub in a heterogeneous, distributed and expanding data environment.

Understanding the fundamental characteristics of data today is becoming an essential first step in defining a data architecture and building an appropriate data store.  The emerging architecture for data is almost certainly heterogeneous and distributed.  There is simply too large a volume and too wide a variety to insist that it all must be copied into a single format or store.  The long-standing default decision--a relational database--may not always be appropriate for every application or decision-support need in the face of these surging data volumes and growing variety of data sources.  The challenge for the evolving data warehouse will be to ensure that we retain a core set of information to ensure homogeneous and integrated business usage.  For this core business information, the relational model will remain central and likely mandatory; it is the only approach that has the theoretical and practical schema needed to link such core data to other stores.

"Seven Faces of Data - Rethinking data's basic characteristics" - new White Paper by Dr. Barry Devlin (sponsored by Teradata)


Posted November 17, 2011 6:07 AM