Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today covers the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

November 2011 Archives

I wrote a White Paper for an interesting, UK-based start-up, NeutrinoBI, back in October on the topic of freeform search in BI.  So, a new paper by Marti A. Hearst, "'Natural' Search User Interfaces", in the November issue of "Communications of the ACM" caught my attention.  I was particularly interested because Hearst has been one of the main proponents of faceted search, an approach that is relatively unsuccessful in BI.  I wondered if I had missed some new developments in the field.

In the event, Hearst's paper is somewhat futuristic.  Nonetheless, she makes some fascinating observations about how users' interaction with computers is undergoing significant change.  The first trend is a move towards voice interaction, driven in particular by smart phones.  There have been dramatic improvements in voice recognition over the past few years, and the ability of software to recognize and interpret simple commands allows users to speak more naturally.  This is driving a move away from single keyword or key phrase searches, even when typed into Google, for example.  However, more work is needed to enable reliable identification of context and the sequential inquiry approach often favored by users.  Natural language-like query has been a topic of ongoing interest in BI, and NeutrinoBI's product supports this in its query interface.

Hearst also notes the growing importance of social search, in collaboration, crowdsourcing and basic social communication.  This is a topic I'm currently developing in a new White Paper with Lyzasoft and to which I'll return in a future post.

The final point that Hearst makes is on the emergence of video as a means of communication, replacing text-based approaches among some younger users of the Web.  This poses some significant challenges for both search and analytics in the future.  In recent years, text analytics has become an important tool for sentiment analysis and so on.  "Video analytics" is still in its infancy.

Returning to NeutrinoBI, let's take a look at the exciting concept of freeform search.  Freeform search, in one sentence, enables Google-like search of the highly structured information found in data warehouses and data marts.  The secret of moving from keyword search of content to freeform search lies in understanding and representing the structure of relational data in a way that supports intelligent parsing of free-text searches.  This structure originates in data modeling and is carried into relational database design and associated metadata--which columns occur in which tables, identification of primary and secondary keys, and foreign-key relationships between tables.  Structuring, whether into normalized or multidimensional schemata, also constrains the allowable values in certain columns and the relationships between them.
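
To make this concrete, here is a minimal sketch, using Python against SQLite's catalog, of the kind of schema metadata such an approach might harvest.  The function and the toy star schema are my own illustration of the general idea, not NeutrinoBI's implementation.

```python
# Illustrative only: harvest table, column and foreign-key metadata from a
# relational catalog (SQLite here) -- the raw material from which a freeform
# search engine could build its information context.
import sqlite3

def harvest_schema_metadata(conn: sqlite3.Connection) -> dict:
    """Return tables, columns, primary keys and foreign-key links."""
    cur = conn.cursor()
    tables = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    metadata = {}
    for table in tables:
        cols = cur.execute(f"PRAGMA table_info({table})").fetchall()
        fks = cur.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        metadata[table] = {
            "columns": [c[1] for c in cols],              # column names
            "primary_key": [c[1] for c in cols if c[5]],  # pk flag is field 6
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],  # (from, ref table, to)
        }
    return metadata

# A toy star schema to run the sketch against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE sales  (sale_id INTEGER PRIMARY KEY, amount REAL,
                         region_id INTEGER REFERENCES region(region_id));
""")
print(harvest_schema_metadata(conn))
```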

The result of all this structuring is a conceptual hierarchy that, in the multidimensional approach, is hardwired into the database or application.  For example, geographical location is often expressed in the form of regions, which are made up of countries, which are further divided into states / provinces, which break down into counties, each of which contains a known and limited set of values.  In the case of freeform search, a process known as hierarchical value decomposition is used to automatically analyze database structures and create the indexes and metadata--the information context, unique and specific to the structure and content of the underlying data--needed to extract meaning from the keywords entered by the user at search time.  The outcome is a system that can be easily used by business people in their own terminology and can be constructed rapidly and efficiently by IT.  Further details can be found in my White Paper "Freedom from Facets: Discovering the data you really need".
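
As a rough illustration of the search-time side, the sketch below matches keywords against a prebuilt index of dimension values to produce a structured filter.  The index contents and the parsing logic are simplified assumptions of mine; the hierarchical value decomposition described above is considerably more sophisticated.

```python
# Hypothetical sketch: turn free-text keywords into structured filters using
# an index of known dimension values. Not NeutrinoBI's actual algorithm.

# Built ahead of time from the database content: each known value maps to the
# (table, column) in which it occurs.
VALUE_INDEX = {
    "emea":    ("geography", "region"),
    "ireland": ("geography", "country"),
    "2011":    ("calendar", "year"),
    "revenue": ("sales", "measure_name"),
}

def parse_freeform_query(text: str) -> dict:
    """Map each recognised keyword onto the column it constrains."""
    filters: dict = {}
    for token in text.lower().split():
        if token in VALUE_INDEX:
            table, column = VALUE_INDEX[token]
            filters.setdefault((table, column), []).append(token)
    return filters

# "Revenue Ireland 2011" -> constraints on sales.measure_name,
# geography.country and calendar.year, ready to be turned into SQL.
print(parse_freeform_query("Revenue Ireland 2011"))
```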

On the Web, a search interface is already the norm.  As Hearst describes, it is evolving into something closer to the way humans interact.  What we see on the Web usually transfers into the enterprise environment.  The functionality that NeutrinoBI is already delivering shows how these developments can be applied to BI and, in the process, is beginning to provide business users with a powerful, yet simple, way to get the information they need.


Posted November 30, 2011 5:10 AM
Sometimes, business intelligence stories can be complex and technical.  Sometimes, even the success stories of a million shaved off expenses here or a 3% improvement in profit margin there can be a bit repetitive.  However, it is the success stories that can prove the business value and quality that characterize the best BI projects.  As one of the judges of the annual South African BI Excellence and Innovation Awards (closing date for entries is 30 November), I'd like to share with you a novel approach to making your entry stand out.  For those of you not eligible to enter, the approach also works when trying to raise funding from the business for a new BI project.  However, you may need to be very brave, even foolhardy; sometimes, you need to tell the as-is horror story to prove just how much better the to-be situation will be.

The story I'm about to summarize comes courtesy of Teradata's Bill Franks, who posted it about 10 days ago on the Smart Data Collective blog, where it's attracting lots of attention.  In the post, entitled "The Dire Consequences of Analytics Gone Wrong: Ruining Kids' Futures", Bill recounts the story of a local school that invested for the first time in an analytics package designed (allegedly) to detect cheating in English essays.  The package was run for the first time against a set of essays submitted at the start of term by a class of highly motivated and high-performing students.  It promptly reported that each and every student was a cheater, with pervasive copying and plagiarism throughout the group.  The school failed all of the students on the assignment and was about to note the offense on the students' records, an action that would have had severe consequences for their future educational chances, until the parents stepped in...

I'll leave you to read the rest of the story in Bill's post.  But the bottom line was that the school and the teachers trusted the results of the package more than their own prior experience of the students.  A result of stunning implausibility from the software was accepted without question, without any resort to reason or common sense.  Apparently, the school backed down on noting the offense on the students' records; but it stood by the decision to fail every student in the class on the essay.

As BI practitioners, you'll get the message, I trust.  Analytics in the hands of naive users can be more dangerous than a Kalashnikov.  Self-service BI may speed up delivery, but how reliable are the conclusions?  There's a good reason why assault rifles are not available on open supermarket shelves (in most countries!).  I'm not against self-service BI, but I do have a problem with willful and uneducated business users.  BI must complement common sense, not substitute for it.  It's up to IT to ensure that users are informed of the strengths and weaknesses of new tooling.

To return to your entries for the BI Awards, to be presented at the BI Summit in Johannesburg on 28 February next: I hope you don't have a horror story of such magnitude, and if you do, I imagine your CEO would be reluctant to have it told in public.  But do remember that the difference between the before and after pictures is a strong indication to the judges of the value of the project to the business.  BI Excellence, beyond technical architecture, also includes data governance and user education.  BI Innovation is about changing the way users make business decisions: good decisions based on reliable information.

Posted November 21, 2011 5:32 AM
"Seven Faces of Data - Rethinking data's basic characteristics" - new White Paper by Dr. Barry Devlin.

We live in a time when data volumes are growing faster than Moore's Law and the variety of structures and sources has expanded far beyond those that IT has experience of managing.  It is simultaneously an era when our businesses and our daily lives have become intimately dependent on such data being trustworthy, consistent, timely and correct.  And yet, our thinking about and tools for managing data quality in the broadest sense of the word remain rooted in a traditional understanding of what data is and how it works.  It is surely time for some new thinking.

A fascinating discussion with Dan Graham of Teradata over a couple of beers last February at Strata in Santa Clara ended up as a picture of something called a "Data Equalizer" drawn on a napkin.  As often happens after a few beers, one thing led to another...

The napkin picture led me to take a look at the characteristics of data in the light of the rapid, ongoing change in the volumes, varieties and velocity we're seeing in the context of Big Data.  A survey of data-centric sources of information revealed almost thirty data characteristics considered interesting by different experts.  Such a list is too cumbersome to use, so I narrowed it down based on two criteria.  The first was the practical usefulness of the characteristic: how does the trait help IT make decisions on how to store, manage and use such data, and what can users expect of this data based on its traits?  The second was whether the trait can actually be measured.

The outcome was seven fundamental traits of data structure, composition and use that enable IT professionals to examine existing and new data sources and respond to the opportunities and challenges posed by new business demands and novel technological advances.  These traits can help answer fundamental questions about how and where data should be stored and how it should be protected.  And they suggest how it can be securely made available to business users in a timely manner.

So what is the "Data Equalizer"?  It's a tool that graphically portrays the overall tone and character of a dataset, IT professionals can quickly evaluate the data management needs of a specific set of data.  More generally, it clarifies how technologies such as relational databases and Hadoop, for example, can be positioned relative to one another and how the data warehouse is likely to evolve as the central integrating hub in a heterogeneous, distributed and expanding data environment.

Understanding the fundamental characteristics of data today is becoming an essential first step in defining a data architecture and building an appropriate data store.  The emerging architecture for data is almost certainly heterogeneous and distributed.  There is simply too large a volume and too wide a variety to insist that it all must be copied into a single format or store.  The long-standing default decision--a relational database--may not always be appropriate for every application or decision-support need in the face of these surging data volumes and growing variety of data sources.  The challenge for the evolving data warehouse will be to retain a core set of information that ensures homogeneous and integrated business usage.  For this core business information, the relational model will remain central and likely mandatory; it is the only approach with the theoretical and practical foundation needed to link such core data to other stores.

"Seven Faces of Data - Rethinking data's basic characteristics" - new White Paper by Dr. Barry Devlin (sponsored by Teradata)


Posted November 17, 2011 6:07 AM
It's now more than two weeks since the second annual Tech4Africa Conference in Johannesburg finished.  My excuse for not blogging earlier?  Well, my reason is that I took a vacation in the Kruger National Park immediately afterwards.  Hence the picture.  Which also fits the theme of this post...

I was invited to present as an African speaker!  I felt a bit of a fraud--two years living on the continent, especially in Cape Town, probably the most European of African cities--hardly seemed to qualify me for the label.  But, as I listened to the various other contributors, I began to realise that, somehow, the African technology environment seems to make sense to me.

The most striking thing about the event was its heart.  By that I mean the sense of energy and excitement shared by the speakers, staff and attendees alike.  Part of it is, I'm sure, due to the unrelenting enthusiasm and dedication of the founder and chief bottle-washer of Tech4Africa, Gareth Knight.  But more was due to something in the air at the event.

I have been to many enthusiastic and energetic events over the years.  The early TDWI conferences in the US.  Most recently Strata in Santa Clara last February.  But the difference here was the source of all the energy and the target of the enthusiasm.  Here it was all about making a difference for the people of Africa.  Through commerce and technology, for sure.  But, the most important thing was that it was about doing it in a way that fit in with the environment and culture of the continent.  Or to be more correct, the environments and cultures of this huge continent.  Just see how big it is!  Perhaps you can begin to imagine the diversity, too.

This impression was further reinforced by the companies and products that were honoured at the event.  The winner of the Innovation Award was TxtAlert, a mobile technology platform designed to improve adherence to Anti-Retroviral Treatment schedules.  The technology is built on basic SMS, but the impact is extraordinary in its scope.  Much the same applies to the other nominees for the Award.

In contrast, the winner of the Samsung Ignite competition for start-ups was a product that would probably only work in one small area of the continent, the Western Cape province of South Africa... and much of the "Western world".  The winner was Real Time Wine, a mobile platform aimed at the supermarket wine-buying audience that enables them to discover, review, engage with and buy wine, using smartphone apps, game technology and barcode scanning.

All in all, an excellent conference that, judging from the comments I heard there and saw on Twitter, provided great value to the businesses that attended and great encouragement to the innovators and software vendors who participated.  For the speakers, especially the international ones (and I'll include myself in that category for now), it was an eye-opener.

Well done to Gareth, Chrissy (event lynchpin) and all the team.  I, for one, am already looking forward to the 2012 event!

Posted November 15, 2011 6:17 AM
I've previously written about IBM Watson, its success in "Jeopardy!" and some of the future applications that its developers envisaged for it.  IBM moved the technology towards the mainstream in a number of presentations at the Information on Demand (IOD) Conference in Las Vegas last week.  While Watson works well beyond the normal bounds of BI, analyzing and reasoning over soft (unstructured) information, the underlying computer hardware is very much the same (albeit faster and bigger) as we have used since the beginnings of the computer era.

But, I was intrigued by an announcement that IBM made in August last that I came across a few weeks ago:
"18 Aug 2011: Today, IBM researchers unveiled a new generation of experimental computer chips designed to emulate the brain's abilities for perception, action and cognition. The technology could yield many orders of magnitude less power consumption and space than used in today's computers.


In a sharp departure from traditional concepts in designing and building computers, IBM's first neurosynaptic computing chips recreate the phenomena between spiking neurons and synapses in biological systems, such as the brain, through advanced algorithms and silicon circuitry. Its first two prototype chips have already been fabricated and are currently undergoing testing.

"Called cognitive computers, systems built with these chips won't be programmed the same way traditional computers are today. Rather, cognitive computers are expected to learn through experiences, find correlations, create hypotheses, and remember - and learn from - the outcomes, mimicking the brains structural and synaptic plasticity... The goal of SyNAPSE is to create a system that not only analyzes complex information from multiple sensory modalities at once, but also dynamically rewires itself as it interacts with its environment - all while rivaling the brain's compact size and low power usage."


Please excuse the long quote, but, for once :-), the press release says it as well as I could!  For further details and links to some fascinating videos, see here.

What reminded me of this development was another blog post from Jim Lee in Resilience Economics entitled "Why The Future Of Work Will Make Us More Human". I really like the idea of this, but I'm struggling with it on two fronts.

Quoting David Autor, an economist at MIT, Jim argues that outsourcing and "othersourcing" of jobs, to other countries and machines respectively, are polarizing labor markets towards opposite ends of the skills spectrum: at one end, low-paying service-oriented jobs that require personal interaction and the manipulation of machinery in unpredictable environments; at the other, well-paid jobs that require creativity, the handling of ambiguity, and high levels of personal training and judgment.  The center ground - a vast swathe of mundane, repetitive work that computers do much better than us - will disappear.  These are jobs involving middle-skilled cognitive and productive activities that follow clear and easily understood procedures and can reliably be transcribed into software instructions or subcontracted to overseas labor.  This will leave two types of work for humans: "The job opportunities of the future require either high cognitive skills, or well-developed personal skills and common sense," says Lee in summary.

My first concern is the either-or in the above approach; I believe that high cognitive skills are part and parcel of well-developed personal skills and common sense.  At which end of this polarization would you place teaching, for example?  Education (in the real meaning of the word - from the Latin "to draw out" - as opposed to hammering home) spans both ends of the spectrum.

From the point of view of technology, my second concern is that our understanding of where computing will take us, even in the next few years, has been blown wide open, first by Watson and now by neurosynaptic computing.  What we've seen in Watson is a move from Boolean logic and numerically focused computing to a way of understanding and using soft information that is much closer to the way humans deal with it.  Of course, it's still far from human.  But, with an attempt to "emulate the brain's abilities for perception, action and cognition", I suspect we'll be in for some interesting developments in the next few years.  Anyone else remember HAL from "2001: A Space Odyssey"?

Posted November 1, 2011 5:30 AM