Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation published by Addison-Wesley in 1997.

Over the past few years, Barry has extended his interest to cover the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

November 2011 Archives

I wrote a White Paper for an interesting, UK-based start-up, NeutrinoBI, back in October on the topic of freeform search in BI.  So, a new paper by Marti A. Hearst, "'Natural' Search User Interfaces", in the November issue of "Communications of the ACM" caught my attention.  I was particularly interested because Hearst has been one of the main proponents of faceted search, an approach that is relatively unsuccessful in BI.  I wondered if I had missed some new developments in the field.



In the event, Hearst's paper is somewhat futuristic.  Nonetheless, she makes some fascinating observations about how users' interaction with computers is undergoing significant change.  The first trend is a move towards voice interaction, driven in particular by smart phones.  There have been dramatic improvements in voice recognition over the past few years, and the ability of software to recognize and interpret simple commands allows users to speak more naturally.  This is driving a move away from single keyword or key phrase searches, even when typed into Google, for example.  However, more work is needed to enable reliable identification of context and to support the sequential inquiry approach often favored by users.  Natural language-like query has been a topic of ongoing interest in BI, and NeutrinoBI's product supports this in its query interface.

Hearst also notes the growing importance of social search, in collaboration, crowdsourcing and basic social communication.  This is a topic I'm currently developing in a new White Paper with Lyzasoft and to which I'll return in a future post.

The final point that Hearst makes is on the emergence of video as a means of communication, replacing text-based approaches among some younger users of the Web.  This poses some significant challenges for both search and analytics in the future.  In recent years, text analytics has become an important tool for sentiment analysis and so on.  "Video analytics" is still in its infancy.

Returning to NeutrinoBI, let's take a look at the exciting concept of freeform search.  Freeform search, in one sentence, enables Google-like search of highly structured information such as is found in data warehouses and data marts.  The secret of moving from keyword search of content to freeform search lies in understanding and representing the structure of relational data in a way that supports intelligent parsing of free text searches.  This structure originates in data modeling and is carried into relational database design and associated metadata--which columns occur in which tables, identification of primary and secondary keys and foreign-key relationships between tables.  Structuring, whether into normalized or multidimensional schemata, also constrains allowable values in certain columns and relationships between them.

The result is a conceptual hierarchical structure that, in the multidimensional approach, is hardwired in the database or application.  For example, geographical location is often expressed in the form of regions, which are composed of countries, which are further composed of states / provinces, which break down into counties, each of which contains a known and limited set of values.  In the case of freeform search, a process known as hierarchical value decomposition is used to automatically analyze database structures and create the indexes and metadata--the information context, unique and specific to the structure and content of the underlying data--needed to extract meaning from the key words entered by the user at search time.  The result is a system that can be easily used by business people in their own terminology and can be constructed rapidly and efficiently by IT.  Further details can be found in my White Paper "Freedom from Facets: Discovering the data you really need".
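
As a rough illustration of the general idea (and not of NeutrinoBI's actual implementation, which is not described here), the following Python sketch harvests the known values of a small geography hierarchy into an index and then uses that index to interpret a free-text query.  Everything in it is an assumption made for the example: the sample rows, the level names and the `build_value_index` and `parse_query` functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from a geography dimension table: (region, country, state).
GEOGRAPHY_ROWS = [
    ("Europe", "Ireland", "Leinster"),
    ("Europe", "France", "Normandy"),
    ("North America", "USA", "California"),
    ("North America", "USA", "Texas"),
]
LEVELS = ("region", "country", "state")


def build_value_index(rows, levels):
    """Map each known value (lower-cased) to the hierarchy levels it appears in."""
    index = defaultdict(set)
    for row in rows:
        for level, value in zip(levels, row):
            index[value.lower()].add(level)
    return index


def parse_query(text, index, max_words=3):
    """Greedily match the longest known value at each position in the query.

    Returns (filters, leftover): filters is a list of (level, value) pairs,
    leftover is the list of words that matched no known value.
    """
    words = text.lower().split()
    filters, leftover = [], []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in index:
                filters.extend((level, phrase) for level in sorted(index[phrase]))
                i += n
                break
        else:
            # No known value starts at this word; keep it as a plain keyword.
            leftover.append(words[i])
            i += 1
    return filters, leftover


index = build_value_index(GEOGRAPHY_ROWS, LEVELS)
filters, leftover = parse_query("sales north america texas 2011", index)
print(filters)
print(leftover)
```

In this toy version, "north america" is recognized as a region and "texas" as a state because those values exist in the data, while "sales" and "2011" are left over for other parts of the information context (measures, time) to interpret.  A real product would need far richer metadata, including key relationships between tables and handling of ambiguous values.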

On the Web, a search interface is already the norm.  As Hearst describes, it is evolving into something closer to the way humans interact.  What we see on the Web usually transfers into the enterprise environment.  The function that NeutrinoBI is already delivering shows how these developments can be applied to BI and, in the process, is beginning to provide business users with a powerful, yet simple, way to get the information they need.


Posted November 30, 2011 5:10 AM
Permalink | No Comments |
Sometimes, business intelligence stories can be complex and technical.  Sometimes, even the success stories of a million shaved off expenses here or a 3% improvement in profit margin there can be a bit repetitive.  However, it is the success stories that can prove the business value and quality that characterize the best BI projects.  As one of the judges of the annual South African BI Excellence and Innovation Awards (closing date for entries is 30 November), I'd like to share with you a novel approach to making your entry stand out.  For those of you not eligible to enter, the approach also works when trying to raise funding from the business for a new BI project.  However, you may need to be very brave, even foolhardy; sometimes, you need to tell the as-is horror story to prove just how much better the to-be situation will be.

The story that I'm about to summarize comes courtesy of Teradata's Bill Franks, who posted it about 10 days ago on the Smart Data Collective blog, where it's attracting lots of attention.  In a post entitled "The Dire Consequences of Analytics Gone Wrong: Ruining Kids' Futures", Bill recounts the story of a local school that invested for the first time in an analytics package designed (allegedly) to detect cheating in English essays.  The package was run for the first time against a set of essays submitted at the start of term by a class of highly motivated and high-performing students.  The package promptly reported that each and every student was a cheater, with pervasive copying and plagiarism throughout the group.  The school failed all of the students on the assignment and was about to note the offense on the students' records, an action that would have had severe consequences for their future educational chances, until the parents stepped in...

I leave you to check the rest of the story on Bill's post.  But the bottom line was that the school and the teachers trusted the results of the package more than their own prior experience of the students.  A result of stunning implausibility from the software was accepted without question, without any resort to reason or common sense.  Apparently, the school backed down on noting the offense on the students' records; but it stood by the decision to fail every student in the class on the essay.

As BI practitioners, I trust you get the message.  Analytics in the hands of naive users can be more dangerous than a Kalashnikov.  Self-service BI may speed up delivery, but how reliable are the conclusions?  There's a good reason why assault rifles are not available on open supermarket shelves (in most countries!).  I'm not against self-service BI, but I do have a problem with willful and uneducated business users.  BI must complement common sense, not substitute for it.  It's up to IT to ensure the users are informed of the strengths and weaknesses of new tooling.
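
One way IT could build a little of that common sense into a tool is a plausibility guard that compares a result with prior expectations before anyone acts on it.  The Python sketch below is illustrative only: the `needs_human_review` function, the class size, the expected rate and the tolerance are all hypothetical figures chosen for the example, not details from Bill's story or from any real product.

```python
def needs_human_review(flagged, total, expected_rate, tolerance=0.25):
    """Return True when the observed flag rate is implausibly far from expectation.

    expected_rate is the prior belief (0..1) about how often the condition occurs;
    tolerance is the largest absolute deviation accepted without review.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    observed_rate = flagged / total
    return abs(observed_rate - expected_rate) > tolerance


# An assumed class of 30 where the tool flags every essay, although past
# experience suggests that perhaps 5% involve copying (an assumed figure).
everyone_flagged = needs_human_review(flagged=30, total=30, expected_rate=0.05)
few_flagged = needs_human_review(flagged=2, total=30, expected_rate=0.05)
print(everyone_flagged)
print(few_flagged)
```

A guard like this does not decide who cheated; it only signals that a result is surprising enough that a person who knows the students should look at it before any action is taken.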

To return to your entries for the BI Awards, to be presented at the BI Summit in Johannesburg on 28 February next, I hope you don't have a horror story of such magnitude, and if you did, I imagine your CEO would be reluctant to have it told in public.  But do remember that the difference between the before and after pictures is a strong indication to the judges of the value of the project to the business.  BI Excellence, beyond technical architecture, also includes data governance and user education.  BI Innovation is about changing the way users make business decisions.  Good decisions based on reliable information.

Posted November 21, 2011 5:32 AM
Permalink | No Comments |
"Seven Faces of Data - Rethinking data's basic characteristics" - new White Paper by Dr. Barry Devlin.

We live in a time when data volumes are growing faster than Moore's Law and the variety of structures and sources has expanded far beyond those that IT has experience of managing.  It is simultaneously an era when our businesses and our daily lives have become intimately dependent on such data being trustworthy, consistent, timely and correct.  And yet, our thinking about and tools for managing data quality in the broadest sense of the word remain rooted in a traditional understanding of what data is and how it works.  It is surely time for some new thinking.

A fascinating discussion with Dan Graham of Teradata over a couple of beers in February last at Strata in Santa Clara ended up in a picture of something called a "Data Equalizer" drawn on a napkin.  As often happens after a few beers, one thing led to another...

The napkin picture led me to take a look at the characteristics of data in the light of the rapid, ongoing change in the volumes, varieties and velocity we're seeing in the context of Big Data.  A survey of data-centric sources of information revealed almost thirty data characteristics considered interesting by different experts.  Such a list is too cumbersome to use and I narrowed it down based on two criteria.  First was the practical usefulness of the characteristic: how does the trait help IT make decisions on how to store, manage and use such data?  What can users expect of this data based on its traits?  Second, can the trait actually be measured?

The outcome was seven fundamental traits of data structure, composition and use that enable IT professionals to examine existing and new data sources and respond to the opportunities and challenges posed by new business demands and novel technological advances.  These traits can help answer fundamental questions about how and where data should be stored and how it should be protected.  And they suggest how it can be securely made available to business users in a timely manner.

So what is the "Data Equalizer"?  It's a tool that graphically portrays the overall tone and character of a dataset, so that IT professionals can quickly evaluate the data management needs of a specific set of data.  More generally, it clarifies how technologies such as relational databases and Hadoop, for example, can be positioned relative to one another and how the data warehouse is likely to evolve as the central integrating hub in a heterogeneous, distributed and expanding data environment.

Understanding the fundamental characteristics of data today is becoming an essential first step in defining a data architecture and building an appropriate data store.  The emerging architecture for data is almost certainly heterogeneous and distributed.  There is simply too large a volume and too wide a variety to insist that it all must be copied into a single format or store.  The long-standing default decision--a relational database--may not always be appropriate for every application or decision-support need in the face of these surging data volumes and growing variety of data sources.  The challenge for the evolving data warehouse will be to retain a core set of information that ensures homogeneous and integrated business usage.  For this core business information, the relational model will remain central and likely mandatory; it is the only approach that has the theoretical and practical schema needed to link such core data to other stores.

"Seven Faces of Data - Rethinking data's basic characteristics" - new White Paper by Dr. Barry Devlin (sponsored by Teradata)


Posted November 17, 2011 6:07 AM
Permalink | No Comments |