Blog: Barry Devlin


As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today extends to the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry's latest book, Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.


Much has happened while I've been heads down over the past few months finishing my book. Well, Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data went to the printer last weekend and should be in the stores by mid-October. And I can rejoin the land of the living. One of the interesting news stories in the meantime was Cisco's acquisition of Composite Software, which closed at the end of July. Mike Flannagan, Senior Director & General Manager, IBT Group at Cisco, and Bob Eve, Data Virtualization Marketing Director and long-time advocate of virtualization at Composite, turned up at the BBBT in mid-August to brief an eclectic bunch of independent analysts, including myself.

The link-up of Cisco and Composite is, I believe, going to offer some very interesting technological opportunities in the market, especially in BI and big data. 

BI has been slow to adopt data virtualization. In fact, I was one of the first to promote the approach with IBM Information Integrator (now part of InfoSphere) some ten years ago, when I was still with IBM. The challenge was always that virtualization seems to fly in the face of traditional EDW consolidation and reconciliation via ETL tools. I say "seems" because the two approaches are more complementary than competitive. Way back in the early 2000s, it was already clear to me that there were three obvious use cases: (i) real-time access to operational data, (ii) access to non-relational data stores, and (iii) rapid prototyping. The advent of big data and the excitement around operational analytics have confirmed my early enthusiasm. No argument - both data virtualization and ETL are mandatory components of any new BI architecture or implementation.
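To make the complementarity concrete, here is a minimal sketch of the two patterns side by side. Everything in it is invented for illustration; it implies no particular product's API, certainly not Information Integrator's.

```python
# Toy contrast between data virtualization (query live sources at request
# time) and ETL (reconcile into a persistent copy up front).
# All sources and fields are invented for illustration only.

# Two "live" sources: an operational system and a non-relational store.
operational_orders = lambda: [
    {"id": 1, "amount": 120.0, "region": "EMEA"},
    {"id": 2, "amount": 75.0, "region": "APAC"},
]
document_reviews = lambda: [{"order_id": 1, "stars": 4}]

def virtual_view():
    """Virtualization: federate the live sources on every query
    (use cases i and ii: real-time and non-relational access)."""
    stars = {r["order_id"]: r["stars"] for r in document_reviews()}
    return [dict(o, stars=stars.get(o["id"])) for o in operational_orders()]

warehouse = []

def etl_load():
    """ETL: reconcile once into a persistent copy; queries then hit the copy."""
    warehouse[:] = virtual_view()  # same join logic, but materialized

print(virtual_view())  # always current; pays the federation cost per query
etl_load()
print(warehouse)       # fast and stable for history, but only as fresh as the last load
```

The trade-off is plain even in a toy: the virtual view is always current but repeats the join on every query, while the loaded copy is cheap to query but only as fresh as its last load. That is why the two belong together rather than in competition.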

So, what does Cisco add to the mix with Composite? One of the biggest challenges for virtualization is to understand and optimize the interaction between databases and the underlying network. When data from two or more distributed databases must be joined in a real-time query, the query optimizer needs to know, among other things, where the data resides, the volumes in each location, the available processing power of each database, and the network cost of moving the data between locations. Data virtualization tools typically focus on the first three, database-centric concerns, probably as a result of their histories. However, the last concern, the network, increasingly holds the key to excellent optimization, for two reasons. First, processor power continues to grow, so database performance has proportionately less impact. Second, cloud and big data together mean that distribution of data is becoming much more prevalent. And growth in network speed, while impressive, is not in the same ballpark as that of processing, making for a tighter bottleneck. Who better to know about the network, and even tweak its performance profile to favor a large virtualization transfer, than a big networking vendor like Cisco? The fit seems just right.
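To see why the network term comes to dominate, consider a toy cost model of where to execute a distributed join. This is entirely my own illustration with invented numbers, not Composite's optimizer.

```python
# Toy cost model for placing a distributed two-table join: ship one input
# to the other's site and join there. Purely illustrative; real optimizers
# use far richer statistics than this.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    scan_cost_per_gb: float  # seconds of local processing per GB joined

@dataclass
class Link:
    gbps: float  # effective network bandwidth between the two sites

def transfer_seconds(gb, link):
    """Seconds to move `gb` gigabytes across `link` (8 bits per byte)."""
    return gb * 8.0 / link.gbps

def plan_cost(ship_gb, resident_gb, link, site):
    """Cost of shipping one input to `site` and performing the join there."""
    return transfer_seconds(ship_gb, link) + (ship_gb + resident_gb) * site.scan_cost_per_gb

# Invented example: a 50 GB warehouse table joined to a 2 GB operational
# table across a 1 Gbps WAN link.
warehouse = Site("warehouse", scan_cost_per_gb=0.5)
operational = Site("operational", scan_cost_per_gb=2.0)
wan = Link(gbps=1.0)

plans = {
    "ship warehouse table to operational site": plan_cost(50.0, 2.0, wan, operational),
    "ship operational table to warehouse site": plan_cost(2.0, 50.0, wan, warehouse),
}
for plan, cost in sorted(plans.items(), key=lambda kv: kv[1]):
    print(f"{plan}: ~{cost:.0f}s")
# The ~400s transfer of the large table swamps any difference in local
# processing power: the network statistics decide the plan.
```

On these invented numbers, the transfer term, not the scan term, picks the winning plan. That is precisely the knowledge a networking vendor can both feed into the optimizer and act on in the network itself.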

For this technical vision to work, the real challenge will be organizational, as is always the case with acquisitions. Done well, acquisitions can succeed: witness IBM's integration of Lotus and Netezza, to name but two. Of course, strategic management and cultural fit always count. But the main question usually is: does the buyer really understand what the acquired company brings, and is the buyer willing to change its own plans to accommodate that new value? It's probably too early to answer. The logistics are still being worked through, and the initial focus is on ensuring that current plans and revenue targets are at least maintained. But, if I may offer some advice on the strategy...

The Cisco networking side must recognize that the query optimizer in Composite will, in some sense, become another boss. The value for the combined company comes from the knowledge that resides in the virtualization query optimizer about what data types and volumes need to be accommodated on the network. That knowledge becomes the basis for how to route the data and how to tweak the network to carry it. In terms of company size, this may be the tail wagging the dog. But, in terms of knowledge, it's more like the dog with two heads. The Greek mythological figure of Kyon Orthros, "the dog of morning twilight" and the burning heat of mid-summer, is perhaps the apt image: an opportunity to set the network ablaze.

Posted September 7, 2013 12:21 AM
Permalink | 1 Comment

1 Comment

I was pleased to come across your question "Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances?"
Back in 1991, I was motivated to solve a data problem that seemed nothing like a relational database problem: I needed a better way to manage my extensive email, one that would let me find a particular message easily.
My deliberations on this problem led me to a solution that is applicable across all data types, including relational databases. The solution is rich in possibilities which I keep exploring and evolving in new directions.
A simple version of this technology is currently called Faceted Navigation or Faceted Search. Its very much more powerful generalization is called Technology for Information Engineering, or TIE, which as far as I know is exclusive to my company, SpeedTrack Inc.
Its basis is a change of focus from data to information, with the proposition that information lies in the categorical associations of data components. We extract all component associations and store them in association matrices, which are optimized for very fast query evaluation. The data itself is not needed for search or data mining; every useful question about the information in the data can be answered using only these matrices. The structures used to store data, so critical in relational databases, are not important at all. Both structured and unstructured data can be treated the same way.
This technology replaces search with what we call Information Navigation, a much more useful and powerful function. It allows users to see the information in the data. As a user chooses a key term or description and builds the query, all the associated terms or descriptions are immediately updated. The user is thus guided to the available information in a gradual winnowing process; zero hits are impossible. Additionally, the display can show all correlation measures between the data elements. Adding a term to the query immediately makes available hundreds of thousands of correlations, with those of highest support and confidence shown first. This allows users to also navigate through correlations. It makes finding the needle in a haystack a matter of a few mouse clicks.
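To give a flavor of the winnowing idea in its simple, faceted form (a toy sketch only, not our actual TIE association matrices), narrowing can be modeled as filtering records and recounting the remaining facet values, so the user can only ever select values that still have matches:

```python
# Toy faceted-navigation sketch: records carry (facet, value) pairs.
# Narrowing by a selection recounts the remaining values, so every choice
# offered to the user has at least one matching record: zero hits are
# impossible. Illustrative only; not SpeedTrack's TIE implementation.
from collections import Counter

records = [
    {"id": 1, "dept": "cardiology", "year": "2012", "outcome": "discharged"},
    {"id": 2, "dept": "cardiology", "year": "2013", "outcome": "admitted"},
    {"id": 3, "dept": "oncology", "year": "2013", "outcome": "discharged"},
    {"id": 4, "dept": "oncology", "year": "2012", "outcome": "discharged"},
]

def narrow(recs, **selections):
    """Keep only records matching every selected facet value."""
    return [r for r in recs if all(r[f] == v for f, v in selections.items())]

def facet_counts(recs):
    """Count remaining values per facet; zero-count values simply vanish."""
    counts = {}
    for r in recs:
        for facet, value in r.items():
            if facet != "id":
                counts.setdefault(facet, Counter())[value] += 1
    return counts

current = narrow(records, year="2013")
print(facet_counts(current))
# e.g. {'dept': Counter({'cardiology': 1, 'oncology': 1}),
#       'year': Counter({'2013': 2}),
#       'outcome': Counter({'admitted': 1, 'discharged': 1})}
# Only values with nonzero counts are ever shown, guiding the winnowing.
```

The real system replaces this brute-force recount with precomputed association matrices, which is what makes the same interaction fast at the scale of tens of millions of records.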
I would be pleased to show you a demonstration of TIE using our Guided Information Access platform on a database of 64 million hospital encounters in the State of California. I would greatly appreciate your opinion of the technology.
You can check out a few recorded demos on our website, www.speedtrack.com; however, they do not show all our features.

Jerzy
