Blog: Ronald Damhof

Ronald Damhof

I have been a BI/DW practitioner for more than 15 years. In the last few years, I have become increasingly annoyed - even frustrated - by the lack of (scientific) rigor in the field of data warehousing and business intelligence. It is not uncommon for the knowledge worker to be disillusioned by the promise of business intelligence and data warehousing because vendors and consulting organizations create their "own" frameworks, definitions, super-duper tools etc.

What the field needs is more connectedness (grounding and objectivity) to the scientific community. The scientific community needs to realize the importance of increasing their level of relevance to the practice of technology.

For the next few years, I have decided to attempt to build a solid bridge between science and technology practitioners. As a dissertation student at the University of Groningen in the Netherlands, I hope to discover ways to accomplish this. With this blog I hope to share some of the things I learn in my search and begin discussions on this topic within the international community.

Your feedback is important to me. Please let me know what you think. My email address is Ronald.damhof@prudenza.nl.

About the author

Ronald Damhof is an information management practitioner with more than 15 years of international experience in the field.

His areas of focus include:

  1. Data management, including data quality, data governance and data warehousing;
  2. Enterprise architectural principles;
  3. Exploiting data to its maximum potential for decision support.

Ronald is an Information Quality Certified Professional (International Association for Information and Data Quality; one of the first 20 to pass this prestigious exam), a Certified Data Vault Grandmaster (the only person in the world to hold this level of certification), and a Certified Scrum Master. He is a strong advocate of agile and lean principles and practices (e.g., Scrum). You can reach him at +31 6 269 671 84, through his website at http://www.prudenza.nl/ or via email at ronald.damhof@prudenza.nl.

This blog post was inspired by Martijn Evers' blog post, by Barry Devlin's work on Business Integrated Insight, and by some very forward-thinking customers. The topic has been bugging me for some time now.

Back in 2005 I gave masterclasses on 'Data Warehousing in Depth' at the knowledge institute CIBIT in the Netherlands. One part of the class pondered the future of data warehousing. It was my favorite part, and I remember that I always reminded the class of the root reason for a data warehouse: it is a physical construct to overcome deficiencies in performance, history, auditability, data quality, usability, ...

And I remember drawing two circles: one for data in the operational environment and one for data in the informational environment. Both represented physical environments; in other words, the data needed to physically move from the operational to the informational environment.

I then asked what would be necessary to blend the two environments:

  • "Sources need to maintain history and adhere to strict auditability rules"
  • "More and faster hardware and parallelization options in both hard- and software"
  • "A common vocabulary of data, reflected in the operational systems"
  • "Enterprise-wide data like master- and reference data needs to be maintained pro-actively (upstream) and centrally"
  • ...

These are all still very valid points, but I would add a more profound one:

Data Virtualization gurus preach the Information Hiding [1] pattern to us, a pattern perfectly suited to decoupling information systems from their data. These Data Virtualization guys and gals say that the software that supports data virtualization is the new pinnacle for decoupling operational and informational environments and will (eventually) make the Data Warehouse [2] obsolete.

My opinion: 'they' are right and they are wrong. Yes, Information Hiding is a pattern (there are more) that enables decoupling of the informational environment and the operational environment. But this distinction is somewhat flawed. It originated in the early 1990s and reflects the deficiencies of using registered data for decision support activities.

We might wanna make an effort in trying to get rid of these deficiencies....

In the last 20-30 years, the data-model and its instantiations (the data) were directly based on the information systems. The data-model and the data were by-products. The data-model fitted the information system. The informational environment was born to overcome the deficiencies that came with this approach.

I want to make a plea for shifting this process-oriented thinking and designing to data-oriented thinking and designing. Make the data smarter instead of making countless little/big information systems with their own 'data-store'. It is not that odd; information systems and the business processes they support are so much more susceptible to change than the data is. Data is - in its very nature - extremely stable over time.

This plea is not new: start with an Information Model of your business, construct a conceptual model and slowly design your way down to the logical and physical models and/or the canonical model [3]. Information systems are now made to fit the data architecture and not vice versa. These information systems are somewhat decoupled from the data architecture, and they need to use the Information Hiding pattern.

Now, the principle of Information Hiding and the technology that can handle this pattern can flourish. Data Virtualization technology can be used to its full potential, but the same applies to BPM technology or technology which is based on service oriented architectures.
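
To make the Information Hiding idea a bit more tangible, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names `CustomerData`, `InMemoryCustomerData`, `operational_lookup` and `informational_count` are hypothetical and do not come from any data virtualization product. The point it shows is that an operational consumer and a decision-support consumer both depend on the same stable data contract, while the physical storage behind it stays hidden and replaceable.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class CustomerData(ABC):
    """Hypothetical data contract: says WHAT data is offered, not HOW it is stored."""

    @abstractmethod
    def customers_in_country(self, country: str) -> List[Dict[str, str]]:
        """Return all customer records for the given country code."""


class InMemoryCustomerData(CustomerData):
    """One hidden implementation (an assumption for this sketch).

    It could be replaced by a database or a virtualized view
    without changing any of the consumers below.
    """

    def __init__(self, rows: Iterable[Dict[str, str]]) -> None:
        self._rows = [dict(r) for r in rows]

    def customers_in_country(self, country: str) -> List[Dict[str, str]]:
        # Return copies so callers cannot reach into the hidden storage.
        return [dict(r) for r in self._rows if r["country"] == country]


def operational_lookup(data: CustomerData, country: str) -> List[str]:
    """Operational use: fetch customer names for a process step."""
    return [r["name"] for r in data.customers_in_country(country)]


def informational_count(data: CustomerData, country: str) -> int:
    """Decision-support use: count customers through the same contract."""
    return len(data.customers_in_country(country))


store = InMemoryCustomerData([
    {"name": "Acme", "country": "NL"},
    {"name": "Globex", "country": "US"},
    {"name": "Initech", "country": "NL"},
])
names = operational_lookup(store, "NL")
count = informational_count(store, "NL")
```

Neither consumer knows whether the data sits in memory, in a source system or in a warehouse. That is the decoupling the pattern is after, regardless of the technology used to implement it.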

So, now we have the following list of requirements that are necessary to blend the operational and informational environments:

  1. A data-oriented design and architecture of information systems, as opposed to a process-oriented design and architecture
  2. "Sources need to maintain history and adhere to strict auditability rules"
  3. "More and faster hardware and parallelization options in both hard- and software"
  4. "A common vocabulary of data, reflected in the operational systems"
  5. "Enterprise-wide data like master- and reference data needs to be maintained pro-actively (upstream) and centrally"

 

And I will add another three:

 

  6. Centralized design and maintenance of business rules
  7. Data security and privacy laws and regulations are enforced at the data level
  8. An organizing framework for establishing strategy, objectives, and policies for (corporate) data [4]
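
As an illustration of what "sources need to maintain history and adhere to strict auditability rules" could mean in practice, here is a small Python sketch. It assumes an append-only design in which a change never overwrites a value but adds a new version. The class and method names (`HistorizedSource`, `record`, `as_of`, `audit_trail`) are hypothetical and chosen for this example only; it is one possible approach, not a prescribed one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, List, Optional


@dataclass(frozen=True)
class Version:
    """One immutable, recorded state of a data item."""
    key: str
    value: Any
    recorded_at: datetime
    recorded_by: str  # who or what made the change (auditability)


class HistorizedSource:
    """Hypothetical append-only source: nothing is updated in place."""

    def __init__(self) -> None:
        self._log: List[Version] = []

    def record(self, key: str, value: Any, at: datetime, by: str) -> None:
        # A change is stored as a new version; older versions stay untouched.
        self._log.append(Version(key, value, at, by))

    def as_of(self, key: str, moment: datetime) -> Optional[Any]:
        """Value of `key` as it was recorded at `moment`, or None if unknown."""
        known = [v for v in self._log
                 if v.key == key and v.recorded_at <= moment]
        if not known:
            return None
        return max(known, key=lambda v: v.recorded_at).value

    def audit_trail(self, key: str) -> List[Version]:
        """All versions of `key`, oldest first."""
        return sorted((v for v in self._log if v.key == key),
                      key=lambda v: v.recorded_at)


source = HistorizedSource()
source.record("customer:42:city", "Utrecht", datetime(2012, 1, 1), "crm")
source.record("customer:42:city", "Groningen", datetime(2013, 1, 1), "crm")

city_mid_2012 = source.as_of("customer:42:city", datetime(2012, 6, 1))
city_mid_2013 = source.as_of("customer:42:city", datetime(2013, 6, 1))
trail = source.audit_trail("customer:42:city")
```

A source that behaves like this can answer both "what is the value now?" and "what did we know back then?", which is exactly the kind of question the informational environment was historically built to answer.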

 

It is impossible to be completely thorough in this list; there are indeed many more requirements, but this is a blog... not a book ;-)

The criteria mentioned above can be mapped to a discipline that is about to reach critical mass in terms of body of knowledge, technology support, rigor in science and relevance in practice: Data Management and Data Governance.

Back in 2005, in my masterclasses, I mused over the future of data warehousing. The future is now. The journey will not be easy, but the rewards are substantial. It truly is the future of the Sense and Respond [5] organization.

 

Footnotes:

[1] David L. Parnas, 1972, On the Criteria To Be Used in Decomposing Systems into Modules

[2] Data Virtualization vendors often claim to make the Enterprise Data Warehouse obsolete (referring to the Boulder BI Brain Trust meeting at the beginning of this year, where a vendor made this claim). They confuse a technology (data virtualization software) with an architectural construct.

[3] A Canonical model is a design pattern used to communicate between different data formats. It needs to be based on the logical model of the organization.

[4] Jill Dyche and Evan Levy. Note from the blog post author: I adapted the quote by adding brackets around 'corporate'.

[5] Stephen H. Haeckel, 1999, Adaptive Enterprise: Creating and Leading


Posted January 22, 2013 4:53 AM