
Blog: Ronald Damhof

Ronald Damhof

I have been a BI/DW practitioner for more than 15 years. In the last few years, I have become increasingly annoyed - even frustrated - by the lack of (scientific) rigor in the field of data warehousing and business intelligence. It is not uncommon for the knowledge worker to be disillusioned by the promise of business intelligence and data warehousing because vendors and consulting organizations create their "own" frameworks, definitions, super-duper tools etc.

What the field needs is more connectedness (grounding and objectivity) to the scientific community. The scientific community, in turn, needs to realize the importance of increasing its relevance to the practice of technology.

For the next few years, I have decided to attempt to build a solid bridge between science and technology practitioners. As a dissertation student at the University of Groningen in the Netherlands, I hope to discover ways to accomplish this. With this blog I hope to share some of the things I learn in my search and begin discussions on this topic within the international community.

Your feedback is important to me. Please let me know what you think. My email address is Ronald.damhof@prudenza.nl.

About the author

Ronald Damhof is an information management practitioner with more than 15 years of international experience in the field.

His areas of focus include:

  1. Data management, including data quality, data governance and data warehousing;
  2. Enterprise architectural principles;
  3. Exploiting data to its maximum potential for decision support.
Ronald is an Information Quality Certified Professional (International Association for Information and Data Quality; one of the first 20 to pass this prestigious exam), a Certified Data Vault Grandmaster (the only person in the world to hold this level of certification), and a Certified Scrum Master. He is a strong advocate of agile and lean principles and practices (e.g., Scrum). You can reach him at +31 6 269 671 84, through his website at http://www.prudenza.nl/ or via email at ronald.damhof@prudenza.nl.

Always a tricky question, and every organization answers it differently. It is, however, an important one. Those who own the data are responsible for its quality, right? Not a light-hearted question if you consider today's compliance pressure and the need for clear responsibilities regarding data. Those who own the data are responsible for it wherever it is used within the organization. Is that last statement really true?

In the article I published in September 2008, I strongly advised registering authentic, factual data in the Central Data Warehouse. Business rules should be implemented downstream, after the central warehouse.

Who owns the data in the Central Data Warehouse? Is it the BICC? No, they are not the owner. Since 'we' store authentic, factual data in the Central Data Warehouse (we do not change, enrich or integrate* it!), the owner of the data should still be the same as the owner of the source. Let me put it more simply:

The people that create the authentic data also own the data, wherever it goes.
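This rule can be sketched in a few lines of code. A minimal illustration, with class and field names of my own invention (no product or standard implies them): every data element carries the owner assigned at creation, and copying the element into another store never changes that owner.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    name: str    # e.g. "customer_birthdate"
    store: str   # system this copy of the data lives in
    owner: str   # department that created the authentic data

def copy_to(element: DataElement, target_store: str) -> DataElement:
    # Copying authentic data into the Central Data Warehouse (or any
    # downstream store) must not change who owns it.
    return DataElement(element.name, store=target_store, owner=element.owner)

birthdate = DataElement("customer_birthdate", store="CRM", owner="Sales")
in_cdw = copy_to(birthdate, "CDW")
print(in_cdw.owner)  # Sales - ownership travels with the data
```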

Classic architecture
Let me just highlight the importance of architecture here. What would happen if I built a classic, old-style hub-and-spoke data warehouse, where a central data warehouse is developed in which the data is neatly integrated, cleansed, and so on? In other words, data is changed on its way into the Central Data Warehouse, or maybe even on its way into the staging area! Taking into account the above rule, the data warehouse team, which creates/changes the data, now becomes the owner.

[Figure: Classic EDW]

The result of these more classic data warehouse architectures is that there are two owners of the data, and that ownership is decoupled between the data created by the application and the data coming into staging. This is the situation in the majority of organizations, and it creates massive problems in governing your data warehouse, especially in change management. What happens if a change occurs in the source data? Is the source owner responsible for cascading the change to the data warehouse? Well, he and the data warehouse team probably have some kind of SLA stipulating that the source owner must signal the data warehouse team that a change is imminent... and then the data warehouse team needs to go to work. What happens if you have 50 sources, or even 100? What happens if the changes are big ones? Chaos, and the sustainability of your data warehouse is at great risk.

New Architecture
So what does this look like in the new generation of Enterprise Data Warehouses?
Now, what happens if a change occurs in the data? The owner of the data does an impact analysis on all the interfaces he owns and is responsible for. In the new generation EDW he is also responsible for engineering the change all the way up to the central data warehouse! This is a far more manageable governance model for change management in an Enterprise Data Warehouse.
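The impact analysis itself can be pictured as a simple lookup over an interface registry. A sketch under my own assumptions (the registry shape and the interface names are hypothetical, not from any tool): each interface records its owner and the source fields it carries, so the owner of a changed field can list exactly which interfaces he must re-engineer.

```python
# Hypothetical interface registry: each interface lists the owner
# responsible for it and the source fields it carries.
interfaces = [
    {"name": "crm_to_staging", "owner": "Sales",   "fields": {"customer_id", "birthdate"}},
    {"name": "crm_to_cdw",     "owner": "Sales",   "fields": {"customer_id", "birthdate"}},
    {"name": "erp_to_cdw",     "owner": "Finance", "fields": {"invoice_id", "amount"}},
]

def impact_analysis(owner: str, changed_field: str) -> list[str]:
    """Return the interfaces this owner must re-engineer after a change."""
    return [i["name"] for i in interfaces
            if i["owner"] == owner and changed_field in i["fields"]]

print(impact_analysis("Sales", "birthdate"))  # ['crm_to_staging', 'crm_to_cdw']
```

The point is that the scope of a change is bounded by one owner's interfaces, instead of rippling anonymously through the warehouse team's backlog.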

Is the source owner also the owner of the data coming into the datamarts? Yes, but this is a tricky one. Integration and cleansing take place downstream, between the Central Data Warehouse and the datamarts. In the second article Lidwine van As and I published in Database Magazine (November 2008), we state that this part of the EDW is driven by demand (whereas data getting into the Central Data Warehouse is driven by supply); in other words, those who demand the data set the requirements: the business rules they want applied to the factual data.

In datamarts, data is changed according to rules defined by a user. Is the initial owner of the data still accountable for this data? Well... yes, they are. But the rules being used on the way into the datamarts are not their responsibility. If somebody comes up with a fantastic rule for calculating turnover, I would say that Finance should have ownership of that formula. But I am embarking on a whole other subject here, the subject of definition ownership... let's not go there.
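The split between data ownership and rule ownership can be made explicit in the derivation itself. A minimal sketch, again with names of my own choosing: the derived result carries both the owner of the factual inputs and the owner of the business rule that produced it.

```python
def apply_rule(values, rule, data_owner, rule_owner):
    """Apply a business rule to factual data. Accountability is split:
    the data owner answers for the inputs, the rule owner for the formula."""
    return {"result": rule(values), "data_owner": data_owner, "rule_owner": rule_owner}

# Sales created the raw sales facts; Finance owns the turnover formula.
turnover = apply_rule([120.0, 80.0], rule=sum, data_owner="Sales", rule_owner="Finance")
print(turnover)
# {'result': 200.0, 'data_owner': 'Sales', 'rule_owner': 'Finance'}
```

If the turnover figure turns out wrong, this record tells you where to look first: bad inputs point at the data owner, a bad formula at the rule owner.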

IT Artifact and Boundary: Data Warehouse and Business Intelligence
As you can see in the pictures above, the boundaries of the IT system have changed. The boundary of the IT system is not the system itself; it is determined by the propagation of its data. The data warehouse is to be regarded as an integral part of the IS environment. You could also say that the data warehouse is evolving into a functional interface on top of the operational-transactional systems. This functional interface rationalizes the data structure of these systems and can thereby serve other functions besides Business Intelligence. It does not have to be Business Intelligence per se! It can also be used for data sharing with third parties (e.g. co-making), data quality projects, operational control, accounting (remember: it is factual, auditable data), etc.

This shift in system boundary also acknowledges that building sustainable data structures differs from building successful Business Intelligence systems. The two differ in competencies and skills, in organizational design (account management, exploitation, development, maintenance), in technical architecture (tools, version management, security, performance, etc.) and in cultural aspects.

Business metadata
A small side note on metadata, the business metadata in particular (definitions, domain values, etc.). I see a lot of EDW architectures where the responsibility for the registration, administration and publication of the business metadata is put on the shoulders of the DWH team. Given the above graphic of the new generation EDW and this blog post, that is not the right approach. Those who create the authentic data should also take care of the business metadata. It's their job! And yes, this extends the Enterprise Data Warehouse big time: it entails Enterprise Data Management; all data and probably also all services within the organization. Metadata administration should be an existing function in any organization (remember the scale; I am not talking about midsize companies here!). The data warehouse team does, however, have a responsibility for registering, administering and publishing the business metadata for the datamarts.

To summarize:

  1. Those who create the data, also own the data - all the way through the enterprise;
  2. If this data is changed/cleansed, they still own the raw data as well as the enriched/cleansed data. But:
    • they cannot be held responsible for the business rule or the definition. Ownership of these rules/definitions can transfer to a definition owner or to the user asking for the data.
    • they cannot be held accountable for the DWH 'service'. This 'service' is owned by the DWH team or any-which-way-you-wanna-call-this-organizational-unit.

This is a blog post, so I am allowed not to be thorough, scientifically correct, etc. I am leaving out a lot of nuances, restrictions and prerequisites. Let me just give you a few. There are some major Enterprise Architectural principles that need to be considered here.
1 - Data must be decoupled from its application/process (a huge one!)
2 - Ownership of data cascades all the way through its use within the organization
3 - The data warehouse hub MUST register authentic, factual data
4 - The data warehouse hub must be designed so that it supports federated deployment (worth a whole new blog post) - without this one, forget it
5 - Interfaces between source data and staging must be standardized
6 - Metadata administration must be implemented enterprise-wide
7 - Release management for the CDW and datamarts needs to be set up
8 - ...

Most of these prerequisites I have written down in two articles, which can be downloaded here.

Just a small brain dump I wanted to share with you; give me your two cents on this one.

* a small level of data integration is necessary

Posted June 8, 2009 12:28 AM



As I'm embarking on something similar, I found the key issue hiding in what is called semantic equivalence and its function within the business. The business needs to sign off on semantically equivalent transformations. When this happens you can easily tell who owns what data (in which store, application or service) in any data-centric environment, not just a data warehouse or BI system.

Martijn, thanks for posting. I like the phrase 'semantic equivalence' a lot. Let's take an example:

Controlling owns the financial data and marketing is using it - for some reason - in some devious way (applying all kinds of fuzzy business rules). Who owns the result of these transformations?

Semantically it is probably completely different (no equivalence). Per the post above, the business rules (and thus the output) are owned by marketing, but the input data is still owned by finance. So if the output data is wrong, who is to be held accountable? If the transformation has been executed correctly, that would be finance.

I think we are on the same path here - am I correct?

Yes, but it is far more devious;)

All the transformations that we (that is, we BI/OLTP report writers, modellers, etc.) create should be seen as semantically equivalent *TO A CERTAIN DEGREE*. In fact I consider this THE magic wand all data/information specialists wave, be they theorists, data miners or SQL hackers.
Transformations that bear *no* equivalence hold no value for the customer, because he cannot relate them to the information he wants to extract.
So even suspicious and fuzzy transformations hold some equivalent information. If they did not, you would not need the data in the first place!

When it comes to accountability there is of course a lot more. Consider a semantic-equivalence hierarchy of transformations ranging from 0 (factual duplication) to 10 (many-to-many fuzzy matching sans lineage, e.g. neural nets with multiple inputs and outputs). In the hierarchy there is a threshold below which the business talks about the "same data" and beyond which the data "differs"; there is usually also a grey area. Accountability means defining up to which level you accept transformations as traceable and accountable; beyond that level the output should be considered new data (from the accountability standpoint). Given the actual transformations, this agreement defines who owns the (derived/transformed) data.
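The hierarchy described here could be sketched as a simple classifier. Note that the threshold and grey-band values below are illustrative assumptions of mine, not figures from the comment; each organization would negotiate its own.

```python
def classify(level: int, threshold: int = 3, grey_band: int = 2) -> str:
    """Place a transformation on a 0-10 semantic-equivalence scale:
    0 = factual duplication, 10 = fuzzy many-to-many without lineage.
    At or below the threshold the business still speaks of the 'same data';
    inside the grey band ownership must be negotiated; beyond it the
    output counts as new data with a new accountable owner."""
    if level <= threshold:
        return "same data - original owner accountable"
    if level <= threshold + grey_band:
        return "grey area - negotiate ownership"
    return "new data - derived-data owner accountable"

print(classify(0))   # same data - original owner accountable
print(classify(4))   # grey area - negotiate ownership
print(classify(9))   # new data - derived-data owner accountable
```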

(I hope this is enough explanation?)
