Always a tricky question, and every organization answers it differently. It is, however, an important one. Those who own the data are responsible for its quality, right? Not a light-hearted question if you consider the compliance pressure these days and the need for clear responsibilities regarding data. Those who own the data are responsible for its quality wherever it is used within the organization. Is that last claim really true?
In the article I published in September 2008, I strongly advise registering authentic, factual data in the Central Data Warehouse. Business rules should be implemented downstream, after the Central Data Warehouse.
Who owns the data in the Central Data Warehouse? Is it the BICC? No, they are not the owner. Since 'we' store authentic, factual data in the Central Data Warehouse (we do not change, enrich or integrate* it!), the owner of the data should still be the owner of the source. Let me put it more simply:
The people that create the authentic data also own the data, wherever it goes.
Classic architecture
Let me just highlight the importance of architecture here. What would happen if I built a classic, old-style hub-and-spoke data warehouse, where a central data warehouse is developed in which the data is neatly integrated, cleansed, etc.? In other words, data is changed on its way into the Central Data Warehouse, or maybe even on its way into the staging area! Well, taking the above rule into account, the data warehouse team, which creates/changes the data, now becomes the owner.
The result of these more classic data warehouse architectures is that there are two owners of the data, and that the data created by the application is decoupled from the data coming into staging. This is the situation in the majority of organizations. It creates massive problems in governing your data warehouse, especially in change management. What happens if a change occurs in the source data? Is the source owner responsible for cascading the change to the data warehouse? Well, he and the data warehouse team have probably agreed on some kind of SLA stipulating that the source owner signals the data warehouse team when a change is imminent..... and then the Data Warehouse team needs to go to work. What happens if you have 50 sources, or even 100? What happens if the changes are big? Chaos, and the sustainability of your data warehouse is at great risk.
New Architecture
So what does it look like in new-generation Enterprise Data Warehouses?
Now, what happens if a change occurs in the data? The owner of the data performs an impact analysis on all the interfaces he owns and is responsible for. In the new-generation EDW, he is also responsible for engineering the change all the way up to the Central Data Warehouse! This is a far more manageable governance model for change management in an Enterprise Data Warehouse.
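To make the impact-analysis step a bit more concrete, here is a minimal Python sketch. It assumes the lineage of each interface is registered as a simple graph; the node names, the `LINEAGE` and `LAYER` tables and the layer labels are hypothetical. It walks the graph breadth-first from the changed source and splits the affected nodes into the part the source owner engineers (staging and Central Data Warehouse) and the demand-driven datamarts further downstream. It illustrates the governance rule; it is not a prescribed implementation.

```python
from collections import deque

# Hypothetical lineage: each node lists the nodes that consume its data directly.
LINEAGE = {
    "crm_app": ["stg_crm"],
    "stg_crm": ["cdw_customer"],
    "cdw_customer": ["mart_sales", "mart_marketing"],
    "mart_sales": [],
    "mart_marketing": [],
}

# Hypothetical layer of each node in the EDW architecture.
LAYER = {
    "crm_app": "source",
    "stg_crm": "staging",
    "cdw_customer": "cdw",
    "mart_sales": "datamart",
    "mart_marketing": "datamart",
}


def downstream(changed: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every node reachable from `changed`, in breadth-first order."""
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in lineage.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def split_impact(
    changed: str, lineage: dict[str, list[str]], layer: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Split affected nodes into the source owner's scope (up to and including
    the CDW) and the demand-driven datamarts beyond it."""
    affected = downstream(changed, lineage)
    owner_scope = [n for n in affected if layer[n] in ("staging", "cdw")]
    demand_side = [n for n in affected if layer[n] == "datamart"]
    return owner_scope, demand_side
```

For example, `split_impact("crm_app", LINEAGE, LAYER)` returns `(["stg_crm", "cdw_customer"], ["mart_sales", "mart_marketing"])`: the owner of the CRM source engineers the change through staging into the Central Data Warehouse, while the datamarts show up in the impact analysis but lie beyond that point.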
Is the source owner also the owner of data coming into the datamarts? Yes, but this is a tricky one. Integration and cleansing take place downstream, between the Central Data Warehouse and the datamarts. In the second article Lidwine van As and I published in Database Magazine (November 2008), we state that this part of the EDW is pushed by demand (whereas data getting into the Central Data Warehouse is pushed by supply); in other words, those who demand the data set the requirements: the business rules they want to apply to the factual data.
In datamarts, data is changed according to rules defined by a user. Is the initial owner of the data still accountable for this data? Well... yes, they are. But the rules being applied on the way into the datamarts are not their responsibility. If somebody comes up with a highly fantastic rule for calculating turnover, I would say that Finance should own that formula. But I am embarking on a whole other subject here: definition ownership... let's not go there.
IT Artifact and boundary: Data Warehouse and Business Intelligence
As you can see in the above pictures, the boundary of the IT system has changed. The boundary of an IT system is no longer the system itself; it is determined by the propagation of its data. The data warehouse is to be regarded as an integral part of the IS environment.
You could also say that the data warehouse is evolving into a functional interface on top of the operational/transactional systems. This functional interface rationalizes the data structures of these systems and can thereby serve other functions, such as Business Intelligence. It does not have to be Business Intelligence per se! It can also be used for data sharing with third parties (e.g. co-making), data quality projects, operational control, accounting (remember: it is factual, auditable data), etc.
This shift in system boundary also acknowledges that building sustainable data structures differs from building successful Business Intelligence systems. The two differ in competencies and skills, in organizational design (Account management, Exploitation, Development, Maintenance), in technical architecture (tools, version management, security, performance, etc.) and in cultural aspects.
Business metadata
A small side note on metadata, business metadata in particular (definitions, domain values, etc.). I see a lot of EDW architectures where the responsibility for the registration, administration and publication of business metadata is put on the shoulders of the DWH team. Given the new-generation EDW in the graphic above and the argument of this blog post, this is not the right approach. Those who create the authentic data should also take care of the business metadata. It's their job! And yes, this extends the scope well beyond the Enterprise Data Warehouse. It entails Enterprise Data Management: all data, and probably also all services, within the organization. Metadata Administration should be an existing function in any organisation (remember the scale - I am not talking about midsize companies here!).
The Data Warehouse team, however, does have a responsibility for registering, administering and publishing the business metadata for the datamarts.
To summarize:
- Those who create the data also own the data, all the way through the enterprise;
- If this data is changed/cleansed, they still own the raw data as well as the enriched/cleansed data. But
- they cannot be held responsible for the business rule or the definition. Ownership of these rules/definitions can transfer to a definition owner or to the user who is asking for the data;
- they cannot be held accountable for the DWH 'service'. This 'service' is owned by the DWH team, or any-which-way-you-wanna-call-this-organizational-unit.
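The summary can be read as a small set of lookup rules. The following Python sketch is one hypothetical way to encode them, assuming each dataset records either the unit that created it (for authentic source data) or the dataset it was derived from, plus, where business rules were applied, the owner of those rules. The `Dataset` class, its field names and the `DWH_TEAM` label are illustrative assumptions, not part of any product or standard.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical name for whatever unit runs the DWH 'service'.
DWH_TEAM = "DWH team"


@dataclass(frozen=True)
class Dataset:
    name: str
    creator: Optional[str] = None               # set only on authentic source data
    derived_from: Optional["Dataset"] = None    # upstream dataset, if any
    rule_owner: Optional[str] = None            # definition owner or requesting user


def data_owner(ds: Dataset) -> str:
    """Ownership follows the creator of the authentic data, however far downstream."""
    while ds.derived_from is not None:
        ds = ds.derived_from
    if ds.creator is None:
        raise ValueError(f"source dataset {ds.name!r} has no registered creator")
    return ds.creator


def accountability(ds: Dataset) -> dict[str, Optional[str]]:
    """Who answers for what: the data, the business rules, and the DWH service."""
    return {
        "data": data_owner(ds),
        "business rules / definitions": ds.rule_owner,  # None if stored as-is
        "DWH service": DWH_TEAM,
    }


# Hypothetical chain: source -> Central Data Warehouse (as-is) -> datamart (turnover rule).
orders = Dataset("sales_orders", creator="Sales")
cdw_orders = Dataset("cdw_sales_orders", derived_from=orders)
turnover_mart = Dataset("mart_turnover", derived_from=cdw_orders, rule_owner="Finance")
```

Here `accountability(turnover_mart)` keeps Sales as owner of the data, assigns the turnover formula to Finance and the service to the DWH team, while `accountability(cdw_orders)` has no rule owner at all, because the Central Data Warehouse stores the factual data unchanged.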
This is a blog post, so I am allowed not to be thorough, scientifically correct, etc... So I am leaving out a lot of nuances, restrictions and prerequisites. Let me just give you a few. There are some major Enterprise Architecture principles that need to be considered here:
1 - Data must be decoupled from its application/process (huge one!!!)
2 - Ownership of data cascades all the way through its use within the organization
3 - The data warehouse hub MUST register authentic, factual data
4 - The data warehouse hub is to be designed in such a way that it supports a federated deployment (worth a whole new blog post) - without this one, forget it.
5 - Interfaces between source data and staging must be standardized
6 - Metadata Administration must be implemented enterprise wide
7 - Release management for the CDW and datamarts needs to be set up
8 - ...
I have written down most of these prerequisites in two articles, which can be downloaded here.
Just a small brain dump I wanted to share with you guys; give me your 2 cents on this one.
Posted June 8, 2009 12:28 AM

Ronald,
As I'm embarking on something similar, I found the key issue hiding in what is called semantic equivalence and its function within the business. The business needs to sign off on semantically equivalent transformations. When this happens, you can easily tell who owns what data (in which store, application or service) in any data-centric environment, not just a data warehouse or BI system.
Martijn, thanks for posting. I like the phrase semantic equivalence a lot. Let's take an example:
Controlling owns the financial data and Marketing is using it - for some reason - in some devious way (applying all kinds of fuzzy business rules). Who owns the result of these transformations?
Semantically it is probably completely different (no equivalence). In the post above, the business rules (and thus the output) are owned by Marketing, but the input data is still owned by Finance. So, if the output data is wrong, who is to be held accountable? If the transformation has been executed correctly, that would be Finance.
I think we are on the same path here - am I correct?
Yes, but it is far more devious;)
All the transformations that we (that is, we BI/OLTP reporters, modellers, etc.) create should be seen as semantically equivalent *TO A CERTAIN DEGREE*. In fact, I consider this THE magic wand all data/information specialists wave, be they theorists, data miners or SQL hackers.
Transformations that bear *no* equivalence hold no value for the customer, because he cannot relate them to the information he wants to extract.
So even suspicious and fuzzy transformations hold some equivalent information. If they did not, you would not need the data in the first place!
When it comes to accountability there is, of course, a lot more. Consider a semantic equivalence transformation hierarchy that ranges from 0 (factual duplication) to 10 (many-to-many fuzzy matching without lineage, e.g. neural nets with multiple inputs and outputs). Somewhere in the hierarchy is a threshold below which the business talks about the "same data" and beyond which the data "differs". There is usually also a grey area. Accountability means defining up to which level you accept transformations as traceable and accountable; beyond that, the result should be considered new data (from the accountability standpoint). Thus, given the actual transformations, this agreement defines who owns the (derived/transformed) data.
(I hope this is enough explanation?)
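Martijn's hierarchy lends itself to a small sketch. The Python below assumes transformations are scored on his 0 to 10 scale and that the business has agreed on a single accountability threshold; the function name, the threshold value of 3 and the example scores are hypothetical, since the comment does not fix them. Up to the threshold the result is treated as the "same data" and stays with the owner of the input; beyond it, the result is new data owned by whoever owns the transformation.

```python
def accountable_owner(level: int, threshold: int, input_owner: str, transformation_owner: str) -> str:
    """Return who is accountable for transformed data under an agreed equivalence threshold.

    level: hypothetical score, 0 (factual duplication) to 10 (fuzzy many-to-many, no lineage).
    threshold: highest level the business still accepts as traceable "same data".
    """
    if not 0 <= level <= 10:
        raise ValueError("level must be between 0 and 10")
    if not 0 <= threshold <= 10:
        raise ValueError("threshold must be between 0 and 10")
    if level <= threshold:
        # Semantically equivalent enough: still the input owner's data.
        return input_owner
    # Beyond the agreed threshold the result counts as new data.
    return transformation_owner


# Hypothetical agreement: transformations up to level 3 still count as the same data.
AGREED_THRESHOLD = 3
```

With this agreement, a factual copy (`accountable_owner(0, AGREED_THRESHOLD, "Controlling", "Marketing")`) stays with Controlling, while Marketing's fuzzy rules scored at, say, level 8 yield data that Marketing owns. The grey area Martijn mentions is exactly what choosing the threshold settles.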