Blog: Ronald Damhof

I have been a BI/DW practitioner for more than 15 years. In the last few years, I have become increasingly annoyed, even frustrated, by the lack of (scientific) rigor in the field of data warehousing and business intelligence. It is not uncommon for the knowledge worker to be disillusioned by the promise of business intelligence and data warehousing, because vendors and consulting organizations create their "own" frameworks, definitions, super-duper tools and so on. What the field needs is more connectedness (grounding and objectivity) to the scientific community. The scientific community, in turn, needs to realize the importance of increasing its relevance to the practice of technology. For the next few years, I have decided to attempt to build a solid bridge between science and technology practitioners. As a dissertation student at the University of Groningen in the Netherlands, I hope to discover ways to accomplish this. With this blog I hope to share some of the things I learn in my search and begin discussions on this topic within the international community. Your feedback is important to me. Please let me know what you think.

Fri, 16 Aug 2013 05:53:50 -0700
4 Quadrant Model for Data Deployment

I have written numerous times about efficient and effective ways of deploying data in organizations. How to architect, design, execute, govern and manage data in an organization is hard and requires discipline, stamina and courage from all those involved. The challenge increases exponentially when scale and/or complexity increases.

Many people are typically involved, and a common understanding is vital; unfortunately, it is often lacking. As a consultant, it is a huge challenge to get management and execution on the same page, while respecting the differences in responsibilities and in the level of expertise regarding the field we are operating in.

An abstraction functions as a means of communication across the organization: an abstraction to which the majority can relate, and that can be used to architect, design, execute, govern and manage the field that is at stake.

For data deployment I came up with the so-called '4 Quadrant model' (4QM).

You are not a customer...

W. Edwards Deming once said: "The customer is the most important part of the production line."

In terms of engineering information assets (getting relevant data to the end user and doing valuable stuff with it) this still holds great value, but... there is a misconception I would like to call out: the misconception that users in an organization are 'customers'. I hear it quite a lot, and I find it disturbing and dangerous.

Please, please, never treat the recipients or users of information assets within your organization as customers. These so-called customers are paid by the same people as you, whether you are an engineer, designer, architect, manager or the freakin coffee machine. You and this so-called customer are part of an organisation...

According to Wikipedia, an organisation is a social entity that has a collective goal and is linked to an external environment.

... joined by collective goals - that is why you are organised in a single entity called an 'organisation'. Customers are those people or entities in the external environment - outside the organisation. Got it?

These users, sometimes referred to as 'customers', are, as Deming intended, a vital part of the production process of information assets. But with users there is more... They need to be involved, and they are as accountable as anyone else in the organisation involved in making relevant information assets with scarce resources.

These users cannot say 'I am a customer, thou have to listen to me' or 'the data sucks, you gotta fix it' or 'the data does not reconcile, I will never use this again' or 'I know we have tool X, but I just acquired tool Y'. It is a joint effort with joint accountabilities between engineer and user, where both are united by the common goals of the organization.

So, if you are a BICC manager, an ETL developer, a data modeler, a BI consultant or whatever, and someone comes up to you saying 'I am a customer and I want X' - you know what to say.

Stop treating internal users of information assets as customers. 


Sun, 23 Jun 2013 04:21:10 -0700
A plea for a more data-oriented approach towards design & architecture

This blog has been inspired by Martijn Evers' blogpost, by Barry Devlin's work on Business Integrated Insight and by some very forward-thinking customers, and it has been bugging me for some time now.

Back in 2005 I gave masterclasses 'Data Warehousing in Depth' at the knowledge institute CIBIT in the Netherlands. One part of the class was pondering the future of data warehousing. It was my favorite part, and I remember that I always reminded the class of the root reason for a data warehouse: a physical construct to overcome deficiencies in performance, history, auditability, data quality, usability, ...

And I remember drawing two circles; one for data in the operational environment and one for data in the informational environment. Both represented physical environments; in other words, the data needed to physically move from the operational to the informational environment.

What, I then asked, would be necessary to blend the two environments?

  •  "Sources need to maintain history and adhere to strict auditability rules"
  • "More and faster hardware and parallelization options in both hard- and software"
  • "A common vocabulary of data, reflected in the operational systems"
  • "Enterprise-wide data like master- and reference data needs to be maintained pro-active (upstream) and centrally"
  •  ..

These are all still very valid points, but I would add a more profound one:

Data Virtualization gurus preach to us the Information Hiding1 pattern, a pattern perfectly suited for decoupling information systems and their data. These Data Virtualization guys and gals say that the software supporting data virtualization is the new pinnacle for decoupling operational and informational environments, and that it will make the Data Warehouse2 (eventually) obsolete.

My opinion: 'they' are right and they are wrong. Yes, Information Hiding is a pattern (there are more) that enables decoupling of the informational environment and the operational environment. But this distinction is somewhat flawed - it is a distinction that originated in the early '90s and that reflected the deficiencies of using registered data for decision-support activities.

We might wanna make an effort to get rid of these deficiencies...

In the last 20-30 years, the data model and its instantiations (the data) were directly based on the information systems. The data model and the data were by-products; the data model was fitted to the information system. The informational environment was born to overcome the deficiencies that came with this approach.

I want to make a plea for shifting from this process-oriented thinking and designing to data-oriented thinking and designing. Make the data smarter instead of making countless little/big information systems, each with its own 'data store'. It is not that odd; information systems and the business processes they support are much more susceptible to change than the data is. Data is - in its very nature - extremely stable over time.

This plea is not new; start with an Information Model of your business, construct a conceptual model and slowly design your way down to the logical, physical and/or canonical model3. Information systems are then made to fit the data architecture, and not vice versa. These information systems are somewhat decoupled from the data architecture; they need to use the Information Hiding pattern.

Now the principle of Information Hiding, and the technology that can handle this pattern, can flourish. Data Virtualization technology can be used to its full potential, but the same applies to BPM technology or to technology based on service-oriented architectures.
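To make the Information Hiding idea concrete: a minimal sketch of Parnas-style information hiding applied to data access. All class, method and data names here are invented for illustration; this is not from any specific virtualization product.

```python
from abc import ABC, abstractmethod

# Consumers depend on a stable interface, not on how or where the data
# is physically stored. Swapping the physical source does not affect them.
class CustomerData(ABC):
    @abstractmethod
    def revenue_by_customer(self) -> dict[str, float]: ...

class OperationalStore(CustomerData):
    """Hides the fact that the data lives in an operational system."""
    def revenue_by_customer(self) -> dict[str, float]:
        return {"C-001": 120.0, "C-002": 80.0}  # stand-in for a real query

class VirtualizedView(CustomerData):
    """Same interface, different physical source; callers cannot tell."""
    def revenue_by_customer(self) -> dict[str, float]:
        return {"C-001": 120.0, "C-002": 80.0}

def total_revenue(source: CustomerData) -> float:
    # This consumer survives a swap of the physical implementation.
    return sum(source.revenue_by_customer().values())

print(total_revenue(OperationalStore()))  # 200.0
print(total_revenue(VirtualizedView()))   # 200.0
```

The point of the sketch is that the informational consumer is written against the interface, so whether the data is moved, replicated or virtualized becomes an implementation detail.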

So, now we have the following list of requirements necessary to blend the operational and informational environments:

1. A data-oriented design and architecture of information systems, as opposed to a process-oriented design and architecture

2. "Sources need to maintain history and adhere to strict auditability rules"

3. "More and faster hardware and parallelization options in both hard- and software"

4. "A common vocabulary of data, reflected in the operational systems"

5. "Enterprise-wide data like master- and reference data needs to be maintained pro-actively (upstream) and centrally"


And I will add another three:


6. Centralized design and maintenance of business rules

7. Data security and privacy laws and regulations are enforced at the data level.

8. An organizing framework for establishing strategy, objectives, and policies for (corporate) data4


It is impossible to be completely thorough in this list; there are indeed many more, but this is a blog... not a book ;-)

The mentioned criteria can be mapped to a discipline that is about to reach critical mass in terms of body of knowledge, technology support, rigor in science and relevance in practice: Data Management and Data Governance.

Back in 2005, in my masterclasses, I mused over the future of data warehousing. The future is now, the journey will not be easy, but the rewards are substantial. It truly is the future of the Sense and Respond5 organization.



1 David L. Parnas, 1972, On the Criteria to Be Used in Decomposing Systems into Modules

2 Data Virtualization vendors often claim to make the Enterprise Data Warehouse obsolete (referring to the Boulder BI Brain Trust meeting at the beginning of this year, where a vendor made this claim). They confuse a technology (data virtualization software) with an architectural construct.

3 A canonical model is a design pattern used to communicate between different data formats. It needs to be based on the logical model of the organization.

4 Jill Dyche and Evan Levy; note from the blog-post author: I adapted the quote by adding brackets around 'corporate'.

5 Stephen H. Haeckel, 1999, Adaptive Enterprise: Creating and Leading Sense-and-Respond Organizations

Tue, 22 Jan 2013 04:53:45 -0700
The democracy of data

There is something that has been going on for some time now, decades even. It all started with the arrival of the Internet, to which people voluntarily contributed data for, well, everyone who was interested. Data about themselves, their relationships, their adventures, their careers. This data was shared with the consent of the owner of the data, although not everyone knew what the data was used for. So one might say that there was consent, but not informed consent.

Let's take it a step further and imagine...

What if our data - generated by others - was given back to us, and we could consent, in an informed manner, to share this data for the greater good? For example: my tax information is my data; it is about me, and I want to decide whether or not others can use it. Or suppose I could get a hold of my location/GPS data, showing all my movements. Or my point-of-sale data from the grocery store, showing my eating patterns. Suppose I could even get a hold of the data of the last MRI I took, my genome data, the data of my last blood test or even the data of a clinical test I was in?


What if I could decide to contribute this data (consensually) for the public good, while my privacy was still being honoured? What if dozens of people decided that? What if millions of people decided that? Clinical research would never be the same again. We would be able to scan for patterns in seas of data consisting of environmental data and healthcare data. No more clinical trials with just 2000 people and ever-smarter statistics. In this setting the healthcare specialists, the quants, the sociologists and the behavioural scientists would have an unprecedented test bed of data. Is there a correlation (or even causality) between aspects of travelling, career, eating patterns, social status and cancer? Suppose several generations would contribute their data; what would that mean for clinical research? Mindblowing...

In the above I discussed data that is about me, so I should be the one who decides whether or not to share it. But what about data that is ours? The government heavily sponsors research in many countries: research on biology, behavioural science, economic science, climate science, etc. Shouldn't the data generated by this research be public domain? I think it should...

And what about data created by government - which is us? Data about import and export movements, data regarding employment, schooling, law enforcement, crime, etc.


What would this democratization of data mean for innovation? I think it would truly ignite a burst of possibilities and a huge potential for our general wellbeing. And no, I am not referring to the challenge of marketing handbags to middle-aged ladies (quote somewhat paraphrased from Neil Raden).

No, set the data free to go after the real challenges we face: decreasing poverty, climate control, improving healthcare, scarcity of resources, economic stability and decreasing crime.

This blogpost is hugely inspired by John Wilbanks - google the guy (!) - by all the Open Data initiatives of the world where governmental agencies free up their data, by the technological possibilities of data storage, data deployment, data enrichment, data visualization and advanced analytics, and finally by a deeply felt wish and conviction that our field of knowledge (data management and data utilisation) can contribute to making this a better place for us to live in.

Fri, 04 Jan 2013 08:50:47 -0700
The journey that never ends; the origins of data quality

I have always been fascinated by the true origins of modern-day phrases or trends in my domain: Information Management, data management in particular. It is like a challenge I give to myself, a puzzle waiting to be solved. Why, you say? Well, Aristotle said it already:

'If you would understand anything, observe its beginning and its development'.

I tend to first collect the modern-day writings about it, mostly by practitioners. Then I go to the online science libraries and browse through ACM journals, MIS Quarterly, the European Journal of Information Systems, the IBM Systems Journal, Decision Support Systems, the Journal of Management Information Systems and lately the Journal of Data and Information Quality. And I am forgetting a whole lot. But since the field of information management is a relatively young science, I tend to eventually end up in the more or less classic science domains: psychology, mathematics, engineering, etc. If I took it any further I would probably end up with philosophy and theology ;-) and discover the meaning of life...

Being on such a quest is like opening up an unprecedented series of presents given to me by brilliant men and women. There is so much out there that can easily be applied to other domains, for example, the information management domain.

With 'Data Quality' the same applied. I started with the books of Thomas Redman (aka the Data Doc), of course Larry English, Danette McGilvray, David Loshin, Jack Olson, and Arkady Maydanchik cannot be missed either. And one cannot overlook the books written by Yang Lee, Richard Wang and Leo Pipino. The majority of these books, however (with the exception of Lee, Wang and Pipino), lack scientific rigor: the kind of Design Research approach introduced by Alan Hevner in 2004 (published in MIS Quarterly). And although this type of research is relatively young, there are many science-based papers out there that more or less adhere to the Design Research prerequisites, aiming for scientific rigor and relevance in practice.

Since 2004 many papers on data quality have been published that are really precious to me, but it just was not good enough for me; I had not reached the true origins yet, or so I felt. So I broadened the scope to 'Quality' in general. Quality in a manufacturing/engineering/services context pointed me in the direction of Shewhart, Deming, Juran, Crosby, Feigenbaum, Ishikawa and also Peter Drucker. Boy, did I enjoy the writings of these guys (sorry, they were all men).

However, I slowly digressed into various domains that opened up Pandora's box: the domain of coping with change, management theory, decision theory, group processes, systems theory, system dynamics and much more. And although I studied economics at university, this was all new to me.

I am still not sure whether I was not paying attention back in college or my university just sucked.

In between I entered the field of Quality Software Management - not that odd, I would say; on an abstract level one might argue that it is the sum of the above combined with software engineering, my own professional domain and the projects I undertook. Back then I felt (and I still do) that Gerald (Jerry) Weinberg had captured the soul of all these quality people, combined with systems theory, system dynamics, software engineering, a profound human perspective and a keen view on leadership and management (and on why many current management models simply dysfunction).

If anyone wants to really go on a quest regarding 'agile software development': do not bother, start by reading the books of Jerry Weinberg. You will not find the word 'agile', but you will recognize it.

These books (and he wrote a whole lot) put me on a roller-coaster (which I am still on) that included exploratory testing, self-organizing teams, leadership, Kanban/Scrum/XP, CMMI, Six Sigma, etc...

I have so many books now, so many papers, so many subjects, so many loose ends... it is ridiculous.

And it all started with 'data quality'....

Am I done yet?

Hell no

Will I ever be done?

Hell no

Is it fun?

Hell yes

I need a second life, and a third...

Sat, 15 Dec 2012 03:50:19 -0700
Agile Data Warehousing - Ralph Hughes visits Data Vault customers in Amsterdam

On Wednesday, November 16th, 2011, Ralph Hughes from Ceregenics was in the Netherlands. Ralph is the author of the book 'Agile Data Warehousing: Delivering World-Class Business Intelligence Systems Using Scrum and XP'. Ralph is currently under contract to write more books on the topic of agility in data warehouse development.

I had been in contact with Ralph for some time; he wanted to know more about data vault, getting the facts, how it is actually used, what customers use it, how they develop and deploy, how it contributes to agility and how it impacted the business.


Of course, anything can be explained in writing or conceptually, but the real proof of the pudding is in the eating. Opportunity knocked when Ralph was in the Netherlands for his TDWI course on agile data warehousing. He asked me whether I could arrange some customer visits in Amsterdam: customers that use and deploy Data Vault and have attained a high degree of agility.

Tom Breur and I hosted Ralph, and we visited the Free University (a client of mine) and BinckBank (a client of Tom's), both in Amsterdam. Hans Hultgren (Genesee Academy) happened to be in the Netherlands that week and joined us as well. We met with both management and technical team members of the university and BinckBank.

Both clients were particularly interesting because their data warehouses are in production and in a mode of constant change. Both clients showed a remarkable predictability and reliability in coping with these changes. Change equated to 'business as usual'. I remember Ralph asking an engineer 'how long does it take to deploy a new data element to the warehouse?' The engineer replied: 'do you want to know the lead-time including my coffee break?'.

Ralph, Tom, Hans and I were impressed with what these clients had accomplished: getting their data warehouse deployment in control while constantly adding value and changes for the business in a predictable fashion.

I will not transcribe the whole interview in this blog - that is simply too much - send me a note if you want to know more. Interesting differences between the Free University and BinckBank were that they used different automation techniques, and the level of business key integration also differed slightly. The Free University used templating (generating XML and importing it into Business Objects Data Services) for data warehouse automation, and the data warehouse was driven by business keys. BinckBank used Quipu for data warehouse automation, and the data warehouse was partly driven by business keys and partly by surrogate keys (see also my presentation at the Data Vault advanced seminar about different Data Vault species). In terms of software development methods, BinckBank used Scrum and the Free University worked waterfall/iteratively with lots of lean practices.

I will try to summarize both visits, from Tom's perspective and mine, particularly slanted towards agile software development, by asking my blog readers three questions:

  1. Why is it that you can build and deploy extremely small particles in Data Vault, and not in other approaches, without an increase in the overhead and coordination of these particles? In other words: 'Divide and Conquer to beat the Size/Complexity Dynamic'1
  2. Why is it that you can re-engineer your existing model and guarantee that the changes remain local? Something that is hugely beneficial in data warehouses, which - by definition - grow over time.
  3. Why is it that - as your (Data Vault based) data warehouse grows - your costs initially grow 'merely' in linear fashion, and as you approach the end state the marginal growth in cost decreases (as opposed to the exponential cost increase for Kimball warehouses)?

I want to thank Free University as well as BinckBank for offering their time, their energy and enthusiasm to the general cause of knowledge sharing. Of course I want to thank Tom Breur and Hans Hultgren for putting in their time as well. 

My special thanks, of course, to Ralph Hughes for being an open-minded, inquisitive and knowledgeable peer. It was great being your host in the Netherlands.


1 - Gerald M. Weinberg - Quality Software Management - 1992

Photo #1: In the corner on the left sits Ralph Hughes, next to him Tom Breur. On the other side the Free University: Jaap Roos (project manager), Dorien Heijting (data warehouse engineer), Erwin Vreeman (project lead).

Photo #2: Sitting with the American flag: Ralph Hughes and Hans Hultgren. At the head of the table, BinckBank: Michel Uittenbogaard (data warehouse engineer) and on the right Paul Delgman (BI manager).

Photo #3: Sitting near the window looking down: me, myself and I

Fri, 09 Dec 2011 00:50:51 -0700
Why Data Vault is supported by all ETL tools

Recently a discussion raged on LinkedIn regarding the 'ETL tools that support Data Vault OUT OF THE BOX' (link). I gotta be honest: I was annoyed by the discussion and was stupid enough to display this by commenting rather harshly. I would like to apologize to everyone, and especially to Daan.

In this blogpost I would like to explain my point of view regarding this question. 

In the above-mentioned discussion I commented very briefly: 'All ETL tools support Data Vault'. Allow me to explain this by paraphrasing an argument that Daan also used in the subsequent comments. He mentioned that technology has brought about efficiency gains in the last 20 to 30 years. I agree with that; the data is quite clear about it ;-). Explaining these gains I leave to applied science, but I would like to take one tiny piece of the puzzle and put it in the context of my remark that 'all ETL tools support Data Vault'.

One of the 'variables' in the function behind this tremendous leap is - in my opinion - uniformity. Organizing uniform systems (I use the term 'systems' in the broadest sense: people, technology, processes) opened the door towards repeatability, predictability, limiting waste and improving quality. In writing this, I think Dr. W. Edwards Deming would agree with me.

Now, back to the subject of ETL and Data Vault. With Data Vault we design the system of modeling and the logistics of data in advance; both go hand in hand. What we want to achieve is as much uniformity as we possibly can: uniformity in modeling, balanced with uniformity in loading.

Let me elaborate some more.

In Data Vault - and, more generally speaking, in 'systems thinking' - all objects in a system are interrelated. How I construct a data model has a strong impact on the way I (can) construct the loading (ETL). With Data Vault we standardize the data model as much as we can (there are quite a few heuristics in Data Vault; it should not be applied in a dogmatic way), using a limited number of constructs (hub, link, sat). But we also design the loading constructs, which are also extremely limited in number (hub load, link load, sat load). Every load construct has a standardized pattern; see the figure regarding the pattern for a hub load.

[Figure: standardized load pattern for a hub load]
If I were to translate this to SQL it would be something like: INSERT <distinct values> INTO HUB WHERE NOT EXISTS in HUB. Of course any ETL tool supports such a simple construct! Data Vaults are thus being built with SSIS, Informatica, InfoSphere, Business Objects, Pentaho, SAS, etc.
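To make the hub-load pattern tangible, here is a minimal sketch using SQLite. The table and column names (stg_customer, hub_customer, customer_nr) are hypothetical, not part of any Data Vault standard; a real hub load is richer, but the core INSERT-where-NOT-EXISTS shape is the same.

```python
import sqlite3

# Minimal sketch of a Data Vault hub load: insert only the business keys
# that do not yet exist in the hub. Names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (customer_nr TEXT);          -- staging
    CREATE TABLE hub_customer (
        customer_nr TEXT PRIMARY KEY,                      -- business key
        load_dts    TEXT,
        record_src  TEXT
    );
    INSERT INTO stg_customer VALUES ('C-001'), ('C-002'), ('C-001');
""")

# The core pattern: INSERT <distinct keys> WHERE NOT EXISTS in the hub.
conn.execute("""
    INSERT INTO hub_customer (customer_nr, load_dts, record_src)
    SELECT DISTINCT s.customer_nr, datetime('now'), 'stg'
    FROM stg_customer s
    WHERE NOT EXISTS (
        SELECT 1 FROM hub_customer h WHERE h.customer_nr = s.customer_nr
    )
""")

keys = [row[0] for row in conn.execute(
    "SELECT customer_nr FROM hub_customer ORDER BY customer_nr")]
print(keys)  # the duplicate 'C-001' is loaded only once
```

Running the load a second time would insert nothing, which is exactly what makes the pattern safe to re-execute and easy to automate.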

Please be advised that the above is a simplified example; in real life the load patterns are considerably more complex. The principles, however, remain unchanged:

- A limited number of loading patterns

- The patterns are standardized in type

- The patterns are simple

- The patterns can be executed asynchronously

- The patterns can make use of parallel loading

I would like to summarize the above in two words: uniformity and automation. Because of uniformity in modeling and logistics, we open the door towards repeatability and automation, making the system a lot cheaper to maintain, but also easy to change or supplement (testability is designed into the system, as is repairability). Agile software development finds great support in these kinds of systems (this is worthy of an entirely new blogpost ;-)).

We can now design a predictable system for loading data into a data model. We have created a uniform structure for the data in the data warehouse, opening the way for more uniformity towards Kimball data marts as well (be they in-memory, on file, virtualised, etc.).

Uniformity and automation have ignited a wave of innovation in the Netherlands: innovation led by independent consultants and consultancy firms that saw great opportunity in the daily problems they face and took data logistics to a new level of automation: metadata-driven ETL (an open source example: Quipu; a commercial example: WhereScape).
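The idea behind metadata-driven ETL can be sketched in a few lines: because every hub load follows the same pattern, the SQL can be generated from metadata instead of hand-written. This is a toy illustration with an invented metadata format; it is not how Quipu or WhereScape actually work internally.

```python
# Toy sketch of metadata-driven ETL generation. One template captures the
# uniform hub-load pattern; a metadata list drives the generation.
# The metadata format and all names here are invented for illustration.
HUB_LOAD_TEMPLATE = """\
INSERT INTO {hub} ({bk}, load_dts, record_src)
SELECT DISTINCT s.{bk}, CURRENT_TIMESTAMP, '{src}'
FROM {staging} s
WHERE NOT EXISTS (SELECT 1 FROM {hub} h WHERE h.{bk} = s.{bk})"""

hubs = [
    {"hub": "hub_customer", "bk": "customer_nr",
     "staging": "stg_customer", "src": "crm"},
    {"hub": "hub_product",  "bk": "product_nr",
     "staging": "stg_product",  "src": "erp"},
]

# Generate one load statement per hub from the metadata.
statements = [HUB_LOAD_TEMPLATE.format(**h) for h in hubs]
for sql in statements:
    print(sql, end="\n\n")
```

Adding a new hub then means adding one metadata entry, not writing new ETL code - which is exactly the repeatability that uniformity buys you.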

Wed, 21 Sep 2011 03:44:17 -0700
IQCP, a means to an end - it is all about quality

In May 2011 I was given the opportunity to be part of the first group of pioneers that field-tested the IQCP program. IQCP stands for Information Quality Certified Professional and is organized by the IAIDQ, the International Association for Information and Data Quality. To be fair: I really do not like certifications. It is said that with certification, clients have some kind of objective measure of knowledge and skill. Well, for the majority of certifications this is utterly deceiving.

That being said, why did I go for IQCP?

For one thing, the IAIDQ is a not-for-profit and vendor-neutral organisation. The people involved in the IAIDQ have a passion for spreading the word about quality in general and Information Quality in particular.

Second, the certification process was extremely well prepared, grounded in known research and adhering to widely accepted standards and regulations such as ISO/IEC 17024 and the Standards for the Accreditation of Certification Programs published by the USA National Commission for Certifying Agencies (NCCA, 2002).

Third, I was not mainly interested in the certification as such... To put it poetically: I was interested in the journey, not the destination. The IQCP certification is based on a reference list of books and articles, carefully selected by people who know their stuff. The certification was a 'carrot on a stick' to me, pressuring me to actually study and invest time. The deadline was the end of July 2011. I have never in my life read so many books, but I have also never in my life been so authentically interested in a subject. I just kept reading, because there is so much clever stuff written out there.

Fourth, and maybe the most important reason: in May 2011 I also happened to follow the legendary PSL (Problem Solving Leadership) workshop given by Jerry Weinberg, Esther Derby and Johanna Rothman. What I learned there is immense and influences me daily. The root of this workshop is - in my opinion - founded in quality principles: learning, continuous improvement, root causes, self-organizing teams, people, communication, understanding the problem, understanding context and so much more. The people I met in this workshop were incredible and a huge inspiration. A lot of the attendees came from the world of context-driven testing and agile software development, both - in my opinion - strongly rooted in quality principles.

It somehow all came together when I was asked to participate in the first batch of people to go for IQCP.

For me, IQCP was a means to an end. I have studied amazing books (for a reference list, download this pdf), written by people like Edwards Deming, Joseph Juran, Jerry Weinberg, Kaoru Ishikawa, Masaaki Imai, Richard Wang and so many more. My library has grown very rapidly lately and I still have so much more to read (and learn). Somehow, all I have read, learned and am still learning are pieces of a puzzle that fit seamlessly into my field of expertise: #datamanagement #decisionsupport #datagovernance #dataquality #datavault #architecture #softwaredevelopment.

The journey never ends....

My gratitude goes to Tom Breur, the IAIDQ, Jerry Weinberg, Esther Derby, Johanna Rothman, those crazy Canadians ('it is what it is'), Testsidestory, Olav, Markus, Griffin and so many more.

ps. I passed the exam... I am now an Information Quality Certified Professional.

Wed, 21 Sep 2011 03:33:34 -0700
Data Vault Introductions - Download

I get a lot of questions from people (especially in the United States) about Data Vault. For those who want to familiarize themselves with Data Vault, I hereby offer links to some papers I wrote (sometimes with others) and a presentation I gave recently (see other post for details):

Published originally in Dutch for Database Magazine, an article that is the first part of a triptych regarding the 'Next generation Enterprise Data Warehousing', this part was an introduction into Data Vault named 'Letting go of the idea of a single version of the truth'  - August 2008 - Download

Published originally in Dutch for Database Magazine, an article that is the second part of a triptych regarding the 'Next generation Enterprise Data Warehousing', this part deals with the post-Data Vault processing, the business rules in particular - November 2008 - Download

Published originally in Dutch for Database Magazine, an article that is the final part of a triptych regarding the 'Next generation Enterprise Data Warehousing', this part deals with 'Development processes in Data Warehouse environments' - June 2009 - link

Published in Belgium for, a keynote article titled: Data Vault, Business Objectives for next generation data warehousing - January 2011 - Download

And finally a link to the presentation I held on the Advanced Data Vault seminar, May 5 & 6 in Baarn, the Netherlands: link

**July 2nd,2011 - Tom Breur wrote a good piece "Tom's ten data tips", very much Data Vault related.

Wed, 21 Sep 2011 03:29:41 -0700
For the love of Data (Quality)

"Cleaning the lake or reducing the pollution from the factory" - is an analogy used by Thomas Redman. It perfectly paints a picture of data quality issues 'we' all face in our data management projects. 

In projects we often have to struggle against forces that 'just wanna create the freekin report'. Whether the data is wrong is of no concern. In these instances the goal apparently is the information system ('our DWH is running', 'the report is built' or 'SAP is live') and not the data. In other words: data is often treated as a by-product, and the information system as the main product.

Let's take a closer look, using (among others) Richard Wang's analogy with a manufacturing process [1]:

[Figure: the manufacturing-process analogy applied to information products]

I have never seen any manager, CEO or foreman happy with a successfully implemented assembly line that turns out a lousy product. Have you? In software engineering I sometimes have the feeling we have lost touch...

The Information System - be it a Data Warehouse, a report or an ERP - is not the purpose; it is a means to an end. And that end should at least be sufficient data quality (where data quality is defined from the perspective of the customer: fit for his or her task).

The cool thing about Richard Y. Wang's (somewhat) oversimplified analogy is that it is useful for another reason: it stresses the system perspective you have to take on data quality issues. You cannot go about 'cleaning your lake while the factory is still polluting'. Producing quality Information Products is done by means of a system. Now, do not translate this 'system' into 'information system': this system consists of people, processes and technology. Dealing with data quality issues requires a system perspective to really add value in terms of better quality products and a 'greener' environment.
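The lake-versus-factory contrast can be sketched in a few lines of code. This is a minimal illustration only - the record type, field names and quality rules are invented for the example and do not come from Wang's paper:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical order record; fields and rules are illustrative only.
@dataclass
class Order:
    customer_id: str
    amount: float

def defects(order: Order) -> List[str]:
    """Quality rules, stated once, usable anywhere in the 'system'."""
    problems = []
    if not order.customer_id:
        problems.append("missing customer_id")
    if order.amount < 0:
        problems.append("negative amount")
    return problems

def accept_at_source(order: Order, table: List[Order]) -> bool:
    """Fix the factory: a defective record never enters the lake."""
    if defects(order):
        return False  # push the error back to the producer instead
    table.append(order)
    return True

def clean_the_lake(table: List[Order]) -> List[Order]:
    """Clean the lake: scrub after the fact. The pollution keeps
    flowing in, so this job has to run again and again."""
    return [o for o in table if not defects(o)]
```

The system perspective shows up in the difference between the two functions: `clean_the_lake` must be rerun forever, while `accept_at_source` gives the producer immediate feedback, so the defect can stop recurring.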

So - do not blame your ERP department for creating bad data

So - do not blame your report builder for creating useless reports

So - do not blame the person entering the data

Maybe something to consider: who do you think is accountable for organizing the 'system'? Yes - management should embrace quality in its DNA...

I know I am corny - management should have read Deming, Juran and Crosby in their MBAs. Knowledge that is half a century old.


[1] Richard Y. Wang, 'A Product Perspective on Total Data Quality Management', Communications of the ACM, February 1998.

Wed, 21 Sep 2011 03:26:34 -0700
We need to be flexible - Do we? Really? In 1967, Thompson wrote about the administrative paradox: a dichotomy in which continuity (stability) and flexibility sit at opposite ends of the spectrum. In other words: be flexible, and at the same time try to progressively eliminate or absorb uncertainty. This paradox can also be discussed in terms of time: in the short run, administration seeks to reduce uncertainty; in the long run, it strives for flexibility.

Nothing new, I hope? Now, what about Information Systems...

When using Information Systems we also have to deal with this paradox. We tend to use Information Systems to automate tasks, formalize sequences of events, kill flexibility (;-)). An Information System can be interpreted as a 'bureaucrat in an electronic version' (Checkland and Holwell, 1998).

So, what do we do? We tend to modularize information systems and integrate them via services that are, of course, strongly decoupled from each other. IT delivers and supports all kinds of business functions, and with a brilliant Service Oriented Architecture we cross the bridge between function and business process. We can now change the business processes whenever demand requires it.

Yee - we=happy. we=flexible again. Easy huh?

NO, it is not easy. It can be a blank check you write to your 'partners', the System Integrators; it may take years before you capitalize on the investment that has been made. And in the process you tend to demotivate your own personnel (or customers) big time.

My point: the balance between stability and flexibility is sometimes totally lost in organizations. Some architects and many vendors/solution providers are pushing the flexibility agenda big time nowadays, but the 'why' of flexibility has never been fundamentally discussed with(in) top management. The 'why' should be related to the industry you are in and the strategy with which you wish to approach the market. For example, I firmly believe that many government agencies should focus on stability over flexibility. Unfortunately, they do not seem to agree with me. Also consider that stability and flexibility are interconnected: more focus on flexibility will diminish your stability, and vice versa. Accept collateral damage if your architecture is all centered around 'being flexible'; if you want both, well, you cannot have both, and expect to pay a price ;-)

Even if the case for flexibility is made, the 'how' should be extremely carefully considered. Is flexibility in business processes needed (a hard question)? Or is flexibility in data sufficient? The two differ hugely in attainability, costs and organizational impact, and the latter can overcome the Administrative Paradox at least partly...

Mon, 27 Sep 2010 23:59:26 -0700
Change always comes bearing gifts
A story.....
  • Vendor X sells its ERP to a company in Healthcare;
  • Client wishes to setup its informational environment (data sharing, BI, CPM etc..) right from the start;
  • Vendor X pushes the 'standard' solution' they sell;
  • Client decides to decouple their informational environment from its source(s) for several reasons (heterogeneous sources, sustainability, compliance, adaptability etc..);
  • Vendor X deploys their ERP;
  • Client starts to design and build the informational environment;
  • Interfaces between ERP of vendor X and the informational environment are developed;
  • The ERP of vendor X does not offer functional interfaces ('X keeps pushing their standard product'), so the client needs to connect at the physical level;
  • Going-live is near, for both the ERP and the new informational environment.

And then change management of vendor X regarding the ERP kicks in.

Client: 'What's your release schedule for patches'?
X: 'Every 2 weeks' 
Client: 'Huh'?

Client thinks: 'Damn, how can I keep up with this change schedule?'

Client: 'Well, can you tell me anything regarding the average impact of these patches?'
X: 'Well, they can be very small and very big' 

Client thinks: 'Ok, what are you NOT telling me' 

Client:'Ok, but this ERP is like 15 years old, so give me an overview of the average impact'
X: 'Basically anything can happen'

Client thinks: 'o, o'

Client: 'Ok, but the majority of these changes are of course situated in the application layer, not the data layer?'
X: 'Well..anything can happen.'

Client thinks: 'Is it warm in here?'

Client: 'Anything? Also in the data layer? Table changes, integrity changes, domain type changes, value changes?'
X: 'Aye'

Client thinks: 'Ok - I'm dead'

Client: '...at least tell me that existing structures always remain intact and that the data remains auditable - extend instead of replace, for example'
X: 'Huh'?

Client thinks: 'Well, at least I am healthy...'

Client: 'hmm...just a side note, we use Change Data Capture, I assume that these changes are fully logged?'
X: 'Nah - log is turned off, otherwise we can't deploy the changes' 

Client thinks: 'Is my resume up to date?'

My point: do not assume that your vendor (of any system) engages in professional application development and has a change management policy that takes into account the simple fact that the data in these information systems needs to be shared with other information systems in your company.

Change management and professional application development need to be important criteria in the selection of information systems.

Tue, 08 Jun 2010 14:29:39 -0700
Collaboration software - fluff? Business Intelligence vendors seem to embrace collaboration (I am still struggling to decide whether this software is any different from the groupware we had in the 90's). As an example, take a look at SAP StreamWork on YouTube. I am gonna be blunt here: this type of software is completely useless, unless the organization is willing to fundamentally change its decision-making process.

Let me try to make my point with the help of giants like Galbraith, Daft, Davenport and others.

There are basically two information contingencies; Uncertainty and Equivocality.
  • Uncertainty can be defined as the absence of information (e.g. Shannon and Weaver) and can be overcome by simply asking the right question. The answer is out there...
  • Equivocality is ambiguity: the existence of multiple and conflicting interpretations of an organizational situation. Participants are not even sure which questions need to be asked, let alone which answers they need. I think this can also be regarded as the realm of 'wicked problems'.
Now, for overcoming uncertainty, relatively blunt instruments suffice. Reporting and the ever increasing possibilities of analytics really shine at reducing uncertainty.

For overcoming equivocality, however, Business Intelligence instruments like reporting and even analytics are of diminishing use. You need more 'richness' in the tooling - and by tooling I do not necessarily mean software. Examples of richer tooling are group meetings, discussions, planning, creative (group) thinking, etc. Simply put: you need face-to-face contact.
Davenport wrote the article 'Make Better Decisions' in the Harvard Business Review in 2009, advocating a more formalized approach to decision making:

'Smart organizations can help their managers improve decision making in four steps: by identifying and prioritizing the decisions that must be made; examining the factors involved in each; designing roles, processes, systems, and behavior to improve decisions; and institutionalizing the new approach through training, refined data analysis, and outcome assessment.'

Davenport, in my opinion, is aiming at equivocality and at a more formalized method of reaching an outcome. And frankly, I like it a lot. But organizations need to be really willing to change their decision-making process, and that is a major organizational and cultural change in my opinion. If organizations are truly committed to making this change (Davenport names a few such companies, like Chevron and The Stanley Works), collaboration software has the potential to shine in supporting such a decision-making process.

I am afraid, however, that collaboration software from BI vendors will be sold as candy with the promise of better decisions. And that is just bullshit, and my prediction is that it will fail big time.
Tue, 30 Mar 2010 12:17:36 -0700
Outsourcing DSS is not the same as OLTP. What we all knew was true, but could not get across to management, now has scientific support: the decision process regarding the outsourcing of a DSS is influenced by significantly different characteristics than that of an OLTP system. If you are interested in the details, the theory and the underlying data, please read:

B. Berg and A. Stylianou, 'Factors considered when outsourcing an IS system: an empirical examination of the impacts of organization size, strategy and the object of a decision (DSS or OLTP)', European Journal of Information Systems (2009) 18, 235-248.

I still encounter organizations that are stuck in the OLTP world, even when the object of the outsourcing decision is completely different on many dimensions. They tend to use the same outsourcing decision process as they always have, whether they outsource an ERP, a CRM system, a data warehouse or a more elaborate BI system.

Mon, 08 Feb 2010 07:47:39 -0700
Disruption incoming?

I have just read a very intriguing paper called 'A Common Approach for OLTP and OLAP using an In-Memory Column Database', written by Hasso Plattner.

It is not a revolutionary new technical approach to Data Warehousing and Business Intelligence. It is a series of smaller innovations (mostly technical, and some quite old) that together could lead to a paradigm shift [1] in the area of Data Warehousing.

The paper focuses on the transactional world, because that is where the disruption will originate. In short:

  • Ever-increasing numbers of CPU cores
  • Growth of main memory
  • The I/O revolution - SSDs
  • Column databases for transactions (!)
  • Shared-nothing approach
  • In-memory access to actual data - historic data on slower devices (or not)
  • Zero-update strategies in OLTP (recognizing the importance of history as well as the importance of parallelism)
  • Not in the paper, but I see the data models of newly built OLTP systems increasingly resembling the data models of the HUB in the data warehouse architecture.
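The zero-update idea in the list above can be sketched in a few lines. This is a toy illustration only - the class and field names are invented, and it is not Plattner's design - of an insert-only store in which an 'update' appends a new version instead of overwriting, so full history is retained:

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

@dataclass(frozen=True)
class Row:
    key: str
    value: str
    version: int

class InsertOnlyStore:
    """Toy zero-update table: changes are appended, never applied in place."""

    def __init__(self) -> None:
        self._rows: List[Row] = []
        self._versions = count(1)

    def put(self, key: str, value: str) -> None:
        # An 'update' is just another insert with a higher version number.
        self._rows.append(Row(key, value, next(self._versions)))

    def current(self, key: str) -> Optional[str]:
        # The latest version wins; older versions remain queryable.
        for row in reversed(self._rows):
            if row.key == key:
                return row.value
        return None

    def history(self, key: str) -> List[str]:
        return [r.value for r in self._rows if r.key == key]
```

Because nothing is ever overwritten, the history needed for auditing comes for free, and writers never have to lock a row in place.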
Sat, 12 Dec 2009 04:13:40 -0700