Blog: Ronald Damhof

Ronald Damhof

I have been a BI/DW practitioner for more than 15 years. In the last few years, I have become increasingly annoyed - even frustrated - by the lack of (scientific) rigor in the field of data warehousing and business intelligence. It is not uncommon for the knowledge worker to be disillusioned by the promise of business intelligence and data warehousing because vendors and consulting organizations create their "own" frameworks, definitions, super-duper tools, etc.

What the field needs is more connectedness (grounding and objectivity) to the scientific community. The scientific community, in turn, needs to realize the importance of increasing its relevance to the practice of technology.

For the next few years, I have decided to attempt to build a solid bridge between science and technology practitioners. As a dissertation student at the University of Groningen in the Netherlands, I hope to discover ways to accomplish this. With this blog I hope to share some of the things I learn in my search and begin discussions on this topic within the international community.

Your feedback is important to me. Please let me know what you think. My email address is Ronald.damhof@prudenza.nl.

About the author

Ronald Damhof is an information management practitioner with more than 15 years of international experience in the field.

His areas of focus include:

  1. Data management, including data quality, data governance and data warehousing;
  2. Enterprise architectural principles;
  3. Exploiting data to its maximum potential for decision support.
Ronald is an Information Quality Certified Professional (certified by the International Association for Information and Data Quality; one of the first 20 to pass this prestigious exam), a Certified Data Vault Grandmaster (the only person in the world to hold this level of certification), and a Certified Scrum Master. He is a strong advocate of agile and lean principles and practices (e.g., Scrum). You can reach him at +31 6 269 671 84, through his website at http://www.prudenza.nl/ or via email at ronald.damhof@prudenza.nl.

September 2011 Archives

Recently a discussion raged on LinkedIn regarding 'ETL tools that support Data Vault OUT OF THE BOX' (link). I gotta be honest - I was annoyed by the discussion and was stupid enough to display this by commenting kind of harshly. I would like to apologize to everyone, and especially to Daan.

In this blog post I would like to explain my point of view regarding this question.

In the above-mentioned discussion I commented very briefly: 'All ETL tools support Data Vault'. Allow me to explain this by paraphrasing an argument that Daan also used in the subsequent comments. He mentioned that technology has brought about efficiency gains in the last 20 to 30 years. I agree with that; the data is quite clear about it ;-). Explaining these gains in full I leave to applied science, but I would like to take one tiny piece of the puzzle and put it in the context of my remark that 'all ETL tools support Data Vault'.

One of the 'variables' behind this tremendous leap - in my opinion - is uniformity. Organizing uniform systems (I use the term 'systems' in the broadest sense: people, technology, processes) opened the door to repeatability, predictability, limiting waste and improving quality. In writing this, I think Dr. W. Edwards Deming would agree with me.

Now, back to the subject of ETL and Data Vault. With Data Vault we design the system of modeling and data logistics in advance. Both go hand in hand. What we want to achieve is as much uniformity as we possibly can: uniformity in modeling, balanced with uniformity in loading.

Let me elaborate some more.

In Data Vault, and more generally speaking in 'systems thinking', all objects in a system are interrelated. How I construct a data model has a strong impact on the way I (can) construct the loading (ETL). With Data Vault we standardize the data model as much as we can (there are quite some heuristics in Data Vault; it should not be applied in a dogmatic way), using a limited number of constructs (hub, link, sat). But we also design the loading constructs, which are likewise extremely limited in number (hub load, link load, sat load). Every load construct has a standardized pattern; see the figure below for the pattern of a hub load.

[Figure: the standardized loading pattern for a hub]
If I were to translate this to SQL, it would be something like: INSERT <distinct values> INTO HUB WHERE NOT EXISTS IN HUB. Of course any ETL tool would support such a simple construct! Data Vaults are thus being built with SSIS, Informatica, InfoSphere, Business Objects, Pentaho, SAS, etc.
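
To make that concrete, here is a minimal sketch of such a hub load in plain SQL. All names (stg_customer, hub_customer, customer_nr) are illustrative assumptions of mine, not taken from any specific tool:

    -- A minimal hub load sketch: insert only those business keys that are
    -- not yet present in the hub. All names are illustrative; surrogate key
    -- generation is assumed to be handled by the database (e.g., identity).
    INSERT INTO hub_customer (customer_nr, load_dts, record_source)
    SELECT DISTINCT
           stg.customer_nr,
           CURRENT_TIMESTAMP,   -- load date/time stamp
           'CRM'                -- record source of this delivery
    FROM   stg_customer stg
    WHERE  stg.customer_nr IS NOT NULL
      AND  NOT EXISTS (SELECT 1
                       FROM   hub_customer hub
                       WHERE  hub.customer_nr = stg.customer_nr);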

Please be advised that the above is a simplified example; in real life the load patterns are considerably more complex. The principles, however, remain unchanged:

- A limited number of loading patterns
- The patterns are standardized in type
- The patterns are simple
- The patterns can be executed asynchronously
- The patterns lend themselves to parallel loading
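
To show how the same fixed shape recurs, here is a minimal sketch of a satellite load under the same illustrative names. For simplicity it hangs the satellite directly off the business key and omits NULL-safe comparisons; a real implementation would typically use surrogate or hash keys:

    -- A minimal satellite load sketch: append a new version only when the
    -- most recent row for that key differs (or no history exists yet).
    -- Names are illustrative; NULL-safe comparison omitted for brevity.
    INSERT INTO sat_customer (customer_nr, load_dts, name, city)
    SELECT DISTINCT stg.customer_nr, CURRENT_TIMESTAMP, stg.name, stg.city
    FROM   stg_customer stg
    WHERE  NOT EXISTS (
             SELECT 1
             FROM   sat_customer cur
             WHERE  cur.customer_nr = stg.customer_nr
               AND  cur.load_dts = (SELECT MAX(s.load_dts)
                                    FROM   sat_customer s
                                    WHERE  s.customer_nr = stg.customer_nr)
               AND  cur.name = stg.name
               AND  cur.city = stg.city);

Since each pattern writes to exactly one target table, loads of this kind can indeed be scheduled asynchronously and run in parallel across hubs, links and satellites.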

I would like to summarize the above in two words: uniformity and automation. Because of uniformity in modeling and logistics we open the door to repeatability and automation. This makes the system a lot cheaper to maintain, but also easy to change or supplement (testability is designed into the system, as is repairability). Agile software development finds great support in these kinds of systems (this is worthy of an entirely new blog post ;-)).

We can now design a predictable system for loading data into a data model. We have created a uniform structure for the data in the data warehouse, opening the way for more uniformity towards Kimball data marts as well (be they in-memory, on file, virtualised, etc.).
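
To illustrate that last point, a minimal sketch of a virtualised Kimball-style dimension defined straight on top of the uniform hub and satellite structures; the names are again illustrative assumptions:

    -- A virtualised dimension over the vault: the hub provides the
    -- business key, the satellite the current descriptive attributes.
    CREATE VIEW dim_customer AS
    SELECT hub.customer_nr,
           sat.name,
           sat.city
    FROM   hub_customer hub
    JOIN   sat_customer sat
           ON sat.customer_nr = hub.customer_nr
    WHERE  sat.load_dts = (SELECT MAX(s.load_dts)
                           FROM   sat_customer s
                           WHERE  s.customer_nr = hub.customer_nr);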

Uniformity and automation have ignited a wave of innovation in the Netherlands: innovation led by independent consultants and consultancy firms that saw great opportunity in the daily problems they faced, taking data logistics to a new level of automation - metadata-driven ETL (open source example: Quipu; commercial example: WhereScape).
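
The idea behind metadata-driven ETL can itself be sketched in SQL: because the hub load pattern is fixed, the statements can be generated from a small metadata table instead of being hand-written. The meta_hub table below is purely hypothetical, for illustration only - it does not depict how Quipu or WhereScape actually work:

    -- Hypothetical metadata table, one row per hub:
    --   meta_hub(hub_table, stage_table, business_key, record_source)
    -- Generate one hub-load statement per metadata row.
    SELECT 'INSERT INTO ' || hub_table
        || ' (' || business_key || ', load_dts, record_source)'
        || ' SELECT DISTINCT stg.' || business_key
        || ', CURRENT_TIMESTAMP, ''' || record_source || ''''
        || ' FROM ' || stage_table || ' stg'
        || ' WHERE NOT EXISTS (SELECT 1 FROM ' || hub_table || ' h'
        || ' WHERE h.' || business_key || ' = stg.' || business_key || ');'
           AS generated_load_sql
    FROM   meta_hub;

Each resulting row is a complete, executable load statement; a scheduler can pick them up and run them in parallel.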


Posted September 21, 2011 3:44 AM

In May 2011 I was given an opportunity to be part of the first group of pioneers that field-tested the IQCP program. IQCP stands for Information Quality Certified Professional and is organized by the IAIDQ - the International Association for Information and Data Quality. To be fair, I really do not like certifications. It is said that certification gives clients some kind of objective measurement of knowledge and skill. Well, for the majority of certifications this is utterly deceiving.

That being said, why did I go for IQCP?

For one thing, IAIDQ is a not-for-profit, vendor-neutral organisation. The people involved in IAIDQ have a passion for spreading the word about quality in general and information quality in particular.

Second, the certification process was extremely well prepared, grounded in known research, and adheres to widely accepted standards and regulations such as ISO/IEC 17024 and the Standards for the Accreditation of Certification Programs published by the US National Commission for Certifying Agencies (NCCA, 2002).

Third, I was not mainly interested in the certification as such. To put it poetically: I was interested in the journey, not the destination. The IQCP certification is based on a reference list of books and articles, carefully selected by people who know their stuff. The certification was a 'carrot on a stick' for me, pressuring me to actually study and invest time. The deadline was the end of July 2011 - I have never in my life read so many books, but I have also never been so authentically interested in a subject. I just kept reading, because there is so much clever stuff written out there.

Fourth, and maybe the most important reason: in May 2011 I also happened to follow the legendary PSL (Problem Solving Leadership) workshop given by Jerry Weinberg, Esther Derby and Johanna Rothman. What I learned there is immense and influences me daily. The root of this workshop is - in my opinion - founded in quality principles: learning, continuous improvement, root causes, self-organizing teams, people, communication, understanding the problem, understanding context and so much more. The people I met in this workshop were incredible and a huge inspiration. A lot of the attendees came from the world of context-driven testing and agile software development, both - in my opinion - strongly rooted in quality principles.

It all somehow came together when I was asked to participate in the first batch of people to go for IQCP.

For me, IQCP was a means to an end - I have studied amazing books (for the reference list, download this pdf), written by people like Edwards Deming, Joseph Juran, Jerry Weinberg, Kaoru Ishikawa, Masaaki Imai, Richard Wang and so many more. My library has grown very rapidly lately and I still have so much more to read (and learn). Somehow, everything I have read, have learned and am still learning forms pieces of a puzzle that fit seamlessly into my field of expertise #datamanagement #decisionsupport #datagovernance #dataquality #datavault #architecture #softwaredevelopment.

The journey never ends....

My gratitude goes to Tom Breur, IAIDQ, Jerry Weinberg, Esther Derby, Johanna Rothman, those crazy Canadians ('it is what it is'), Testsidestory, Olav, Markus, Griffin and so many more.

PS: I passed the exam... I am now an Information Quality Certified Professional.



Posted September 21, 2011 3:33 AM

I get a lot of questions from people (especially in the United States) about Data Vault. For those who want to familiarize themselves with Data Vault, here are links to some papers I wrote (sometimes with co-authors) and a presentation I gave recently (see the other post for details):

Published originally in Dutch for Database Magazine, the first part of a triptych on 'Next Generation Enterprise Data Warehousing'. This part is an introduction to Data Vault, titled 'Letting go of the idea of a single version of the truth' - August 2008 - Download

Published originally in Dutch for Database Magazine, the second part of the triptych on 'Next Generation Enterprise Data Warehousing'. This part deals with post-Data Vault processing, the business rules in particular - November 2008 - Download

Published originally in Dutch for Database Magazine, the final part of the triptych on 'Next Generation Enterprise Data Warehousing'. This part deals with development processes in data warehouse environments - June 2009 - link

Published in Belgium for BI-community.org, a keynote article titled 'Data Vault: Business Objectives for Next Generation Data Warehousing' - January 2011 - Download

And finally, a link to the presentation I gave at the Advanced Data Vault seminar, May 5 & 6 in Baarn, the Netherlands: link

** July 2nd, 2011 - Tom Breur wrote a good piece, 'Tom's ten data tips', very much Data Vault related.


Posted September 21, 2011 3:29 AM

"Cleaning the lake or reducing the pollution from the factory" - is an analogy used by Thomas Redman. It perfectly paints a picture of data quality issues 'we' all face in our data management projects. 

In projects we often have to struggle against forces that 'just wanna create the freakin' report'. Whether the data is wrong is of no concern. In these instances the goal apparently is the information system ('our DWH is running', 'the report is built', 'SAP is live') and not the data. Put another way: data is often treated as a by-product, and the information system as the main product.

Let's take a closer look, using (among others) Richard Wang's analogy with a manufacturing process [1]:

[Figure: Richard Wang's manufacturing analogy]

I have never seen a manager, CEO or foreman happy with a successfully implemented assembly line that produces a lousy product. Have you? In software engineering I sometimes have the feeling we have lost touch...

The information system - be it a data warehouse, a report or an ERP - is not the purpose; it is a means to an end. And the end should at least be sufficient data quality (where data quality is defined from the perspective of the customer: fit for his or her task).

The cool thing about Richard Y. Wang's (somewhat) oversimplified analogy is that it is useful for another reason as well. It stresses the system perspective you have to take when dealing with data quality issues. You cannot go about 'cleaning your lake while the factory is still polluting'. Producing quality information products is done by means of a system. Now, do not translate this 'system' into 'information system'. This system consists of people, processes and technology. Dealing with data quality issues requires a system perspective to really add value in terms of better quality products and a 'greener' environment.

So - do not blame your ERP department for creating bad data

So - do not blame your report builder for creating useless reports

So - do not blame the person entering the data

Maybe something to consider: who do you think is accountable for organizing the 'system'? Yes - management should embrace quality in its DNA...

I know I am corny - management should have read Deming, Juran and Crosby in their MBAs. Knowledge that is like half a century old.


[1] Richard Y. Wang, 'A Product Perspective on Total Data Quality Management', Communications of the ACM, February 1998.


Posted September 21, 2011 3:26 AM