Blog: Ronald Damhof

Ronald Damhof

I have been a BI/DW practitioner for more than 15 years. In the last few years, I have become increasingly annoyed - even frustrated - by the lack of (scientific) rigor in the field of data warehousing and business intelligence. It is not uncommon for the knowledge worker to be disillusioned by the promise of business intelligence and data warehousing because vendors and consulting organizations create their "own" frameworks, definitions, super-duper tools etc.

What the field needs is more connectedness (grounding and objectivity) to the scientific community. The scientific community, in turn, needs to realize the importance of increasing its relevance to the practice of technology.

For the next few years, I have decided to attempt to build a solid bridge between science and technology practitioners. As a dissertation student at the University of Groningen in the Netherlands, I hope to discover ways to accomplish this. With this blog I hope to share some of the things I learn in my search and begin discussions on this topic within the international community.

Your feedback is important to me. Please let me know what you think. My email address is Ronald.damhof@prudenza.nl.

About the author

Ronald Damhof is an information management practitioner with more than 15 years of international experience in the field.

His areas of focus include:

  1. Data management, including data quality, data governance and data warehousing;
  2. Enterprise architectural principles;
  3. Exploiting data to its maximum potential for decision support.
Ronald is an Information Quality Certified Professional (International Association for Information and Data Quality; one of the first 20 to pass this prestigious exam), a Certified Data Vault Grandmaster (the only person in the world to hold this level of certification) and a Certified Scrum Master. He is a strong advocate of agile and lean principles and practices (e.g., Scrum). You can reach him at +31 6 269 671 84, through his website at http://www.prudenza.nl/ or via email at ronald.damhof@prudenza.nl.

On Wednesday, November 16th, 2011, Ralph Hughes of Ceregenics was in the Netherlands. Ralph is the author of the book 'Agile Data Warehousing: Delivering World-Class Business Intelligence Systems Using Scrum and XP' and is currently under contract to write more books on the topic of agility in data warehouse development.

I had been in contact with Ralph for some time; he wanted to know more about Data Vault: getting the facts, how it is actually used, which customers use it, how they develop and deploy, how it contributes to agility and how it impacts the business.

[Photo #1]

Of course, anything can be explained in writing or conceptually, but the real proof of the pudding is in the eating. Opportunity knocked when Ralph was in the Netherlands for his TDWI course on agile data warehousing. He asked me whether I could arrange some customer visits in Amsterdam: customers that use and deploy Data Vault and have attained a high degree of agility.

Tom Breur and I hosted Ralph, and we visited the Free University (a client of mine) and BinckBank (a client of Tom's), both in Amsterdam. Hans Hultgren (Genesee Academy) happened to be in the Netherlands that week and joined us as well. We met with both management and technical team members of the university and BinckBank.

Both clients were particularly interesting because their data warehouses are in production and in a mode of constant change. Both showed remarkable predictability and reliability in coping with these changes; change equated to 'business as usual'. I remember Ralph asking an engineer, 'How long does it take to deploy a new data element to the warehouse?' The engineer replied: 'Do you want to know the lead time including my coffee break?'

Ralph, Tom, Hans and I were impressed with what these clients have accomplished: getting their data warehouse deployment under control while constantly delivering value and change to the business in a predictable fashion.

[Photo #2]

I will not transcribe the whole interview in this blog - that is simply too much - but send me a note if you want to know more. Interesting differences between the Free University and BinckBank were that they used different automation techniques and that the level of business key integration differed slightly. The Free University used templating (generating XML and importing it into Business Objects Data Services) for data warehouse automation, and its data warehouse was driven by business keys. BinckBank used Quipu for data warehouse automation, and its data warehouse was partly driven by business keys and partly by surrogate keys (see also my presentation at the Data Vault advanced seminar about different Data Vault species). In terms of software development methods, BinckBank used Scrum, while the Free University worked waterfall/iteratively with lots of lean practices.

I will try to summarize both visits from Tom's perspective and mine, particularly slanted towards agile software development, by asking my blog readers three questions:

  1. Why is it that you can build and deploy extremely small particles in Data Vault, and not in other approaches, without an increase in the overhead and coordination of these particles? In other words: 'Divide and Conquer to beat the Size / Complexity Dynamic'1
  2. Why is it that you can re-engineer your existing model and guarantee that the changes remain local? Something that is hugely beneficial in data warehouses that - by definition - grow over time.
  3. Why is it that - as your (Data Vault based) data warehouse grows - your costs initially grow 'merely' in linear fashion, and marginal cost growth decreases exponentially as you approach the end state (as opposed to the exponential cost increase of Kimball warehouses)?

[Photo #3]
I want to thank the Free University as well as BinckBank for offering their time, energy and enthusiasm to the general cause of knowledge sharing. Of course I also want to thank Tom Breur and Hans Hultgren for putting in their time.

My special thanks, of course, to Ralph Hughes for being an open-minded, inquisitive and knowledgeable peer. It was great being your host in the Netherlands.

 

1. Gerald M. Weinberg, Quality Software Management, 1992

Photo #1: In the corner on the left sits Ralph Hughes, next to him Tom Breur. On the other side, the Free University: Jaap Roos (project manager), Dorien Heijting (data warehouse engineer) and Erwin Vreeman (project lead).

Photo #2: Sitting with the American flag: Ralph Hughes and Hans Hultgren. At the head of the table, BinckBank: Michel Uittenbogaard (data warehouse engineer) and, on the right, Paul Delgman (BI manager).

Photo #3: Sitting near the window looking down: me, myself and I


Posted December 9, 2011 12:50 AM

Recently a discussion raged on LinkedIn regarding 'ETL tools that support Data Vault OUT OF THE BOX' (link). I gotta be honest - the discussion annoyed me, and I was stupid enough to show it by commenting rather harshly. I would like to apologize to everyone, and especially to Daan.

In this blog post I would like to explain my point of view on this question.

In the above-mentioned discussion I commented very briefly: 'All ETL tools support Data Vault'. Allow me to explain this by paraphrasing an argument that Daan also used in the subsequent comments. He mentioned that technology has brought about efficiency gains in the last 20 to 30 years. I agree with that; the data is quite clear about it ;-). Explaining those gains I leave to applied science, but I would like to take one tiny piece of the puzzle and put it in the context of my remark that 'all ETL tools support Data Vault'.

One of the 'variables' in the function behind this tremendous leap is - in my opinion - uniformity. Organizing uniform systems (I use the term 'systems' in the broadest sense: people, technology, processes) opened the door to repeatability, predictability, limiting waste and improving quality. In writing this, I think Dr. W. Edwards Deming would agree with me.

Now, back to the subject of ETL and Data Vault. With Data Vault we design the system of modeling and of data logistics in advance; the two go hand in hand. What we want to achieve is as much uniformity as we possibly can: uniformity in modeling, balanced with uniformity in loading.

Let me elaborate some more.

In Data Vault - and more generally in 'systems thinking' - all objects in a system are interrelated. How I construct a data model has a strong impact on the way I (can) construct the loading (ETL). With Data Vault we standardize the data model as much as we can (there are quite a few heuristics in Data Vault; it should not be applied dogmatically) into a limited number of constructs (hub, link, satellite). But we also design the loading constructs, which are likewise extremely limited in number (hub load, link load, satellite load). Every load construct has a standardized pattern; see the figure regarding the pattern for a hub load.

[Figure: the standardized load pattern for a hub]
If I were to translate this to SQL, it would be something like: INSERT the distinct business keys INTO the hub WHERE they do NOT already EXIST in the hub. Of course any ETL tool supports such a simple construct! Data Vaults are thus being built with SSIS, Informatica, InfoSphere, Business Objects, Pentaho, SAS, etc.
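To make this concrete, here is a minimal hub-load sketch in plain SQL. All names are hypothetical (a staging table STG_CUSTOMER, a hub HUB_CUSTOMER, a business key CUSTOMER_NR); a real implementation would add a surrogate or hash key and more auditing columns:

    -- Minimal hub-load sketch (hypothetical names): insert every business key
    -- from staging that is not yet present in the hub, exactly once.
    INSERT INTO HUB_CUSTOMER (CUSTOMER_NR, LOAD_DTS, RECORD_SOURCE)
    SELECT DISTINCT
           stg.CUSTOMER_NR,        -- the business key
           CURRENT_TIMESTAMP,      -- when the key entered the warehouse
           'CRM'                   -- where the key came from (audit trail)
    FROM   STG_CUSTOMER stg
    WHERE  stg.CUSTOMER_NR IS NOT NULL
      AND  NOT EXISTS (SELECT 1
                       FROM   HUB_CUSTOMER hub
                       WHERE  hub.CUSTOMER_NR = stg.CUSTOMER_NR);

Every hub in the model gets this exact same statement with only the names swapped, which is precisely the uniformity argued for above.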

Please be advised that the above is a simplified example; in real life the load patterns are considerably more complex. The principles, however, remain unchanged:

- A limited number of loading patterns
- The patterns are standardized in type
- The patterns are simple
- The patterns can be executed asynchronously
- The patterns lend themselves to parallel loading (a sketch follows below)
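To illustrate how uniform these patterns are across construct types, here is a minimal satellite-load sketch, reusing the hypothetical names from the hub example above. It inserts a new version of the descriptive attributes only when something actually changed (NULL handling is ignored for brevity):

    -- Minimal satellite-load sketch (hypothetical names): add a new version
    -- for a business key when its latest version differs from staging.
    INSERT INTO SAT_CUSTOMER (CUSTOMER_NR, LOAD_DTS, NAME, CITY)
    SELECT DISTINCT
           stg.CUSTOMER_NR,
           CURRENT_TIMESTAMP,
           stg.NAME,
           stg.CITY
    FROM   STG_CUSTOMER stg
    WHERE  NOT EXISTS (SELECT 1
                       FROM   SAT_CUSTOMER sat
                       WHERE  sat.CUSTOMER_NR = stg.CUSTOMER_NR
                         AND  sat.LOAD_DTS = (SELECT MAX(s2.LOAD_DTS)
                                              FROM   SAT_CUSTOMER s2
                                              WHERE  s2.CUSTOMER_NR = stg.CUSTOMER_NR)
                         AND  sat.NAME = stg.NAME
                         AND  sat.CITY = stg.CITY);

Because each load only touches its own target table, hub, link and satellite loads can indeed run asynchronously and in parallel.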

I would like to summarize the above in two words: uniformity and automation. Uniformity in modeling and logistics opens the door to repeatability and automation, making the system a lot cheaper to maintain, but also easy to change or supplement (testability is designed into the system, as is repairability). Agile software development finds great support in these kinds of systems (worthy of an entirely new blog post ;-)).

We can now design a predictable system for loading data into a data model. We have created a uniform structure for the data in the data warehouse, opening the way for more uniformity towards Kimball data marts as well (be they in-memory, on file, virtualised, etc.).
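One way to picture that last step: a virtualised Kimball-style dimension can be little more than a view over a hub and the most recent rows of its satellite. A sketch, again with the hypothetical tables from above:

    -- Hypothetical sketch: a virtualised customer dimension as a view,
    -- joining the hub to the most recent satellite version per business key.
    CREATE VIEW DIM_CUSTOMER AS
    SELECT hub.CUSTOMER_NR,
           sat.NAME,
           sat.CITY
    FROM   HUB_CUSTOMER hub
    JOIN   SAT_CUSTOMER sat
      ON   sat.CUSTOMER_NR = hub.CUSTOMER_NR
    WHERE  sat.LOAD_DTS = (SELECT MAX(s2.LOAD_DTS)
                           FROM   SAT_CUSTOMER s2
                           WHERE  s2.CUSTOMER_NR = sat.CUSTOMER_NR);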

Uniformity and automation have ignited a wave of innovation in the Netherlands: innovation led by independent consultants and consultancy firms that saw great opportunity in the daily problems they faced and took data logistics to a new level of automation - metadata-driven ETL (open source example: Quipu; commercial example: WhereScape).
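To give a flavour of what 'metadata driven' means - a simplified sketch of the general idea only, not of how Quipu or WhereScape actually work internally - the model is described in metadata tables, and the load statements are generated from them:

    -- Hypothetical metadata table describing the hubs of the model.
    CREATE TABLE META_HUB (
        HUB_NAME       VARCHAR(128),  -- e.g. 'HUB_CUSTOMER'
        BUSINESS_KEY   VARCHAR(128),  -- e.g. 'CUSTOMER_NR'
        STAGING_TABLE  VARCHAR(128),  -- e.g. 'STG_CUSTOMER'
        RECORD_SOURCE  VARCHAR(128)   -- e.g. 'CRM'
    );

    -- Generate one hub-load statement per row; the result set is the ETL code.
    SELECT 'INSERT INTO ' || HUB_NAME ||
           ' (' || BUSINESS_KEY || ', LOAD_DTS, RECORD_SOURCE)' ||
           ' SELECT DISTINCT ' || BUSINESS_KEY ||
           ', CURRENT_TIMESTAMP, ''' || RECORD_SOURCE || '''' ||
           ' FROM ' || STAGING_TABLE || ' stg' ||
           ' WHERE NOT EXISTS (SELECT 1 FROM ' || HUB_NAME || ' hub' ||
           ' WHERE hub.' || BUSINESS_KEY || ' = stg.' || BUSINESS_KEY || ')'
           AS LOAD_SQL
    FROM   META_HUB;

Add a hub to the metadata and its load code exists; that is the repeatability which uniformity buys.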


Posted September 21, 2011 3:44 AM

In May 2011 I was given the opportunity to be part of the first group of pioneers that field-tested the IQCP program. IQCP stands for Information Quality Certified Professional and is organized by the IAIDQ, the International Association for Information and Data Quality. To be fair: I really do not like certifications. It is said that certification gives clients some kind of objective measure of knowledge and skill. Well, for the majority of certifications this is utterly deceiving.

That being said, why did I go for IQCP?

For one thing, IAIDQ is a not-for-profit, vendor-neutral organisation. The people involved in IAIDQ have a passion for spreading the word about quality in general and information quality in particular.

Second, the certification process was extremely well prepared, grounded in established research, and adhered to widely accepted standards and regulations such as ISO/IEC 17024 and the Standards for the Accreditation of Certification Programs published by the US National Commission for Certifying Agencies (NCCA, 2002).

Third, I was not mainly interested in the certification as such. To put it poetically: I was interested in the journey, not the destination. The IQCP certification is based on a reference list of books and articles, carefully selected by people who know their stuff. The certification was a 'carrot on a stick' for me, pressuring me to actually study and invest time. The deadline was the end of July 2011 - I have never in my life read so many books, but I have also never been so authentically interested in a subject. I just kept reading, because there is so much clever stuff written out there.

Fourth, and maybe the most important reason: in May 2011 I also happened to attend the legendary PSL (Problem Solving Leadership) workshop given by Jerry Weinberg, Esther Derby and Johanna Rothman. What I learned there is immense and influences me daily. The root of this workshop is - in my opinion - founded in quality principles: learning, continuous improvement, root causes, self-organizing teams, people, communication, understanding the problem, understanding context and so much more. The people I met in this workshop were incredible and a huge inspiration. A lot of the attendees came from the worlds of context-driven testing and agile software development, both - in my opinion - strongly rooted in quality principles.

It somehow all came together when I was asked to participate in the first batch of people to go for IQCP.

For me, IQCP was a means to an end. I have studied amazing books (for a reference list, download this pdf) written by people like W. Edwards Deming, Joseph Juran, Jerry Weinberg, Kaoru Ishikawa, Masaaki Imai, Richard Wang and so many more. My library has grown very rapidly lately and I still have so much more to read (and learn). Somehow, everything I have read and learned - and am still learning - forms pieces of a puzzle that fit seamlessly into my field of expertise: #datamanagement #decisionsupport #datagovernance #dataquality #datavault #architecture #softwaredevelopment.

The journey never ends....

My gratitude goes to Tom Breur, the IAIDQ, Jerry Weinberg, Esther Derby, Johanna Rothman, those crazy Canadians ('it is what it is'), Testsidestory, Olav, Markus, Griffin and so many more.

PS: I passed the exam... I am now an Information Quality Certified Professional.



Posted September 21, 2011 3:33 AM

I get a lot of questions from people (especially in the United States) about Data Vault. For those who want to familiarize themselves with Data Vault, I hereby offer links to some papers I wrote (sometimes with co-authors) and a presentation I gave recently (see my other post for details):

Published originally in Dutch for Database Magazine, the first part of a triptych on 'Next generation Enterprise Data Warehousing': an introduction to Data Vault, titled 'Letting go of the idea of a single version of the truth' - August 2008 - Download

Published originally in Dutch for Database Magazine, the second part of the triptych on 'Next generation Enterprise Data Warehousing': this part deals with post-Data Vault processing, the business rules in particular - November 2008 - Download

Published originally in Dutch for Database Magazine, the final part of the triptych on 'Next generation Enterprise Data Warehousing': this part deals with development processes in data warehouse environments - June 2009 - link

Published in Belgium for BI-community.org, a keynote article titled 'Data Vault: Business Objectives for Next Generation Data Warehousing' - January 2011 - Download

And finally, a link to the presentation I gave at the Advanced Data Vault seminar, May 5 & 6 in Baarn, the Netherlands: link

Update, July 2nd, 2011: Tom Breur wrote a good piece, 'Tom's ten data tips', which is very much Data Vault related.


Posted September 21, 2011 3:29 AM

"Cleaning the lake or reducing the pollution from the factory" - is an analogy used by Thomas Redman. It perfectly paints a picture of data quality issues 'we' all face in our data management projects. 

In projects we often have to struggle against forces that 'just wanna create the freakin' report'. Whether the data is wrong is of no concern. In these instances the goal apparently is the information system ('our DWH is running', 'the report is built', 'SAP is live') and not the data. Put another way: data is often treated as a by-product, and the information system as the main product.

Let's take a closer look, by using (among others) Richard Wang's analogy with a manufacturing process1:

[Figure: Richard Wang's manufacturing-process analogy]

I have never seen any manager, CEO or foreman happy with a successfully implemented assembly line that turns out a lousy product. Have you? In software engineering, I sometimes have the feeling we have lost touch with this...

The information system - be it a data warehouse, a report or an ERP - is not the purpose; it is a means to an end. And that end should at the very least be sufficient data quality (where data quality is defined from the perspective of the customer: fit for his or her task).

The cool thing about Richard Y. Wang's (somewhat) oversimplified analogy is that it is useful for another reason as well: it stresses the system perspective you gotta have when dealing with data quality issues. You cannot go about 'cleaning your lake while the factory is still polluting'. Producing quality information products is done by means of a system - and do not translate this 'system' into 'information system'. This system consists of people, processes and technology. Dealing with data quality issues requires a system perspective to really add value in terms of better quality products and a 'greener' environment.

So - do not blame your ERP department for creating bad data

So - do not blame your report builder for creating useless reports

So - do not blame the person entering the data

Maybe something to consider: who do you think is accountable for organizing the 'system'? Yes - management should embrace quality in its DNA...

I know I am corny - management should have read Deming, Juran and Crosby in their MBAs. Knowledge that is half a century old.

 

1. Richard Y. Wang, 'A Product Perspective on Total Data Quality Management', Communications of the ACM, February 1998


Posted September 21, 2011 3:26 AM
