Blog: Ronald Damhof

Ronald Damhof

I have been a BI/DW practitioner for more than 15 years. In the last few years, I have become increasingly annoyed - even frustrated - by the lack of (scientific) rigor in the field of data warehousing and business intelligence. It is not uncommon for the knowledge worker to be disillusioned by the promise of business intelligence and data warehousing because vendors and consulting organizations create their "own" frameworks, definitions, super-duper tools etc.

What the field needs is more connectedness (grounding and objectivity) to the scientific community. The scientific community needs to realize the importance of increasing their level of relevance to the practice of technology.

For the next few years, I have decided to attempt to build a solid bridge between science and technology practitioners. As a dissertation student at the University of Groningen in the Netherlands, I hope to discover ways to accomplish this. With this blog I hope to share some of the things I learn in my search and begin discussions on this topic within the international community.

Your feedback is important to me. Please let me know what you think. My email address is Ronald.damhof@prudenza.nl.

About the author

Ronald Damhof is an information management practitioner with more than 15 years of international experience in the field.

His areas of focus include:

  1. Data management, including data quality, data governance and data warehousing;
  2. Enterprise architectural principles;
  3. Exploiting data to its maximum potential for decision support.

Ronald is an Information Quality Certified Professional (International Association for Information and Data Quality; one of the first 20 to pass this prestigious exam), a Certified Data Vault Grandmaster (the only person in the world to have this level of certification), and a Certified Scrum Master. He is a strong advocate of agile and lean principles and practices (e.g., Scrum). You can reach him at +31 6 269 671 84, through his website at http://www.prudenza.nl/ or via email at ronald.damhof@prudenza.nl.

January 2013 Archives

This blog post was inspired by Martijn Evers' blog post, by Barry Devlin's work on Business Integrated Insight, and by some very forward-thinking customers. The topic has been bugging me for some time now.

Back in 2005 I gave masterclasses called 'Data Warehousing in Depth' at the knowledge institute CIBIT in the Netherlands. One part of the class was devoted to pondering the future of data warehousing. It was my favorite part, and I remember that I always reminded the class of the root reason for a data warehouse: a physical construct to overcome deficiencies in performance, history, auditability, data quality, usability, ...

And I remember drawing two circles: one for data in the operational environment and one for data in the informational environment. Both represented physical environments; in other words, the data needed to move physically from the operational to the informational environment.

I then asked: what would be necessary to blend the two environments?

  • "Sources need to maintain history and adhere to strict auditability rules"
  • "More and faster hardware and parallelization options in both hard- and software"
  • "A common vocabulary of data, reflected in the operational systems"
  • "Enterprise-wide data like master- and reference data needs to be maintained pro-active (upstream) and centrally"
  • ...

These are all still very valid points, but I would add a more profound one:

Data Virtualization gurus preach the Information Hiding¹ pattern to us, a pattern perfectly suited for decoupling information systems from their data. These Data Virtualization guys and gals say that the software supporting data virtualization is the new pinnacle for decoupling operational and informational environments, and that it will (eventually) make the Data Warehouse² obsolete.

My opinion: 'they' are right and they are wrong. Yes, Information Hiding is a pattern (there are more) that enables decoupling of the informational environment and the operational environment. But this distinction is itself somewhat flawed: it originated in the early 90's and reflects the deficiencies of using registered data for decision support activities.

We might want to make an effort to get rid of these deficiencies...

In the last 20-30 years, the data model and its instantiations (the data) were directly based on the information systems. The data model and the data were by-products. The data model fitted the information system. The informational environment was born to overcome the deficiencies that came with this approach.

I want to make a plea for shifting from process-oriented thinking and designing to data-oriented thinking and designing. Make the data smarter instead of making countless little/big information systems, each with its own 'data-store'. It is not that odd; information systems and the business processes they support are so much more susceptible to change than the data is. Data is - by its very nature - extremely stable over time.

This plea is not new: start with an Information Model of your business, construct a conceptual model and slowly design your way down to the logical model, the physical model and/or the canonical model³. Information systems are now made to fit the data architecture and not vice versa. These information systems are somewhat decoupled from the data architecture, and they need to use the Information Hiding pattern.

Now, the principle of Information Hiding and the technology that can handle this pattern can flourish. Data Virtualization technology can be used to its full potential, but the same applies to BPM technology or technology which is based on service oriented architectures.
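
To make the Information Hiding idea a bit more tangible, here is a minimal Python sketch. It is an illustration under my own assumptions and does not refer to any particular data virtualization product; the names `CustomerData`, `InMemoryCustomerData`, `VirtualizedCustomerData` and `billing_address_for` are all hypothetical. The point is only that the information system depends on a stable data contract, while where and how the data is stored stays hidden behind it.

```python
from typing import Protocol


class CustomerData(Protocol):
    """The stable, data-oriented contract that information systems depend on."""

    def address_of(self, customer_id: str) -> str: ...


class InMemoryCustomerData:
    """One hypothetical implementation: data held in a single local store."""

    def __init__(self, rows: dict[str, str]) -> None:
        self._rows = dict(rows)  # hidden detail: a plain dictionary

    def address_of(self, customer_id: str) -> str:
        return self._rows[customer_id]


class VirtualizedCustomerData:
    """Another hypothetical implementation: combines two sources on request."""

    def __init__(self, crm: dict[str, str], erp: dict[str, str]) -> None:
        self._crm, self._erp = crm, erp  # hidden detail: two separate sources

    def address_of(self, customer_id: str) -> str:
        # Prefer the CRM record; fall back to the ERP record.
        if customer_id in self._crm:
            return self._crm[customer_id]
        return self._erp[customer_id]


def billing_address_for(data: CustomerData, customer_id: str) -> str:
    """An 'information system' that only knows the contract, not the storage."""
    return data.address_of(customer_id).upper()
```

Swapping `InMemoryCustomerData` for `VirtualizedCustomerData` requires no change to `billing_address_for`, which is the kind of decoupling the pattern is meant to provide.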

So, now we have the following list of requirements that are necessary to blend the operational and informational environments:

1. A data-oriented design and architecture of information systems, as opposed to a process-oriented design and architecture

2. "Sources need to maintain history and adhere to strict auditability rules"

3. "More and faster hardware and parallelization options in both hard- and software"

4. "A common vocabulary of data, reflected in the operational systems"

5. "Enterprise-wide data like master- and reference data needs to be maintained pro-active (upstream) and centrally"

 

And I will add another three:

 

6. Centralized design and maintenance of business rules

7. Data security and privacy laws and regulations are enforced at the data level

8. An organizing framework for establishing strategy, objectives, and policies for (corporate) data⁴
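
As an illustration of requirement 2 (sources that maintain history and stay auditable), here is a small Python sketch. It is only one possible reading of that requirement under my own assumptions; the names `Version` and `HistorizedAttribute` and the `recorded_by` field are hypothetical. The idea shown is that a source never overwrites a value but appends a new version, so any past state can still be reconstructed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Version:
    value: str
    valid_from: datetime
    recorded_by: str  # who made the change, kept for auditability


class HistorizedAttribute:
    """Append-only history of one attribute; nothing is ever overwritten."""

    def __init__(self) -> None:
        self._versions: list[Version] = []

    def record(self, value: str, valid_from: datetime, recorded_by: str) -> None:
        # Assumption for this sketch: changes arrive in chronological order.
        if self._versions and valid_from <= self._versions[-1].valid_from:
            raise ValueError("versions must be recorded in chronological order")
        self._versions.append(Version(value, valid_from, recorded_by))

    def as_of(self, moment: datetime) -> Optional[str]:
        """Return the value valid at `moment`, or None if none existed yet."""
        current = None
        for version in self._versions:
            if version.valid_from <= moment:
                current = version.value
        return current

    def audit_trail(self) -> list[Version]:
        """Return a copy of every recorded version, oldest first."""
        return list(self._versions)
```

With history kept at the source like this, a question such as "what was this customer's address on a given date?" can be answered without first copying the data into a separate environment.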

 

It is impossible to be completely thorough in this list; there are indeed many more, but this is a blog... not a book ;-)

The criteria mentioned above can be mapped to a discipline that is about to reach critical mass in terms of body of knowledge, technology support, rigor in science and relevance in practice: Data Management and Data Governance.

Back in 2005, in my masterclasses, I mused over the future of data warehousing. The future is now. The journey will not be easy, but the rewards are substantial. It truly is the future of the Sense and Respond⁵ organization.

 

Footnotes:

1 David L. Parnas, 1972, On the Criteria To Be Used in Decomposing Systems into Modules

2 Data Virtualization vendors often claim to make the Enterprise Data Warehouse obsolete (I am referring to the Boulder BI Brain Trust meeting at the beginning of this year, where a vendor made this claim). They confuse a technology (data virtualization software) with an architectural construct.

3 A canonical model is a design pattern used to communicate between different data formats. It needs to be based on the logical model of the organization.

4 Jill Dyche and Evan Levy. Note from the author of this blog post: I adapted the quote by adding brackets around 'corporate'.

5  Stephen H. Haeckel, 1999, Adaptive Enterprise: Creating and Leading 


Posted January 22, 2013 4:53 AM

Something has been going on for some time now, decades even. It all started with the arrival of the Internet, where people voluntarily contributed data to, well, everyone who was interested. Data about themselves, their relationships, their adventures, their careers. This data was shared with the consent of the owner of the data - although not everyone knew what the data was used for. So one might say that there was consent, but not informed consent.

Let's take it a step further and imagine....

What if our data - generated by others - was given back to us, and we could consent in an informed manner to sharing this data for the greater good? For example: my tax information is my data, it is about me, and I want to decide whether or not others can use it. Or suppose I can get hold of my location/GPS data, showing all my movements. Or my point-of-sale data from the grocery store, showing my eating patterns. Suppose I can even get hold of the data of my last MRI, my genome data, the data of my last blood test or even the data of a clinical test I was in?

Imagine....

What if I could decide to contribute this data (consensually) for the public good, where my privacy was still being honoured? What if dozens of people would decide that? What if millions of people would decide that? Clinical research would never be the same again. We would be able to scan for patterns in seas of data consisting of environmental data and healthcare data. No more clinical trials with just 2000 people and ever smarter statistics. In this setting the healthcare specialists, the quants, the sociologists and the behavioural scientists would have an unprecedented test bed of data. Is there a correlation (or even causality) between aspects of travelling, career, eating patterns, social status and cancer? Suppose even several generations would contribute their data; what would that mean for clinical research? Mindblowing....

In the above I discussed data that is about myself, so I should be the one who decides whether or not to share it. But what about data that is ours? In many countries the government heavily sponsors research: research on biology, behavioural science, economic science, climate science, etc. Shouldn't the data generated by this research be in the public domain? I think it should...

What about data created by the government, which is us? Data about import and export movements, data regarding employment, schooling, law enforcement, crime, etc.

Imagine....

What would this democratization of data mean with regard to innovation? I think it would truly ignite a burst of possibilities and hold huge potential for our general wellbeing. And no - I am not referring to the challenge of marketing handbags to middle-aged ladies (quote somewhat paraphrased from Neil Raden).

No, set the data free to go for the real challenges we face; decreasing poverty, climate control, improving healthcare, scarcity of resources, economic stability and decreasing crime.

This blog post is hugely inspired by John Wilbanks (google the guy!), by all the Open Data initiatives of the world in which governmental agencies free up their data, and by the technological possibilities of data storage, data deployment, data enrichment, data visualization and advanced analytics. And finally... this blog post is inspired by a deeply felt wish and conviction that our field of knowledge (data management and data utilisation) can contribute to making the world a better place to live in.


Posted January 4, 2013 8:50 AM