This blog post was inspired by Martijn Evers' blog post, by Barry Devlin's work on Business Integrated Insight,
and by some very forward-thinking customers. The topic has been bugging me for some time.
Back in 2005 I gave masterclasses called 'Data Warehousing in Depth' at the
knowledge institute CIBIT in the Netherlands. One part of the class was spent
pondering the future of data warehousing. It was my favorite part, and I
remember that I always reminded the class of the root reason for a data
warehouse: it is a physical construct to overcome deficiencies in areas
like performance, history, auditability, data quality, usability, ...
And I remember drawing two circles: one for data in the operational
environment and one for data in the informational environment. Both
represented physical environments; in other words, the data needed to
physically move from the operational to the informational environment.
What would be necessary to blend the two environments, I then asked:
- "Sources need to maintain history and adhere to strict auditability rules"
- "More and faster hardware and parallelization options in both hard- and software"
- "A common vocabulary of data, reflected in the operational systems"
- "Enterprise-wide data like master- and reference data needs to be maintained
proactively (upstream) and centrally"
These are all still very valid points, but I
would add a more profound one:
Data Virtualization gurus preach the Information Hiding¹ pattern to us, a
pattern perfectly suited for decoupling information systems from their data.
These Data Virtualization guys and gals say that the software that supports
data virtualization is the new pinnacle for decoupling operational and
informational environments and makes the Data Warehouse² (eventually) obsolete.
My opinion: 'they' are right and they are wrong. Yes, Information Hiding is
a pattern (there are more) that enables decoupling of the informational
environment and the operational environment. But this distinction is somewhat
flawed. It originated in the early '90s and reflected the deficiencies of
using registered data for decision support activities.
We might want to make an effort to get rid of these deficiencies...
In the last 20-30 years, the data model and its instantiations
(the data) were directly based on the information systems. The data model and
the data were by-products. The data model fitted the information system. The
informational environment was born to overcome the deficiencies that came with
I want to make a plea for shifting from process-oriented thinking and
designing to data-oriented thinking and designing. Make the data smarter
instead of making countless little and big information systems, each with its
own 'data store'. It is not that odd: information systems and the business
processes they support are so much more susceptible to change than the data is.
Data is, by its very nature, extremely stable over time.
This plea is not new: start with an Information Model of your
business, construct a conceptual model and slowly design your way down to the
logical, physical and/or canonical model³. Information systems
are now made to fit the data architecture and not vice versa. These information
systems are somewhat decoupled from the data architecture, so they
need to use the Information Hiding pattern.
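To make the pattern a bit more tangible, here is a minimal Python sketch of
Information Hiding. Everything in it is an assumption made for illustration:
the names CustomerSource, OperationalStore, WarehouseStore and greeting are
hypothetical and are not taken from any data virtualization product. The point
is only that the consumer knows the interface and nothing about where or how
the data is physically stored.

```python
from typing import Protocol


class CustomerSource(Protocol):
    """Hypothetical interface: the only thing a consumer may know about customer data."""

    def name_of(self, customer_id: int) -> str: ...


class OperationalStore:
    """Hypothetical operational system that keeps customers as (id, name) rows."""

    def __init__(self) -> None:
        self._rows = [(1, "Ada"), (2, "Grace")]

    def name_of(self, customer_id: int) -> str:
        for row_id, name in self._rows:
            if row_id == customer_id:
                return name
        raise KeyError(customer_id)


class WarehouseStore:
    """Hypothetical informational copy: same data, different physical layout."""

    def __init__(self) -> None:
        self._by_id = {1: {"name": "Ada"}, 2: {"name": "Grace"}}

    def name_of(self, customer_id: int) -> str:
        return self._by_id[customer_id]["name"]


def greeting(source: CustomerSource, customer_id: int) -> str:
    """A consumer that depends on the interface only, never on the storage."""
    return f"Hello, {source.name_of(customer_id)}"
```

Calling greeting(OperationalStore(), 1) and greeting(WarehouseStore(), 1) gives
the same answer, so the physical location of the data can change without the
consumer noticing. That is the decoupling the pattern is about.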
Now, the principle of Information Hiding and
the technology that can handle this pattern can flourish. Data Virtualization
technology can be used to its full potential, but the same applies to BPM
technology or technology which is based on service oriented architectures.
So, now we have the following list of requirements that are necessary
to blend the operational and informational environments:
1. A data-oriented design and architecture of information systems as
opposed to a process-oriented design and architecture
2. "Sources need to maintain history and adhere to strict
auditability rules"
3. "More and faster hardware and parallelization options in
both hard- and software"
4. "A common vocabulary of data, reflected in the operational
systems"
5. "Enterprise-wide data like master- and reference data needs
to be maintained proactively (upstream) and centrally"
And I will add another three:
6. Centralized design and maintenance of business rules
7. Data security and privacy law and regulations are enforced on
8. An organizing framework for establishing strategy, objectives,
and policies for (corporate) data⁴
It is impossible to be completely thorough in this list; there are indeed
many more, but this is a blog... not a book ;-)
The criteria mentioned above can be mapped to a discipline that is about to
reach critical mass in terms of body of knowledge, technology support, rigor
in science and relevance in practice: Data Management and Data Governance.
Back in 2005, in my masterclasses, I mused
about the future of data warehousing. The future is now. The journey will not
be easy, but the rewards are substantial. It truly is the future of the Sense
and Respond⁵ organization.
1 David L. Parnas, 1972, On the Criteria To Be Used in Decomposing
Systems into Modules
2 Data Virtualization vendors often claim to make the
Enterprise Data Warehouse obsolete (I am referring to the Boulder BI Brain Trust
meeting at the beginning of this year, where a vendor made this claim). They confuse a
technology (data virtualization software) with an architectural construct.
3 A canonical model is a design pattern used to communicate
between different data formats. It needs to be based on the logical model of the
4 Jill Dyche and Evan Levy. Note from the author of this post: I adapted the quote by adding brackets around 'corporate'.
5 Stephen H. Haeckel, 1999, Adaptive Enterprise: Creating and Leading