

The Data Delivery Platform: Collected Comments

Originally published July 20, 2009

In March this year, my first article, The Flaws of the Classic Data Warehouse Architecture, Part 1, on the data delivery platform (DDP), was published by the BeyeNETWORK. It was followed by two more articles, and more will be coming soon. During the last couple of months, I have also been speaking at various conferences worldwide about the data delivery platform. And, as was to be expected, I received many comments. Most of the comments on the articles and the presentations ranged from positive to very positive. But some people had questions and some had a few concerns. In this article, I would like to react to some of those comments.

Andrea Vincenzi: Where did you get the data that 8 out of 10 projects use [the classic data warehouse] architecture?

My first article in the series started with this statement. Andrea touched on an important issue here. Through the years, many different architectures have been introduced to develop data warehouse environments. Some propose to develop data marts without a central warehouse. Others prefer a central data warehouse extended with a number of analytical cubes to speed up queries. There are experts who always include an operational data store (ODS) whilst others don't. Andrea mentions the hub-and-spoke architecture. Of course, all these architectures are different, but they all have at least two aspects in common. First, in most of these architectures, data is copied two or more times before it arrives in a data store that is accessed by some business intelligence (BI) tool. That data store can be a cube, a data mart, a central data warehouse, and sometimes even an ODS. Secondly, most of the BI applications are tightly coupled to the data store they access. Porting to another data store is not easy. For example, a BI report that uses MDX to access a Microsoft Analysis Services cube is hard to port to a data mart that uses an SQL dialect.

All these architectures with those two characteristics I call classic data warehouse architectures (CDWAs). And I think we can safely state that at least 8 out of 10 organizations have one of those classic architectures.
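To make that second characteristic a little more concrete, here is a small illustration (the table name, column names and cube structure are invented for the example) of the same business question written once for a SQL-based data mart and once for an MDX-based cube. A report built around one of these statements cannot simply be pointed at the other store.

    # Illustration only: the schema of the data mart and the structure of the
    # cube are hypothetical. The point is that the two statements answer the
    # same question, yet neither can be sent to the other data store.

    # A report coupled to a relational data mart speaks a SQL dialect:
    SQL_REPORT = """
        SELECT   calendar_year,
                 SUM(sales_amount) AS total_sales
        FROM     sales_facts
        GROUP BY calendar_year
    """

    # The same question for an Analysis Services style cube is written in MDX:
    MDX_REPORT = """
        SELECT {[Measures].[Sales Amount]}    ON COLUMNS,
               [Date].[Calendar Year].MEMBERS ON ROWS
        FROM   [Sales]
    """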

The main difference between the DDP and those classic architectures is that in the DDP the BI reports are decoupled from the data stores. The BI reports access the DDP. Hence, the DDP hides which data store is actually accessed. Maybe "underneath" the DDP we still have all this copying going on, but the good thing is, we don’t have to do that. This is optional. More importantly, we can change the storage structure if the need arises, or if new technology becomes available. Especially in the world of storage and database servers, new technologies are still being introduced, and we want to be able to exploit those when it makes sense to us.
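The sketch below is my own minimal illustration of that decoupling, not a description of any existing product: the report asks the platform a logical question, the platform decides which physical store answers it, and swapping stores only touches the platform's configuration.

    from typing import Callable, Dict, List

    # Minimal sketch of the decoupling idea; every name in it is invented.

    def fetch_from_mart() -> List[dict]:
        # Stand-in for a SQL query against the data mart.
        return [{"calendar_year": 2008, "total_sales": 1250000}]

    def fetch_from_cube() -> List[dict]:
        # Stand-in for an MDX query against the cube.
        return [{"calendar_year": 2008, "total_sales": 1250000}]

    class DataDeliveryPlatform:
        """Maps logical query names to whichever store currently serves them."""

        def __init__(self) -> None:
            self._resolvers: Dict[str, Callable[[], List[dict]]] = {}

        def register(self, logical_name: str, resolver: Callable[[], List[dict]]) -> None:
            self._resolvers[logical_name] = resolver

        def query(self, logical_name: str) -> List[dict]:
            # The report never learns whether a mart, a cube or the warehouse answered.
            return self._resolvers[logical_name]()

    ddp = DataDeliveryPlatform()
    ddp.register("sales_by_year", fetch_from_mart)   # today: served by the data mart
    # Tomorrow, one registration changes and the report stays untouched:
    # ddp.register("sales_by_year", fetch_from_cube)

    rows = ddp.query("sales_by_year")                # the report only depends on the DDP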

But I admit, it would have been clearer if I had used the plural form and had talked about classic data warehouse architectures.

Bruce Cassidy: [The whole issue of non-shareable specifications] is largely an issue with the toolsets, not with the "classic data warehouse" as such.

I fully agree with this comment. The BI tools we use in our current BI environments have been developed to support the classic data warehouse architectures. And we can’t blame it on the CDWAs that those tools don't work with shareable specifications. So, this is not a flaw of the CDWAs. But it is unfortunate that after twenty years of data warehousing, the tools and technologies still don't reuse each other’s specifications. On the other hand, we are so used to copying data in our warehouse environments that we have started to feel very comfortable with copying metadata too. But this should not be the case. If, for example, an optional one-to-many relation exists between two entities or tables, we don't want to specify that fact multiple times. We want to be able to store that piece of metadata once, and we want every tool to be able to read and use it. It should be a shareable specification.
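As a small illustration of what a shareable specification could look like (this is my own sketch, not an existing metadata standard), the relation is recorded once in a neutral structure that every tool could read instead of redefining it in its own repository:

    from dataclasses import dataclass

    # A neutral, tool-independent record of one piece of metadata; the entity
    # names and attributes are invented for the example.

    @dataclass(frozen=True)
    class Relationship:
        parent_entity: str   # the "one" side
        child_entity: str    # the "many" side
        cardinality: str     # e.g. "1:N"
        optional: bool       # True if the parent may have no children

    # Specified once ...
    customer_orders = Relationship(
        parent_entity="Customer",
        child_entity="Order",
        cardinality="1:N",
        optional=True,
    )

    # ... and read by every tool (data modeling, ETL, reporting) rather than
    # being re-entered in each tool's proprietary repository.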

Andrea Vincenzi: Most reporting tools have a layer whose job [it] is to deal with different sources of data and present them in a transparent way to the user (the most famous are probably BusinessObjects [BO] Universes, but Oracle BI, Cognos and Microsoft have similar objects).

Andrea's reaction is a useful one. It is the reaction I get primarily from the BI vendors. They all claim to support federation and enterprise information integration (EII) type functionality. And in a way they do. However, this is non-shareable technology. We can use BO's solution, but we won't be able to reuse it in a SAS environment. And the same applies the other way around.

Ronald Damhof: Is the architecture you are proposing feasible today or “tomorrow”?

This question is representative of a lot of questions I have received in the last few months.

The answer to this question is yes and no, or maybe I should say it depends. Imagine we lock ten BI professionals in a room and ask them to come up with a list of features for the perfect DDP tool. Once that list is ready, we will find that currently no product on the market supports all the features on it. In that respect, the answer to Ronald's question is no. But isn't this always the case when a new idea, concept or architecture is introduced? We all remember that when Ted Codd's ideas on the relational model were first published at the end of the ‘60s, it took a few years before the first relational products became commercially available. And it took another few years before these products were mature enough to be used in mission-critical environments. The same happened when data warehousing was introduced by visionaries such as Bill Inmon and Barry Devlin. That was a long time ago. At that time we didn’t have database servers that could handle very complex queries. ETL tools were still simplistic; they had no support for lineage or impact analysis. Still, those ideas were good ideas, and it usually takes the vendors some time to develop the products to support them. But if the market really wants the DDP, eventually vendors will release products to support it.

Note: By comparing the DDP architecture with Ted Codd's and Bill Inmon's ideas, I don’t want to imply that they are on the same level of importance and sophistication. What Codd did was especially extraordinary. These are just two examples to show that in the IT industry, the new ideas and concepts come first and then the tools. This is probably true for most technical professions.

On the other hand, the answer to Ronald's question is yes. Mature EII products are available to develop a DDP today. Composite Software, IBM and Oracle are just a few examples of vendors that have mature EII products that allow you to develop a DDP.

Anonymous: Adding a layer between the data stores and the BI tools will make it too slow.

This question was raised in various ways. Let's begin by saying yes, adding an extra layer is not going to make it faster. However, query performance is just one criterion for judging the quality of a solution. Other important criteria are flexibility and costs. By decoupling the reports from the data stores, we get a lot of flexibility. It will be easier to make changes to the data store structure, and we can more easily exploit new technologies. But, most importantly, we can sanitize and simplify. We can bring the whole environment back to a simpler architecture, which will reduce overall costs tremendously. Query performance is not a criterion that overrules all other criteria. We have to look at all the aspects. And if query performance does become an issue, the DDP allows us to bring in faster database technologies and, by doing so, compensate for the performance loss. Don’t forget that if the reports are decoupled from the database server, it is easier to switch to another database server.

Vijay Datla: All the reporting vendors (basically the upper layers of DDP) should also change or extend [their products] to support the DDP.

I hope they won't have to change anything. Most of the current BI tools send SQL or MDX statements to the data stores. A DDP should be able to support at least both languages and should be able to convert one language into the other. In an upcoming article, I will show that the DDP can be introduced in an evolutionary way and that it is not a disruptive technology. It can only be evolutionary and non-disruptive if we don’t have to change our reports drastically, although minor changes can be expected. In fact, with the current EII tools, this is already achievable. For example, Oracle BI Server can take the SQL statements that a BI application sends out and translate them into MDX statements, or vice versa. In other words, this technology is available today.
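To show the routing idea only, and nothing more, here is a toy sketch of my own, with a deliberately naive language check and the actual translation left as a stub, because doing that for real is exactly the job of a DDP or EII product:

    # Toy sketch of how a DDP-style layer could route statements: detect the
    # language the report sends, compare it with the language the target store
    # understands, and translate only when they differ.

    def detect_language(statement: str) -> str:
        # Deliberately naive heuristic, good enough for the illustration.
        return "MDX" if "ON COLUMNS" in statement.upper() else "SQL"

    def translate(statement: str, source: str, target: str) -> str:
        # Stub: a real platform rewrites the statement here.
        raise NotImplementedError(f"translation from {source} to {target}")

    def dispatch(statement: str, store_language: str) -> str:
        language = detect_language(statement)
        if language == store_language:
            return statement                       # pass it through unchanged
        return translate(statement, language, store_language)

    # A report can keep sending SQL even if the store behind the DDP speaks MDX:
    # dispatch("SELECT calendar_year FROM sales_facts", store_language="MDX")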

As indicated, most people have a very positive feeling toward the data delivery platform. Therefore, I would like to end this article with a few positive comments from different people:

  • I appreciate the thought of DDP and hope it will come into existence soon.

  • Abstracting the delivery platform from the sources of information allows me to implement the platform in stages, as the sources evolve into new formats and platforms. I only need to reprogram the abstract layer and leave my delivery platform alone. Thanks for the article. I don't feel so "out in left field" as my colleagues think I am.

  • Redundancy flaw – excellent point. A horizontally scaling MPP warehouse appliance can significantly reduce overlapping data – primarily because of the sheer computational power to parallel process granular data. If you can do a complex ad hoc query (e.g., multi-way joins) off original tables in an MPP system way faster than in a CDWA, the need for aggregations and complex tuning falls dramatically (as well as redundant data costs).

  • We didn't do all the things yet, but we have the concept and are building the first projects in this new architecture.

A few people have asked me explicitly to react to their comments and questions. Hopefully these answers clear up some issues.

Stay tuned, more to come on the data delivery platform.

  • Rick van der Lans

    Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful annual European Enterprise Data and Business Intelligence Conference held in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

    Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

Comments


Posted August 15, 2009 by Andrea Vincenzi (andrea.vincenzi@tiscali.it)

Rick, thank you very much for taking the time to answer questions from me and other readers.

I have to say, and I hope you will not be offended, that your answers still don't address my central point, which was more or less this: "there is already a DW architecture that has been the most successful one in the last 10 years, supported by a set of high-quality books and implemented in hundreds of companies. How is it possible that you don't even mention it in an article that talks about DW architectures?" (Of course, I'm referring to the Kimball Dimensional Bus architecture.) I think the English expression for cases like this is "there is an elephant in the room".

I have no problem admitting that I'm biased: my opinion is that the Kimball approach is way smarter, cheaper, better performing and that it leads to more successful implementations than the others. However, I'm always open to new ideas, and in my course on Data Warehouse design I do an in-depth analysis and comparison of the two main approaches (Hub&Spoke / C.I.F. / Corporate DW versus Dimensional Bus).

To be completely honest, I will add that the style of the articles on the two sites, the b-eye-network (which as far as I know is aligned with the Inmon ideas) and the Kimball Group site, reflects the difference in style and approach between the two philosophies. The Kimball Group site is filled with articles that address specific problems and propose practical solutions, while this site contains mostly high-level articles that leave a big gap between the problems they address and practical solutions.

This is just my point of view, of course; it might be wrong, and I'll be happy to hear other readers' comments.

With regards,

Andrea
