The Requirements of the Data Delivery Platform

Originally published February 4, 2010

In this article, a list of requirements for the data delivery platform (DDP) is described. The DDP is an architecture for developing business intelligence (BI) systems in which data consumers (such as reports and spreadsheets) are decoupled from data stores (such as data warehouses, data marts, and staging areas). The primary goal of this decoupling is to achieve a higher level of flexibility. In the article The Definition of the Data Delivery Platform, published at BeyeNETWORK.com, the following definition of the DDP was presented:

The data delivery platform is a business intelligence architecture that delivers data and metadata to data consumers in support of decision making, reporting, and data retrieval, whereby data and metadata stores are decoupled from the data consumers through a metadata-driven layer to increase flexibility and whereby data and metadata are presented in a subject-oriented, integrated, time-variant, and reproducible style.

To describe a particular concept as accurately as possible, a definition is required. A definition also helps explain what that concept is. A clear definition can also prevent endless discussions, because different people might otherwise have different opinions about what that specific concept is.

However, even if the definition of a concept is perfect, it's hard to include all the requirements of that concept in the definition itself. For example, this is Bill Inmon's popular definition of the term data warehouse:

A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.

Although this definition is widely used, and although most specialists have a good understanding of what is meant by all the terms used in it, the definition still leaves many questions unresolved. For example, should the tables in a data warehouse be normalized or not? Should all the tables be stored physically, or is a more virtual solution acceptable too? Should all the data be organized in tables at all? Does the term data include unstructured data? The definition doesn't answer those questions, which can lead to different interpretations, misunderstandings, and confusion.

Inmon is aware of this, and therefore in some of his other articles and books he does describe some additional requirements. For example, in his article What a Data Warehouse is Not, he writes: "A much more rational way to build the data warehouse is to use the relational model." This means that a requirement for a data warehouse is that its table structures are normalized and are not designed as stars or snowflakes. In other words, a database containing tables that are not properly normalized cannot be called a data warehouse. More requirements can be found in other books and articles.

Thus, having additional requirements (in addition to a definition) is not uncommon for describing a concept in more detail. The main reason they are needed is that it's very hard to come up with a definition that describes exactly what a particular concept is, and that is still readable and not a full page long. This is definitely true for non-mathematical concepts such as a data warehouse. In a way, it's unfortunate that in our field we don't have more precise definitions, such as E = mc², but alas.

To avoid confusion and misconceptions with respect to the data delivery platform, this article lists the minimum requirements of a DDP-based system. The list should also give readers a more detailed understanding of what the DDP is and what it isn't. In addition, if an organization wants to develop its own DDP-based business intelligence system, the list will inform it about the minimum requirements.

The Requirements of the Data Delivery Platform

To bring some structure to the list of requirements, they have been classified into eight categories:
  1. Requirements related to data stores and data access:

    a. A DDP-based system should support access to a large array of heterogeneous data store technologies and systems, including relational database technology, content and document management systems, MDX-based database technology, XML-storage technology, message queues, archives, streaming database technology, HTML-based websites, web services, spreadsheets, and textual documents.

    b. A DDP-based system should be able to process a wide range of query languages, including SQL, XQuery, MDX, and SOAP/XML. The more languages that are supported, the more BI tools can query the DDP, and the easier it will be to migrate an existing BI application to the DDP.

    c. A DDP-based system should support a wide range of open APIs for passing the queries from the data consumers to the DDP, including ODBC, JDBC, OLE DB, OLE DB for OLAP (ODBO), XML for Analysis (XMLA), XQuery API for Java (XQJ), and ADO.NET.

  2. Requirements related to querying and data consumers:

    a. A DDP-based system should allow any combination of query language and API to access any data store. For example, we should be able to write an MDX query that joins a relational database and an MDX database, or an SQL query that joins a spreadsheet with a relational database.

    b. A DDP-based system should support a pull model and a push model for transporting data from the data stores to the data consumers. The pull model is the classic model in which the data consumer fires off queries and the DDP and data stores react. With the push model, the database server managing the data stores sends data to the data consumers, for example, using database streaming technology.

  3. Requirements related to metadata:

    a. A DDP-based system should support a registry for storing additional metadata that is not stored in any other (meta) data store. However, the DDP does not enforce that all metadata is stored and managed by the DDP itself. In fact, the DDP should be able to process metadata the way it processes data itself: it should use a federated approach to deliver metadata.

    b. A DDP-based system should make metadata accessible to any type of tool.

    c. A DDP-based system should deliver metadata to data consumers in the same way it delivers the data to the data consumers.

    d. A DDP-based system should support all types of metadata, including technical, business, and operational metadata. In fact, the more metadata-driven the DDP is, the better.

    e. A DDP-based system should allow for different definitions of data elements for different users.

  4. Requirements related to acquisition and integration of data:

    a. A DDP-based system should support acquisition and integration of data in an on-demand and a batch style. The term acquisition refers to extracting data from data stores.

    b. A DDP-based system should support all the common types of integration, including name changes, joins, selects, aggregations, splits, projections, and cleaning.

    c. A DDP-based system should support lineage and impact analysis.

  5. Requirements related to an internal data store:

    a. A DDP-based system should support an internal data store for storing data not stored in any other data store. This internal data store could be used, for example, to store extra descriptive data, predictive data, and temporary and intermediate results.

  6. Requirements related to features for improving performance, scalability, and availability:

    a. A DDP-based system should support advanced buffer management techniques to minimize interference and to reuse query results.

    b. A DDP-based system should support features for controlling which data elements are to be buffered and when.

    c. A DDP-based system should support distributed join optimization techniques to optimize query performance.

    d. A DDP-based system should be able to monitor queries and other performance-related aspects, and it should also be able to show usage (which user is using which tables). These features are needed for managed self-service business intelligence as well.

    e. A DDP-based system should allow for the definition of usage limitations. Such limitations might cause a query to be canceled before it is executed, to avoid queries that consume too many resources.

  7. Requirements related to security:

    a. A DDP-based system should support single sign-on.

    b. A DDP-based system should support authentication techniques.

    c. A DDP-based system should support authorization features (which users are allowed to access which data and when). It should be possible to specify rules for authorization up to the individual data value level.

  8. Requirements related to transactions:

    a. A DDP-based system should support updates, inserts, and deletes on all the data stores that allow updates, inserts, and deletes.

    b. A DDP-based system should support heterogeneous distributed transactions.

To summarize, a business intelligence system has a DDP-based architecture if it adheres to the definition of the DDP and if it meets as many of the above requirements as possible.
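
Requirement 2a above, using one query language across heterogeneous stores, can be illustrated with a minimal Python sketch. The spreadsheet, the tables, and all names below are invented for illustration, and a real DDP product would federate the query across the live source systems rather than copy everything into one in-memory database:

```python
import csv
import io
import sqlite3

# A toy "data delivery" layer: expose a spreadsheet (CSV) and a relational
# table behind a single SQL interface, so one query can join both sources.
SPREADSHEET = """region,target
North,100
South,80
"""

conn = sqlite3.connect(":memory:")

# The "relational database" side: a plain table with sales figures.
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 120), ("South", 70)])

# The "spreadsheet" side: import the CSV into a table so SQL can reach it.
conn.execute("CREATE TABLE targets (region TEXT, target INTEGER)")
for row in csv.DictReader(io.StringIO(SPREADSHEET)):
    conn.execute("INSERT INTO targets VALUES (?, ?)",
                 (row["region"], int(row["target"])))

# One SQL query spanning both "data stores".
result = conn.execute("""
    SELECT s.region, s.amount, t.target, s.amount - t.target AS delta
    FROM sales s JOIN targets t ON s.region = t.region
    ORDER BY s.region
""").fetchall()
print(result)  # [('North', 120, 100, 20), ('South', 70, 80, -10)]
```

Copying the spreadsheet into the database is only a stand-in here; the point is that the consumer sees a single query interface and never needs to know which source holds which data.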
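
Requirement 6e, usage limitations, can be sketched as an admission check that estimates a query's cost from catalog statistics before running it. The cost model, the statistics, and the limit below are all hypothetical:

```python
# Hypothetical catalog statistics and per-user resource budget.
ROW_COUNTS = {"sales": 5_000_000, "targets": 200}
MAX_ESTIMATED_ROWS = 1_000_000

def estimated_rows(tables):
    """Crude cost model: assume a join may touch the product of the row counts."""
    estimate = 1
    for table in tables:
        estimate *= ROW_COUNTS[table]
    return estimate

def admit(tables):
    """Decide before execution whether a query fits within the usage limit."""
    estimate = estimated_rows(tables)
    if estimate > MAX_ESTIMATED_ROWS:
        return False, f"rejected: ~{estimate} rows exceeds limit"
    return True, "admitted"

print(admit(["targets"]))           # small query is admitted
print(admit(["sales", "targets"]))  # large join is rejected before it runs
```

The essential property is that the rejection happens up front, so a runaway query never consumes resources in the first place.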
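
Requirement 7c, authorization down to the individual data value, can be sketched as per-user rules that both filter rows and mask individual values. The users, rules, and data below are invented for illustration:

```python
# Sample rows as they exist in the data store.
ROWS = [
    {"region": "North", "revenue": 120, "salary": 90_000},
    {"region": "South", "revenue": 70,  "salary": 75_000},
]

# Per-user authorization rules: a row predicate plus the set of visible columns.
RULES = {
    "analyst": (lambda r: True,                   {"region", "revenue"}),
    "manager": (lambda r: r["region"] == "North", {"region", "revenue", "salary"}),
}

def deliver(user, rows):
    """Return only the rows the user may see, masking unauthorized values."""
    predicate, visible = RULES[user]
    return [{col: (val if col in visible else None) for col, val in row.items()}
            for row in rows if predicate(row)]

print(deliver("analyst", ROWS))  # all rows, but salary values masked
print(deliver("manager", ROWS))  # only North rows, all values visible
```

Because the rules are enforced in the delivery layer, every data consumer gets the same authorization behavior regardless of which tool issues the query.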

Non-Functional Requirements of the Data Delivery Platform

The list of requirements above does not include non-functional requirements such as those related to performance, availability, scalability, and concurrency. The reason is that in general these requirements will not determine whether something adheres to a particular definition. For example, whatever the definition of the word "car" is, even if my car is incredibly slow, it's still a car; in fact, even if it can't be driven anymore, it's still a car. Or, if my computer has crashed, it would still be called a computer because it would still adhere to the definition of computer and to the additional functional requirements. Similarly, if the query performance of a specific data warehouse is bad, it's still a data warehouse; a data mart with availability problems is still a data mart; and likewise, a DDP-based system is still a DDP-based system even if problems occur with aspects such as performance, availability, scalability, and concurrency. To summarize, in most cases non-functional requirements do not determine whether something conforms to a definition.

In most cases, non-functional requirements apply to specific solutions and are subjective. The car I drive is fast enough for me, but may be too slow for other drivers. The same applies to a DDP-based business intelligence system developed for a specific customer. Its query performance might be fast enough for them, but maybe it's too slow for another customer. It all depends on what the customer wants and needs. But whatever the performance is and whatever the customer thinks of it, it's still a DDP-based business intelligence system. It will be the responsibility of the vendors and the developers to build and deliver solutions that conform to the non-functional requirements demanded by customers and users.

Comments

I welcome any comments, remarks, and suggestions that further improve and complete this list of requirements of the data delivery platform.


  • Rick van der Lans

    Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful annual European Enterprise Data and Business Intelligence Conference held in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

    Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.


Comments


Posted February 6, 2010 by Robert Eve reve@compositesw.com

Rick -

It is great to see the progress of your Data Delivery Platform series. 

This practical capabilities checklist is a valuable complement to your earlier conceptual and definitional content. 

You are doing your readers a good service.  Keep it up.


Posted February 5, 2010 by Rick van der Lans rick@r20.nl

Hi Srini,

Thanks for your comment on my article. The architecture I propose is not a nirvana in the sense that it is not feasible. There are tools on the market today that allow us to build business intelligence systems based on this architecture. Check out, for example, Composite Information Server and Oracle BI Server (at the heart of OBIEE).

And you're right, some tools support metadata layers that hide the data stores. However, those layers can't be accessed by tools from other vendors; in other words, they don't have an open API. The tools mentioned above do. Specifications maintained by these tools can be used by tools from all kinds of vendors; they are sharable specifications.

If relationships between data objects change and those changes are relevant for a particular report, then we will have to change both layers, which makes sense. However, if a relationship changes but the change is irrelevant for specific reports, the DDP should be smart enough to translate the new data structure back to the old structure (if it can technically be reconstructed), and those reports would stay unchanged.

Hope this explains it well.


Posted February 4, 2010 by Srini Ganesan

"Data consumers decoupled from Data stores" sounds like nirvana.

Even though tools like BOBJ, MSTR, OBIEE, etc., use a metadata layer (universe/objects/rpd) to isolate the data store (db objects) from the data consumer (dashboards), any change in the relationship between db objects affects the metadata layer and eventually the dashboards...

Rick, are you proposing or did you have in mind a totally different tech stack?
