Granularity Redux

Originally published July 7, 2011

Few things remain constant. Hardware becomes smaller and cheaper. Software becomes more versatile and more functional. Consulting services become more specialized and more cogent. Hardly anything remains the same.

But through thick and thin, there remains one primary factor in the desire to have flexibility of data, and that is that the secret to flexibility is in the granularity of data. Stated differently, the more granular that data is, the more flexible it is. In a day of simple processors that was true. In a day of SMP processing that was true. In a day of MPP processing that was true. It is true for Oracle. It is true for Microsoft. It is true for IBM. It is true for HP. It was true in COBOL. It is true in C++. It is true in Java.

In short, regardless of the architecture, the technology, the framework, or the day and age in which a system was created, the number one factor in the flexibility of a system is the granularity of its data.

Sure, there are other factors relating to flexibility of a system. But all of the other factors are unimportant if the granularity of data is not at a low level.

The data warehouse designer is keenly aware of the importance of granularity. The data warehouse designer knows that the longevity of the data warehouse in no small part depends on the proper selection of the level of granularity of data from the outset. And the data warehouse designer knows that data granularity can always be adjusted upward but not downward. You can always add units of data together and make the granularity of data more summarized. But you cannot reduce the level of granularity once the level of granularity has been established.
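The one-way nature of this adjustment can be illustrated with a minimal Python sketch. The records, the field layout, and the `roll_up_to_month` helper are hypothetical and exist only for illustration; they are not taken from any particular data warehouse.

```python
from collections import defaultdict

# Hypothetical daily sales records: (ISO date, amount). Illustrative data only.
daily_sales = [
    ("2011-06-29", 120.0),
    ("2011-06-30", 80.0),
    ("2011-07-01", 50.0),
    ("2011-07-02", 150.0),
]


def roll_up_to_month(records):
    """Add daily amounts together into monthly totals (granularity adjusted upward)."""
    totals = defaultdict(float)
    for day, amount in records:
        month = day[:7]  # keep only the "YYYY-MM" part of the date
        totals[month] += amount
    return dict(totals)


monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)  # {'2011-06': 200.0, '2011-07': 200.0}
```

Both months total 200.0 even though the daily amounts behind them differ. Given only `monthly_sales`, there is no way to recover the individual daily figures, which is why the level of granularity has to be chosen carefully from the outset.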

The level of granularity has a profound effect on the different data marts that emanate from the data warehouse. One data mart looks at the granular data one way. Another data mart looks at the granular data another way. And yet another data mart looks at the granular data in yet another way. And through it all, there remains reconcilability of data because ALL the data marts operate on the same data with the same level of granularity.
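A minimal sketch of that reconcilability, assuming a hypothetical set of granular transactions and a hypothetical `build_mart` helper (neither comes from the article): two data marts summarize the same rows along different dimensions, and their grand totals still agree.

```python
from collections import defaultdict

# Hypothetical granular transactions: (region, product, amount). Illustrative only.
transactions = [
    ("East", "widget", 100),
    ("East", "gadget", 250),
    ("West", "widget", 75),
    ("West", "gadget", 125),
]


def build_mart(rows, key_index):
    """Summarize the granular rows by one dimension (0 = region, 1 = product)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


sales_by_region = build_mart(transactions, 0)   # one data mart's view
sales_by_product = build_mart(transactions, 1)  # another data mart's view

# Both marts are derived from the same granular data, so their totals reconcile.
print(sum(sales_by_region.values()), sum(sales_by_product.values()))  # 550 550
```

The two marts answer different questions, but because each is built from the same granular rows, any total computed in one can be checked against the other.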

Suppose the data warehouse contains fine grains of sand – fine silicon. One data mart takes that silicon and makes a wine glass. Another data mart takes the same silicon and makes a semiconductor chip. And yet another data mart takes the silicon and makes a medical prosthesis. No one mistakes a wine glass for a semiconductor chip or for a medical prosthesis. But all of those items were at one time the same basic material. All of those items have the same material foundation.

There is such a thing as too low a level of granularity. For example, there is clickstream data. Clickstream data is the data that results when access to a web page is tracked. Every movement of the cursor, every new page that is accessed, every item that is examined in the journey through HTML is captured. The problem is that most elements of clickstream data have little or no business relevance. So clickstream data needs to be edited and summarized to become useful. In other words, the clickstream data must have its granularity adjusted upward. But the need for an upward adjustment, as with clickstream data, is not very common. It is much more common to need to have the granularity of data adjusted downward, not upward.
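The editing and summarizing of clickstream data described above can be sketched in a few lines of Python. The event fields (`session`, `type`, `page`), the `RELEVANT_TYPES` set, and the `summarize_clickstream` helper are assumptions made for illustration; real clickstream logs vary widely, and what counts as business-relevant is a design decision.

```python
from collections import Counter

# Hypothetical raw clickstream events. Illustrative data only.
events = [
    {"session": "s1", "type": "mouse_move", "page": "/home"},
    {"session": "s1", "type": "page_view", "page": "/home"},
    {"session": "s1", "type": "page_view", "page": "/products"},
    {"session": "s2", "type": "mouse_move", "page": "/products"},
    {"session": "s2", "type": "page_view", "page": "/products"},
]

# Assumption: only page views carry business relevance in this example.
RELEVANT_TYPES = {"page_view"}


def summarize_clickstream(raw_events):
    """Edit out irrelevant events, then summarize the rest as views per page."""
    return Counter(
        event["page"] for event in raw_events if event["type"] in RELEVANT_TYPES
    )


page_views = summarize_clickstream(events)
print(dict(page_views))  # {'/home': 1, '/products': 2}
```

Five raw events become two summary rows. The cursor movements are discarded in the edit step, and the remaining page views are counted, which is the upward adjustment of granularity.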

And, of course, there is a direct relationship between the granularity of data and the size of the data warehouse. The more granularity there is, the larger the data warehouse.
   

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

    Editor's Note: More articles, resources and events are available in Bill's BeyeNETWORK Expert Channel. Be sure to visit today!


Comments


Posted July 7, 2011 by Anonymous

The use of the term granularity appears to be inconsistent. For example, "The more granular that data is, the more flexible it is" and "The more granularity there is, the larger the data warehouse" suggest that higher granularity means more detail. Whereas "but all of the other factors are unimportant if the granularity of data is not at a low level" and "data granularity can always be adjusted upward but not downward" suggest that lower granularity means more detail.

A Google search reveals that the term granularity is used in diametrically opposite ways in the context of databases, but 'higher granularity means more detail' seems to be the more common usage. It would be helpful if there were some consistency. It might also help if authors stuck to high/low granularity rather than using more/less.
