Some things remain constant. Hardware becomes more miniaturized and cheaper. Software becomes more versatile and more functional. Consulting services become more specialized and more cogent. There is hardly anything that remains the same.
But through thick and thin, one factor remains primary in the quest for flexible data: the secret to flexibility is the granularity of the data. Stated differently, the more granular the data, the more flexible it is. In a day of simple processors that was true. In a day of SMP processing that was true. In a day of MPP processing that was true. It is true for Oracle. It is true for Microsoft. It is true for IBM. It is true for HP. It was true in COBOL. It is true in C++. It is true in Java.
In short, regardless of the architecture, the technology, or the day and age in which a system was created, the number one factor in the flexibility of a system is the granularity of its data.
Sure, there are other factors relating to flexibility of a system. But all of the other factors are unimportant if the granularity of data is not at a low level.
The data warehouse designer is keenly aware of the importance of granularity. The data warehouse designer knows that the longevity of the data warehouse depends in no small part on choosing the proper level of granularity from the outset. And the data warehouse designer knows that granularity can always be adjusted upward but never downward. You can always add units of data together and summarize them to a coarser level. But detail that was never captured can never be recovered; once the level of granularity has been established, it cannot be reduced.
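The one-way nature of granularity adjustment can be sketched in a few lines of Python. The record layout and values below are illustrative assumptions, not taken from the article:

```python
from collections import defaultdict

# Hypothetical granular fact records: one row per individual sale.
sales = [
    {"date": "2024-01-01", "product": "A", "amount": 10.0},
    {"date": "2024-01-01", "product": "A", "amount": 5.0},
    {"date": "2024-01-01", "product": "B", "amount": 7.5},
    {"date": "2024-01-02", "product": "A", "amount": 12.0},
]

# Granularity adjusted upward: add units of data together into daily totals.
daily_totals = defaultdict(float)
for row in sales:
    daily_totals[row["date"]] += row["amount"]

print(dict(daily_totals))  # {'2024-01-01': 22.5, '2024-01-02': 12.0}

# The reverse is impossible: from the daily totals alone there is no way
# to recover which products were sold or what the individual amounts were.
```

The summary can always be rebuilt from the detail, but the detail can never be rebuilt from the summary.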
The level of granularity has a profound effect on the different data marts that emanate from the data warehouse. One data mart looks at the granular data one way. Another data mart looks at it another way. And yet another data mart looks at it in yet another way. And through it all, the data remains reconcilable, because ALL the data marts operate on the same data at the same level of granularity.
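A minimal sketch of that reconcilability, with assumed field names and values: two data marts roll the same granular data up along different dimensions, yet their totals agree because both start from the same grain.

```python
from collections import defaultdict

# Illustrative granular data shared by all data marts.
orders = [
    {"region": "East", "product": "A", "amount": 100},
    {"region": "East", "product": "B", "amount": 50},
    {"region": "West", "product": "A", "amount": 75},
]

def summarize(rows, key):
    """Roll the granular rows up along one dimension."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r["amount"]
    return dict(totals)

sales_by_region = summarize(orders, "region")    # one data mart's view
sales_by_product = summarize(orders, "product")  # another data mart's view

# Reconcilability: both views roll up to the same grand total,
# because both operate on the same granular data.
assert sum(sales_by_region.values()) == sum(sales_by_product.values()) == 225
```

Had each mart kept its own pre-summarized copy at a different grain, there would be no guarantee the two views could ever be reconciled.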
Suppose the data warehouse contains fine grains of sand, fine silicon. One data mart takes that silicon and makes a wine glass. Another data mart takes the same silicon and makes a semiconductor chip. And yet another data mart takes the silicon and makes a medical prosthesis. No one mistakes a wine glass for a semiconductor chip or for a medical prosthesis. But all of those items were at one time the same basic material. All of those items have the same material foundation.
There is such a thing as too low a level of granularity. Clickstream data is an example. Clickstream data is the data that results when access to a web page is tracked. Every movement of the cursor, every new page that is accessed, every item that is examined in the journey through HTML is captured. The problem is that much of the data captured in clickstream analysis has little or no business relevance. So clickstream data needs to be edited and summarized to become useful; in other words, its granularity must be adjusted upward. But needing to adjust granularity upward, as with clickstream data, is the exception. It is far more common to wish that the granularity of data could be adjusted downward, toward more detail, than upward.
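The edit-and-summarize step for clickstream data might look like the following sketch. The event names and record layout are assumptions made for illustration:

```python
from collections import Counter

# Hypothetical raw clickstream events.
clicks = [
    {"user": "u1", "page": "/home",      "event": "view"},
    {"user": "u1", "page": "/home",      "event": "mousemove"},
    {"user": "u1", "page": "/product/7", "event": "view"},
    {"user": "u1", "page": "/product/7", "event": "mousemove"},
    {"user": "u1", "page": "/checkout",  "event": "view"},
    {"user": "u2", "page": "/home",      "event": "view"},
]

# Edit: keep only events with business relevance (page views),
# discarding low-value noise such as cursor movements.
views = [c for c in clicks if c["event"] == "view"]

# Summarize: adjust granularity upward to one count per user.
pages_per_user = Counter(v["user"] for v in views)

print(dict(pages_per_user))  # {'u1': 3, 'u2': 1}
```

The raw events are too fine-grained to be useful on their own; only after editing out the noise and summarizing upward does the data carry business meaning.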
And, of course, there is a direct relationship between the granularity of data and the size of the data warehouse. The more granularity there is, the larger the data warehouse.
SOURCE: Granularity Redux
Recent articles by Bill Inmon