

Data Use by Analytical Applications: Primary vs. Secondary – Who is #1?

Originally published August 26, 2010

I recently participated in a one-day training session on trends in business intelligence and data management, and was lucky enough to be introduced by the organization’s CIO. One of the more interesting comments the CIO made was in reference to the value proposition and business intent of organizations like Facebook, which I can summarize as follows: what is the value proposition of Facebook? While it provides a service to millions of people who want to share information, the value of the organization lies in its ability to collect data that can be used for analytical purposes. In other words, the value of the data was not in its primary use, but rather in its secondary uses.

I have seen an increase in employing the words “primary” and “secondary” in reference to data usage, and I am not innocent of this either. But it has me thinking a lot about what is meant by the use of these qualifiers, and whether their use in these contexts is warranted.
Basically, when I see the term “primary use” of data, I think of the first use of the data, or the business process within which the data was created. In many cases, this is some kind of transactional or operational system, and the intent is to collect just enough information to complete a workflow or transaction process. When I see “secondary use,” it conjures up the image of a variety of downstream business applications hungrily snapping up any and all available data sets to be used in a number of ways, particularly ways for which the data was not originally intended.

Or at least that is what I used to think. I can go back to earlier versions of material I have written that express this distinction of intent: suitability for the original purpose vs. suitability for downstream data uses. But let’s think about what the CIO said in her speech: the value of the organization is in its use of the data for analytical purposes. So here is the question: which data use is primary and which is secondary? If the value is derived from analyzing the data, that should make it the primary use, shouldn’t it? In essence, we have turned the definition around. A quick web search defines primary as “first in order,” but it also defines it as “first or highest in rank of importance.” If the alternate uses are highest in rank of importance, they become the primary uses, perhaps relegating the original intent to second place.

The implication is a significant challenge, though, especially in the context of soliciting data requirements. Our current approach to defining data requirements looks only at the functional needs of the business process application being designed to meet an acute need. In most cases, no one considers how the created data will be used by other applications. But if the intended uses of the data are the downstream analytical applications, those applications become the primary users of the data. Therefore, it is incumbent upon the system designers to talk to every potential data consumer and to identify the information that any analytical applications might need.

This impacts a number of aspects of data management and data architecture. Underlying data models must be configured to accommodate any transactional or operational needs, but must also satisfy the needs of a range of other applications. Some of these applications might still be in the design or even planning stages, or might not yet have been conceived!

One idea to consider involves expanding the realm of traditional data integration to enable a collection of data services that, when layered on top of existing data sets, allows applications seamless access to the data. But this has to be done in a way that does not allow for rampant redefinition of the data elements and the corresponding concepts. In other words, the data integration framework must provide a semantically and syntactically coherent view of any of the available data sets.
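To make the idea of a semantically coherent data-service layer concrete, here is a minimal sketch (all names and field mappings are hypothetical, not from any particular product) of a service that presents two differently structured source data sets through a single canonical business vocabulary, so consumers cannot redefine the data elements on their own:

```python
# Canonical business terms mapped to each source's local field names.
# The mapping itself is the governed artifact: one agreed definition
# per business term, applied uniformly to every data set.
FIELD_MAPPINGS = {
    "orders_db":  {"customer_id": "cust_no", "order_total": "amt"},
    "crm_export": {"customer_id": "CustomerID", "order_total": "OrderValue"},
}

def to_canonical(source_name, record):
    """Rename a source record's fields to the shared business vocabulary."""
    mapping = FIELD_MAPPINGS[source_name]
    return {canonical: record[local] for canonical, local in mapping.items()}

class DataService:
    """Serves any registered data set through one canonical schema."""
    def __init__(self):
        self._sources = {}

    def register(self, source_name, records):
        self._sources[source_name] = records

    def query(self, source_name):
        return [to_canonical(source_name, r) for r in self._sources[source_name]]

service = DataService()
service.register("orders_db",  [{"cust_no": 101, "amt": 250.0}])
service.register("crm_export", [{"CustomerID": 101, "OrderValue": 250.0}])

# Both sources now answer in the same vocabulary:
print(service.query("orders_db"))   # [{'customer_id': 101, 'order_total': 250.0}]
print(service.query("crm_export"))  # [{'customer_id': 101, 'order_total': 250.0}]
```

The point of the sketch is that applications never touch the source field names directly, which is what keeps the view syntactically and semantically consistent across data sets.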

To make this happen, you need improved data governance processes: data requirements assessment, capturing data quality expectations, defining data rules, business term definition, metadata management, and data integration and data virtualization protocols. You will also need better processes for the reporting and analytics life cycle, including report conceptualization, self-service business intelligence, and verification, validation, and certification of delivered information products.
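One small illustration of what “capturing data quality expectations and defining data rules” can look like in practice: a hedged sketch (the field names and rules are invented for illustration) in which expectations are captured as declarative rules, so every consumer shares one definition of what makes a record valid:

```python
# Hypothetical data quality rules: one predicate per governed data element.
DATA_RULES = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "order_total": lambda v: isinstance(v, float) and v >= 0.0,
}

def validate(record):
    """Return the list of fields that violate their rule (empty if clean)."""
    return [field for field, rule in DATA_RULES.items()
            if field not in record or not rule(record[field])]

good = {"customer_id": 101, "order_total": 250.0}
bad  = {"customer_id": -5}   # negative id, missing order_total

print(validate(good))  # []
print(validate(bad))   # ['customer_id', 'order_total']
```

Because the rules live in one governed place rather than inside each application, the same expectations apply whether the data is consumed by the originating transaction system or by a downstream analytical application.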

This may seem like a lot of potential work, but in the end it makes sense: the greater the volume of collected data, the greater the probability that the data is intended from the start to be used for multiple purposes. If so, we need to adapt our oversight to ensure data usability.


Comments


Posted August 26, 2010 by jko@informatica.com

David, good thought-provoking piece. There has to be a shift in mentality to realize that the value of data transcends its source application, and that data typically outlives applications. So you cannot continue to treat data as a second-class citizen, subordinate to the application. The data needs to be managed and governed in its own right, with a view to its multiple uses across the enterprise, and with its own lifecycle separate from the application.
