Big Data Needs Context

Originally published November 1, 2012

Context is funny. You don’t miss it or even think about it until you don’t have it. Then suddenly context becomes a really big issue.

As long as the world dealt only with structured data, context was not much of an issue (or at least it wasn’t a large one). Consider the following simple structured file:

NAME          PHONE           BANK BALANCE
Bill Inmon    111-123-4567*   $3,122.97
Ron Powell    222-345-6789*   $10,981.24
Linda Kresl   333-111-1111*   $512.87

* Not valid phone numbers

Looking at the simple file and knowing what the columns of data mean, we make several assumptions about context. For example, we assume that the phone number is current. We assume that the phone number is in the U.S. We assume that the area code designates some geographical locale. As far as the bank balance is concerned, we assume that it is the current bank balance. We assume that it is in U.S. dollars. We assume that the name in the file is associated with the phone number and the account. In short, because the data is structured, we make a lot of contextual assumptions about the data contained in the file.

Such is the nature of structured data. A big part of the “structure” of structured data is the context of the data found in the record or the file.
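
The assumptions described above can be written down explicitly. What follows is a minimal sketch, not a description of any particular system: the `Column` class, the `SCHEMA` tuple, and the metadata fields (`unit`, `region`, `as_of`) are hypothetical names chosen only to illustrate how a structured file carries its context in its definition rather than in each value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical column definition: the context every value in the column inherits."""
    name: str
    meaning: str
    unit: Optional[str] = None     # e.g. a currency (assumed field)
    region: Optional[str] = None   # e.g. a phone numbering region (assumed field)
    as_of: str = "current"         # assumed freshness of the value


# Illustrative schema for the three-column sample file.
SCHEMA = (
    Column("NAME", "account holder"),
    Column("PHONE", "contact phone number", region="US"),
    Column("BANK BALANCE", "account balance", unit="USD"),
)

# One row from the sample file (the phone number is not a valid number).
ROW = ("Bill Inmon", "111-123-4567", "$3,122.97")


def describe(row, schema=SCHEMA):
    """Pair each raw value with the context its column supplies."""
    return {col.name: (value, col) for value, col in zip(row, schema)}


described = describe(ROW)
balance, context = described["BANK BALANCE"]
print(balance, context.unit, context.as_of)
```

In this sketch the value `$3,122.97` never states its own currency or freshness. Those facts come from the column it sits in, which is the sense in which structure supplies context.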

But when it comes to unstructured data, there is no context that can be conveniently associated with the data. For example, suppose you are reading an unstructured file. Suppose you encounter the number 7. Now what does “7” mean?

Is it the days in the week? The seven seas? The amount the Dow Jones went up this morning? The number of brothers and sisters you have? The truth is that the number “7” is naked. By itself it means nothing. In order for “7” to have meaning, it MUST have context. And with unstructured data there is no context.
   
So before you get all excited about “big data” and all the unstructured data you find there, you need to spend some time thinking about how you are going to apply context to your unstructured text. If you are seriously going to ponder that question, spend a few minutes on the larger question: What does context of raw text really mean? It turns out that there are many different kinds of context – some of them more useful than others. Some of the forms of context are:
  • What type of data has been stated?
  • Who stated it?
  • Where was it stated?
  • What was it stated in response to?
  • When was it stated?
  • How was it stated?
  • What day and time was it stated?
  • What was stated before it? After it?
  • And so forth.
Context is a fundamental issue that we take for granted. But unstructured data is a whole new world, one where context simply is not present.
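
One way to make the forms of context listed above concrete is to treat each as a field that must be filled in before a raw value can be interpreted. The sketch below is an illustration under stated assumptions: the `Context` class, its field names, and the `missing_context` helper are hypothetical, and the sample values are invented purely for demonstration.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List, Optional


@dataclass
class Context:
    """Hypothetical container for the forms of context listed above."""
    data_type: Optional[str] = None       # what type of data was stated
    speaker: Optional[str] = None         # who stated it
    location: Optional[str] = None        # where it was stated
    in_response_to: Optional[str] = None  # what it was stated in response to
    stated_at: Optional[datetime] = None  # when (day and time) it was stated
    manner: Optional[str] = None          # how it was stated
    preceding: Optional[str] = None       # what was stated before it
    following: Optional[str] = None       # what was stated after it


def missing_context(ctx: Context) -> List[str]:
    """Return the names of the context fields that are still unknown."""
    return [f.name for f in fields(ctx) if getattr(ctx, f.name) is None]


# A "naked" 7 pulled from raw text: every form of context is missing.
naked = Context()
print(len(missing_context(naked)))

# The same 7 after some context has been recovered from surrounding text.
# These values are invented for illustration only.
partial = Context(
    data_type="count of days",
    in_response_to="How many days are in a week?",
    preceding="There are",
    following="days in a week.",
)
print(missing_context(partial))
```

For structured data, most of these fields are implied by the file definition. For unstructured text, each one has to be recovered or supplied, and the list of missing fields is a rough measure of how much work remains.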

The people who are going nuts over big data today seem either not to know or not to care that big data comes with this major issue of context.


  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

