Originally published February 19, 2009
The world of commercial information technology has been created as a result of structured data. Structured data is data that occurs in a predictable manner, usually as a result of a transaction. When it comes to structured data, one thinks of banking activities, airline reservations, ATM activities, shop floor control systems and the like. In these environments, the same activity is executed repeatedly. The only difference from one transaction to the next is the parametric data that has been entered into the transaction.
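The idea that structured transactions share a fixed shape, differing only in their parametric data, can be sketched as follows. The schema below is purely illustrative (an invented ATM-withdrawal record, not drawn from any particular system):

```python
# A minimal sketch of what "structured" means: every transaction
# carries the same fixed fields; only the parametric values vary.
# The schema here is hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class AtmWithdrawal:
    account_id: str
    amount: float
    timestamp: str  # ISO 8601
    atm_id: str

# Two transactions: identical structure, different parametric data.
tx1 = AtmWithdrawal("ACCT-001", 60.0, "2009-02-19T09:15:00", "ATM-42")
tx2 = AtmWithdrawal("ACCT-002", 120.0, "2009-02-19T09:16:30", "ATM-42")

for tx in (tx1, tx2):
    print(tx)
```

Because every record conforms to the same schema, this kind of data drops naturally into tables and repetitive processing; unstructured data, by contrast, has no such predictable shape.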
All of this structured data and processing is the basis for 99% of the processing that occurs in the world of information systems today.
But the world of information processing is on the cusp of an entirely different kind of data and processing – unstructured data and processing. So what does the world look like when one takes a peek over the edge, peering into the new world of unstructured information?
The first question that is usually asked is – exactly what is unstructured data? A simple answer is that unstructured data is everything that structured data is not. Examples include electronic text (such as email, documents and reports), images, recorded speech and other audio, and visual characteristics such as colors, sizes and shapes.
For the most part, the technology underlying all but electronic textual data is so undeveloped that doing analytical processing against unstructured data is an impossibility (or if not an impossibility, at least highly impractical). For the most part, the only real analytical processing of unstructured data is that which is done on electronic textual data.
There are some interesting examples of unstructured data processing that fall outside of electronic text, however. One of the most interesting nontextual unstructured technologies is facial recognition.
In order to understand the value of facial recognition software, I recommend you read the book Bringing Down the House (Mezrich). This book is the true story of the students at MIT who took Las Vegas for millions of dollars playing team blackjack. (Read the book to see how team blackjack works; it is truly a fascinating story.)
Toward the end of their caper, the students were turned in by one of their own to the gambling casinos in Las Vegas. The casinos were given pictures of the students. Upon arriving at a casino, the students were recognized and escorted out of the casino. But having a lot of money, the students hired professional makeup artists to disguise themselves. Even with the best disguises that money could buy, the students were still recognized by the facial recognition technology. So there is at least one example of viable nontextual unstructured technology.
Another form of nontextual unstructured technology is voice recognition. In voice recognition technology, speech is transformed into electronic text. Voice recognition technology has been around for a long time. A common complaint against it is that the technology does not recognize words perfectly. Even in the best of circumstances, only approximately 90% of the words are recognized.
Less-than-perfect voice recognition can be greatly enhanced by “training” the software to understand accents and colloquialisms. But even in the best of circumstances, there is a percentage of the words that are not recognized. The interesting thing is that when humans listen to another person, we do not understand 100% of the words said. Our brains are good at “filling in the blanks.” So if our brains do not hear at a 100% rate, why should we be concerned with software technology that does not hear at a 100% rate?
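A figure like “90% of the words are recognized” is typically measured by comparing the recognizer's output against a reference transcript. One common approach, sketched below, counts word-level errors using edit distance; the function name and sample sentences are illustrative, not from any particular product:

```python
# A rough sketch of how word-level recognition accuracy might be
# measured: Levenshtein edit distance over words between a reference
# transcript and the recognizer's hypothesis. Illustrative only.

def word_accuracy(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Classic dynamic-programming edit distance, over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    errors = d[len(ref)][len(hyp)]
    return 1.0 - errors / len(ref)

ref = "the quick brown fox jumps over the lazy dog today"
hyp = "the quick brown fox jumped over the lazy dog today"
print(f"{word_accuracy(ref, hyp):.0%}")  # one wrong word out of ten
```

One substitution in a ten-word sentence yields 90% accuracy, which is the order of accuracy the article describes for even the best circumstances.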
There have been attempts to look at and recognize colors, sizes and shapes. Every now and then, there is an announcement of a breakthrough. But for the most part, unstructured technology that deals with this aspect of unstructured information is a long way away from a commercial application.
Even when it comes to electronic textual recognition, perfection is not achieved. Some of the less-than-perfect aspects of electronic textual processing are described below.
One of the biggest problems facing the analyst who wishes to address the usage of text for analytical processes is that of filtering text. When one looks at text, it is normal to find all sorts of text. Some text is important and useful to the business. Other text is irrelevant to the business. It is necessary to “weed out” the irrelevant text from the important and salient text.
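The idea of weeding out irrelevant text before analysis can be sketched very simply. The list of “business-relevant” terms below is entirely hypothetical; a real filter would be far more sophisticated, but the principle is the same:

```python
# A minimal sketch of filtering text: keep only the sentences that
# mention at least one business-relevant term. The term list is a
# hypothetical placeholder for a real, curated business vocabulary.
RELEVANT_TERMS = {"contract", "invoice", "payment", "customer", "claim"}

def filter_relevant(sentences):
    """Keep only sentences mentioning at least one relevant term."""
    kept = []
    for s in sentences:
        words = {w.strip(".,").lower() for w in s.split()}
        if words & RELEVANT_TERMS:
            kept.append(s)
    return kept

docs = [
    "The customer disputed the invoice amount.",
    "Lunch menu for Friday is attached.",
    "Payment is due within 30 days of the contract date.",
]
print(filter_relevant(docs))  # the lunch note is weeded out
```

Even a crude filter like this illustrates the point: the salient text is separated from the irrelevant before it ever reaches the analytical systems.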
Another issue is that of the sheer volume of text. It is estimated that in the standard corporation, there is at least 4 to 5 times as much text as there is structured information. From the standpoint of size alone, something must be done to filter out the unnecessary and irrelevant text before the text can be meaningfully included in corporate analytical systems.
In addition, there is the simple process of transcription that has its own difficulties. One would think that reading text would be an easy process. And in most circumstances, that is the case. But for a variety of reasons, it turns out that merely reading text can be a challenge.
These then are some of the interesting things that one sees when looking over the edge of the cliff to the next horizon of information processing.