Blog: Dan E. Linstedt

Unstructured Data and Business Interpretation of Results

This is a follow-on to my previous blog, where many questions were asked about what it might be like to "mine" web blogs for content. One good comment was about watching the context of blogging statements. Here we explore that comment.

The comment on the previous blog entry said that "it is very dangerous to take a sentence or part of a sentence out of context." I tend to agree. Data mining engines have traditionally been good at mining pre-structured information. The context for this structured information (just one view of it) might be the table structure and the organization and categorization of the information. That context is sometimes dictated by the surrounding data, and at other times it is assisted by the data model itself. Of course, the full meaning of the information content is defined only by a) the person asking the mining questions and b) the individual(s) who put together the data model being used as a source.

However, the age of unstructured information integration has already arrived. Is the technology there yet? Maybe. Is context inference available? I'm not so sure. Limited context-parsing engines are being built to associate words and word patterns with many different formulas, ranging from statistics to calculus and neural nets.

I would say that the comment above is appropriate: taking sentences out of their full context and treating them as structured data-mining results is very dangerous. It would be akin to misquoting a speaker in a public forum without exploring the meaning behind the words. And yet there is some base-level value here.

Another question was asked in the comment: "In the end is it cheaper and safer to go for searchable blog archives instead mined blogs?"
In my opinion, yes: it is probably cheaper and safer to go for searchable blog archives instead of mined blogs. However, let's not get ahead of ourselves. I stated a moment ago that I believe there is base value in mining blogs, so let me explain.

When an executive brings accounting numbers to the table in a financial meeting, they don't just bring the numbers; they do their homework to understand what the numbers mean. Likewise, to add value to data mining results, due diligence should be done to put the context and meaning of the phrases back into the picture. This way the mining tool can be used for "discovery" purposes only, perhaps in a protected manner, followed by human research and investigation beyond that point.

Just a few more thoughts... Great questions, and see you next time.