Business Intelligence Network business intelligence resources

Blog: Dan E. Linstedt


Unstructured Data, Blogging, and Data Mining?

What do you suppose they have in common? Is anyone considering mining blogs for business value? Sure, there are aggregators out there (and some great ones at that), but what would happen if businesses began looking for real patterns?

Has anyone set up a blog data mining component? It's an interesting thought. Let's just say fictional company A has employees who blog, and fictional company B is in competition with them. It seems to me that competitive intelligence is just one use of a blog miner.

OK, so let's take a quick look at what it might take, technology-wise, to get into this. You might want:

a) a genuine aggregator of good blogs to garner links and information from
b) a list of relevant terms and words that make up your interest
c) a ranking of those keywords and key phrases
d) a web-scraping or blog-scraping/RSS capture tool
e) a back-end data model to load the blogs into
f) a data mining tool that can mine the text directly, or else a structured data model to fit "parts" of the scanned blog into

And finally, you need an analysis tool to make sense of the mined results.
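To make items b) through d) a little more concrete, here is a minimal Python sketch, assuming the capture tool already delivers standard RSS 2.0 XML. The sample feed, the `KEYWORDS` list, and the helper names `parse_rss_items` and `rank_keywords` are hypothetical illustrations, not part of any particular product. Ranking here is plain occurrence counting, which is only one of many possible ways to rank key phrases.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical RSS 2.0 feed standing in for the output of a capture tool (item d).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Our new data warehouse</title>
      <link>http://example.com/posts/1</link>
      <description>The data warehouse project is on track. Data mining comes next.</description>
    </item>
    <item>
      <title>Competitor pricing</title>
      <link>http://example.com/posts/2</link>
      <description>Pricing pressure from a competitor; our pricing team is watching.</description>
    </item>
  </channel>
</rss>"""

# Hypothetical list of terms of interest (item b).
KEYWORDS = ["data warehouse", "data mining", "pricing", "competitor"]


def parse_rss_items(rss_text):
    """Return (link, text) pairs for each <item>, joining title and description."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        items.append((link, f"{title} {description}"))
    return items


def rank_keywords(items, keywords):
    """Rank keywords by total whole-word, case-insensitive occurrences (item c)."""
    counts = Counter({kw: 0 for kw in keywords})
    for _link, text in items:
        lowered = text.lower()
        for kw in keywords:
            pattern = r"\b" + re.escape(kw.lower()) + r"\b"
            counts[kw] += len(re.findall(pattern, lowered))
    # most_common() sorts by count, highest first; ties keep KEYWORDS order.
    return counts.most_common()


if __name__ == "__main__":
    for keyword, count in rank_keywords(parse_rss_items(SAMPLE_RSS), KEYWORDS):
        print(f"{keyword}: {count}")
```

Run as a script, this prints `pricing: 3`, `data warehouse: 2`, `competitor: 2`, and `data mining: 1`. A real blog miner would fetch feeds over HTTP and keep each link alongside its hits so the results can be loaded into the back-end model from item e); simple counting like this also ignores the surrounding context of each match.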

What do they have in common? Blogging is another form of free expression: no middleman, just pure information, and it's all unstructured and free-flowing. Data mining and business intelligence provide a way to garner information from it. That information might be competitive, it might serve a watchdog purpose, or it might be useful for mining the CEO's latest thoughts on the business.

What if we mined all the CEO blogs for "how to run a company successfully"? We might find some things we don't expect.

Cheers for now,
Dan L

  Posted by Dan Linstedt on March 15, 2005 3:16 PM

Comments

My first thought is that it is very dangerous to take a sentence, or part of a sentence, out of context. For example, the blog sentence "I hate Microsoft" from a Microsoft employee may be preceded by the words "A friend of mine told me". Pushing unstructured opinion text into a structured model opens up a lot of avenues for misinterpretation. When you store and show parts of a scanned blog, you need to give drill-down access to the entire sentence, paragraph, and blog post that the sentence came from, and make sure people use these drill-downs to double-check the classification of the text.
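The drill-down idea above can be sketched as a small data model in which every mined snippet keeps pointers back to its sentence, paragraph, and post. This is a minimal Python illustration under stated assumptions: the class and method names (`BlogStore`, `tag_phrase`, `drill_down`), the label string, and the naive regex sentence splitter are all hypothetical, and a production system would keep these records in database tables rather than in-memory dictionaries.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int
    url: str
    text: str  # paragraphs separated by blank lines


@dataclass
class Sentence:
    sentence_id: int
    post_id: int
    paragraph_index: int
    text: str


@dataclass
class Snippet:
    """A mined fragment plus a pointer back to the sentence it came from."""
    snippet_id: int
    sentence_id: int
    matched_text: str
    label: str  # hypothetical classification label


def split_paragraphs(text):
    """Split post text on blank lines, dropping empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


class BlogStore:
    """In-memory stand-in for a back-end model: posts -> sentences -> snippets."""

    def __init__(self):
        self.posts = {}
        self.sentences = {}
        self.snippets = {}

    def add_post(self, post):
        self.posts[post.post_id] = post
        for p_idx, paragraph in enumerate(split_paragraphs(post.text)):
            # Naive split: break after ., ! or ? followed by whitespace.
            for text in re.split(r"(?<=[.!?])\s+", paragraph):
                if text:
                    sid = len(self.sentences) + 1
                    self.sentences[sid] = Sentence(sid, post.post_id, p_idx, text)

    def tag_phrase(self, phrase, label):
        """Create a snippet for every stored sentence containing `phrase`."""
        new_ids = []
        for sentence in self.sentences.values():
            if phrase.lower() in sentence.text.lower():
                snip_id = len(self.snippets) + 1
                self.snippets[snip_id] = Snippet(snip_id, sentence.sentence_id, phrase, label)
                new_ids.append(snip_id)
        return new_ids

    def drill_down(self, snippet_id):
        """Return the snippet together with its full sentence, paragraph, and post."""
        snippet = self.snippets[snippet_id]
        sentence = self.sentences[snippet.sentence_id]
        post = self.posts[sentence.post_id]
        return {
            "snippet": snippet.matched_text,
            "label": snippet.label,
            "sentence": sentence.text,
            "paragraph": split_paragraphs(post.text)[sentence.paragraph_index],
            "post_url": post.url,
            "post": post.text,
        }
```

For example, after adding a post whose second paragraph reads "A friend of mine told me: I hate Microsoft. I disagree with him." and tagging the phrase "I hate Microsoft", `drill_down` returns the whole sentence "A friend of mine told me: I hate Microsoft.", which shows at a glance that the opinion is reported rather than the author's own.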

In the end, is it cheaper and safer to go for searchable blog archives instead of mined blogs?
