Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author >

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

May 2008 Archives

I have been tinkering with some of the blogging tools out there (so far I like WordPress a lot). One nice aspect of the blogging framework is that it expects you to meta-tag your content, which helps with organization and presentation. That is quite nice, because the system does some of the work that I have always been loath to do (that is, "organizing things").

One way to do this is by categorizing your entries and adding tags to them. I was pondering this, thinking that it should be possible by now to use text mining tools to scan your content and pull out the "statistically improbable" phrases (as our friends at Amazon like to say) to be used as tags.
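To make the idea concrete, here is a minimal sketch of how a tagging tool might surface candidate phrases. It does not reproduce Amazon's method; it simply ranks two-word phrases by how much more often they appear in a post than in some background text. The function names (`bigrams`, `suggest_tags`), the choice of two-word phrases, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text, split it into words, and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def suggest_tags(post, background, top_n=3):
    """Rank the post's bigrams by how much more frequent they are in the post
    than in the background text (hypothetical scoring, add-one smoothed)."""
    post_counts = Counter(bigrams(post))
    bg_counts = Counter(bigrams(background))
    post_total = sum(post_counts.values()) or 1
    bg_total = sum(bg_counts.values()) or 1

    def score(phrase):
        p_post = post_counts[phrase] / post_total
        # Add-one smoothing so phrases unseen in the background do not divide by zero.
        p_bg = (bg_counts[phrase] + 1) / (bg_total + 1)
        return math.log(p_post / p_bg)

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(post_counts, key=lambda phrase: (-score(phrase), phrase))
    return ranked[:top_n]


post = "data quality matters because data quality drives trust in the data warehouse"
background = "the weather is nice and the data is in the report because it is in the report"
print(suggest_tags(post, background, top_n=1))
```

A real tool would want a much larger background corpus and some filtering of common words, but even this toy version favors the phrase that is repeated in the post and absent from the background text.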

But what about non-text content? I can think of three commonly used content types that are growing in popularity yet require some extra thought for assigning meta-tags: pictures, voice recordings, and video recordings. As more of this unstructured stuff comes down the pike, we metadata folks should think hard about how to assess and capture semantics associated with these objects for the purposes of organization.

A few years back my friend Greg Elin put together a system for selectively annotating pictures. Check out his fotonotes web site. Perhaps there is some future in this for video?


Posted May 28, 2008 8:03 AM

Strolling around the exhibit floor at the TDWI conference in Chicago the past few days provided an interesting look into a rapidly evolving trend in data warehousing appliances. Of the 30 or so vendors exhibiting, I counted at least 7 that would be considered appliance vendors:

DATAllegro
Dataupia
Kognitio
Netezza
ParAccel
Sybase
Teradata

I might throw Oracle, HP, and Sand Technology in there as well, but I think you see my point - there seems to be a perception that there is a market for high-performance "plug-in" systems for deploying data warehouses. What is perhaps even more interesting is that half of these vendor offerings are not specifically hardware appliances, but rather software database systems that can be deployed on top of different hardware systems - in other words, they are "software appliances" (!?)

In essence, many of these approaches, along with some from other vendors (Vertica was notably absent from this crowd, but showed up at the previous Las Vegas TDWI), focus on structural optimizations (such as column-oriented databases) that are very well suited for loading into core memory and providing very fast read access, making them especially nice for query/reporting clients. The realization that the database system can be optimized and parallelized in a way that is decoupled from the hardware makes these software-only approaches look very cost-effective, especially when sizing a warehouse to meet current needs while planning for future growth. Not only that, these systems are finely tuned for performance (see Mark Madsen's comments about ParAccel's TPC-H benchmark scores).
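As a rough illustration of why a column-oriented layout suits query and reporting workloads, the sketch below pivots a handful of row records into per-column lists and then answers an aggregate question by reading only the two columns it needs. The sample data and the function names (`to_columns`, `total_by_region`) are made up for this example and do not reflect any particular vendor's implementation.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 80.0},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout: one list per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_by_region(columns, region):
    """Sum the amounts for one region, touching only the two columns involved."""
    return sum(
        amount
        for row_region, amount in zip(columns["region"], columns["amount"])
        if row_region == region
    )


columns = to_columns(rows)
print(total_by_region(columns, "east"))
```

The query never looks at `order_id`. On a wide warehouse table with dozens of columns, skipping the unneeded ones is where much of the read-speed advantage comes from.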

The common theme with the software appliance crowd is lowering the barrier to entry for small and medium businesses seeking to jump on the BI bandwagon. With a variety of operational modes that span full-blown deployments (with hardware purchase and integration) down to a service-based hosted model, this platform enables data warehousing at a fraction of the cost. This concept in its own right is worth some more exploration, and I think I may try to address that in an upcoming column.


Posted May 15, 2008 11:40 AM

We are currently updating our company web site, and I am extremely impressed with the ways that emerging blogging tools are able to solve certain "challenges" associated with managing a web site's content. I am planning to put together a new web site to accompany my MDM book and I am also thinking that blogging is the way to go.

Check out Movable Type and WordPress - pretty impressive. The software is pretty cool, provides all sorts of widgets and plug-ins, and makes it a lot easier to keep a web site fresh.


Posted May 12, 2008 8:07 AM

