Business Intelligence Network business intelligence resources

Blog: William McKnight

« DW/BI Market and RFID progress | Main | McKnight Associates, Inc. makes Inc. 500 »

Not so fuzzy

Occassionally, a vendor will come up with a new feature that you didn't think of, but once you see it, you find immediate application for it. Such is the case with 2 new features in SQL Server 2005's SSIS (Integration Services, the successor to Data Transformation Services). They're called fuzzy lookup and fuzzy grouping.

Many of the transformations in the warehouses I've been associated with are lookups. These range from simple matching to the use of some pretty complex rules for data cleansing. While many of these lookups require complete accuracy, many can accept "close enough". Actually, the right answer is often the close match and determining close requires the most complex logic.

Fuzzy lookup searches for "close" matches using its own logic. It creates similarity and confidence scores. Some combination of the 2 can be used to determine your acceptance systemically.

Fuzzy grouping looks at a group of potential records for loading and determines the probability that two (say, customer names) are actually duplicates.

Lookups and de-duplication is a huge ETL effort and I'm sure many Microsoft DW/BI shops will benefit from fuzzy lookup and fuzzy grouping in SQL Server 2005.

  Posted by William McKnight on October 19, 2005 8:40 PM |

Comments

By The way William, SQLServer 2005 "fuzzy transformations" actually incorporate neural net logic - that's right, live dynamic data mining - just as I predicted (as did many others) in my ETL/ELT and RDBMS arcticle in Teradata Magazine three years ago.

The beauty of this is that it can "learn" new data types, and we have the ability to set thresholds for data that "falls beyond the curve" or is suspect with some level of confidence.

This is something we will see more of, as RDBMS vendors offer stronger, easier, and faster access to the data mining engines - and something I hope we will see creep into EII, EAI, and other ETL products as well.

It's a powerful business proposition.

Cheers,
Dan L

Post a comment