Blog: William McKnight

William McKnight

Hello and welcome to my blog!

I will periodically share my thoughts and observations on information management here in the blog. I am passionate about the effective creation, management and distribution of information for the benefit of company goals, and I'm thrilled to be a part of my clients' growth plans and to connect what the industry provides to those goals. I have played many roles, but the perspective I come from is benefit to the end client. I hope the entries can be of some modest benefit to that goal. Please share your thoughts and input on the topics.

About the author

William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at wmcknight@mcknightcg.com.

Editor's Note: More articles and resources are available in William's BeyeNETWORK Expert Channel. Be sure to visit today!

Occasionally, a vendor comes up with a new feature you hadn't thought of, but once you see it, you find an immediate application for it. Such is the case with two new features in SQL Server 2005 Integration Services (SSIS), the successor to Data Transformation Services (DTS): fuzzy lookup and fuzzy grouping.

Many of the transformations in the warehouses I've been associated with are lookups. These range from simple exact matching to some fairly complex data-cleansing rules. While many of these lookups require an exact match, many others can accept "close enough." In fact, the right answer is often the closest match, and deciding what counts as "close" requires the most complex logic.

Fuzzy lookup searches a reference table for "close" matches using its own matching logic and produces a similarity score and a confidence score for each match. You can combine the two to decide systematically which matches to accept.
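To make the idea of accepting a match on two scores concrete, here is a minimal Python sketch of a fuzzy lookup. It is not how SSIS computes its scores. It assumes the standard library's `difflib.SequenceMatcher` ratio as the similarity measure and uses a hypothetical confidence score (the best candidate's share of the top two similarity scores, so an ambiguous match scores low). The function names, thresholds, and sample data are illustrative only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_lookup(value, reference, min_similarity=0.8, min_confidence=0.6):
    """Return (match, similarity, confidence) for the closest reference entry.

    match is None when the best candidate fails either threshold.
    Confidence is a hypothetical measure of how much the best score
    stands out from the runner-up: best / (best + second best).
    """
    # Score every reference entry and sort best-first.
    scored = sorted(((similarity(value, ref), ref) for ref in reference), reverse=True)
    if not scored:
        return None, 0.0, 0.0
    best_score, best_ref = scored[0]
    second_score = scored[1][0] if len(scored) > 1 else 0.0
    total = best_score + second_score
    confidence = best_score / total if total else 0.0
    # Accept only matches that are both close enough and unambiguous.
    accepted = best_score >= min_similarity and confidence >= min_confidence
    return (best_ref if accepted else None), best_score, confidence


customers = ["Acme Corporation", "Apex Industries", "Acme Widgets"]
print(fuzzy_lookup("Acme Corporaton", customers))  # typo still finds "Acme Corporation"
print(fuzzy_lookup("Acme Corp Eas", ["Acme Corp East", "Acme Corp West"]))  # too ambiguous
```

In the second call both candidates score highly on similarity, but neither stands out from the other, so the low confidence score causes the match to be rejected. That is the kind of case where combining the two scores pays off.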

Fuzzy grouping examines a set of candidate records before loading and estimates the likelihood that two of them (say, two customer names) are actually duplicates.
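A minimal Python sketch of the grouping idea follows, under stated assumptions: it uses `difflib.SequenceMatcher` similarity as a stand-in for a duplicate probability, links any pair scoring at or above a hypothetical threshold of 0.85, and merges linked records transitively with a small union-find structure. This is not SSIS's algorithm; the function names and sample records are illustrative.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_group(records, threshold=0.85):
    """Group likely duplicates; return a list of groups (lists of records).

    Any pair scoring at or above the threshold is linked, and linked
    records are merged transitively (if A~B and B~C, all three share a group).
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the group's root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; O(n^2), so fine for a sketch but not for large loads.
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect records by root, preserving their original order within each group.
    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    return list(groups.values())


names = ["John Smith", "Jon Smith", "JOHN SMITH ", "Mary Jones"]
for group in fuzzy_group(names):
    print(group)
```

The pairwise loop keeps the sketch short; a production de-duplication step would typically narrow the candidate pairs first (for example, by blocking on a key such as postal code) rather than comparing every record with every other.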

Lookups and de-duplication are a huge part of the ETL effort, and I'm sure many Microsoft DW/BI shops will benefit from fuzzy lookup and fuzzy grouping in SQL Server 2005.


Posted October 19, 2005 8:40 PM

1 Comment

By the way, William, SQL Server 2005's "fuzzy transformations" actually incorporate neural net logic. That's right: live, dynamic data mining, just as I predicted (as did many others) in my ETL/ELT and RDBMS article in Teradata Magazine three years ago.

The beauty of this is that it can "learn" new data types, and we have the ability to set thresholds for data that "falls beyond the curve" or is suspect with some level of confidence.

This is something we will see more of as RDBMS vendors offer stronger, easier, and faster access to their data mining engines, and something I hope we will see creep into EII, EAI, and other ETL products as well.

It's a powerful business proposition.

Cheers,
Dan L
