Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best, Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

Hmmm, I've been thinking about this for quite a while. In the tangible world we have tags for physical goods - yesterday they were bar codes, today they're RFID tags and RTLS. Tomorrow, physical elements may be tagged with DNA sequences, or electron signatures at the nano level.

Why then is it so hard to track intangible "data"? For applications we have the equivalent of software licenses, but for the actual data? Nothing.

In a world of hypothetical speculation, I would suppose that tagging every data element with an individual signature may be desirable. We could start with "units of work". Take this blog entry for example: tag the header with a signature, and tag the extended entry with a signature. The important thing is that the signature must travel with the unit of data, everywhere it goes. It becomes the unique ID for a data set, like an RFID tag, and should be tracked across the network.

What possibilities does this open up? In data warehousing we often tag our data with CRC32, CRC64, or MD5 values (checksum and hash functions that produce mostly unique values across a row of data). Why then can't these "keys" become universal, and shared around the world? These are standard functions that produce the same keys for the same data everywhere.
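To make the idea concrete, here is a minimal Python sketch of tagging one row of data with both a CRC32 and an MD5 value. The function name `row_tags` and the `|` delimiter are my own assumptions for illustration, not part of any standard; the point is only that the same bytes produce the same keys on any machine.

```python
import hashlib
import zlib

def row_tags(row, delimiter="|"):
    """Return (crc32_hex, md5_hex) for a row of column values.

    Hypothetical helper: the name and the delimiter are illustrative.
    """
    # Join the columns with a delimiter so ("ab", "c") and ("a", "bc")
    # do not collapse into the same byte string.
    payload = delimiter.join(str(value) for value in row).encode("utf-8")
    crc = format(zlib.crc32(payload) & 0xFFFFFFFF, "08x")  # 32-bit checksum
    md5 = hashlib.md5(payload).hexdigest()                 # 128-bit hash
    return crc, md5

row = ("Dan", "Linstedt", "Denver")
print(row_tags(row))
```

Anyone who wants to share such keys would also have to agree on the column order, the delimiter, and the text encoding, because a change to any of them changes the key.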

What would happen if we could actually tag every "word" entered into every application? I assume data traceability would improve dramatically, and search engines would get a real boost. Unfortunately, these functions have downsides. The keys can be larger than the data they tag (an MD5 value is 128 bits, longer than most words), and functions like MD5 cannot be "reversed", so getting from a key back to its data requires a massive store and a lookup against it. Another issue is that CRC32 can produce duplicates (as can CRC64, although less frequently).
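The duplicate problem is easy to see for yourself. The sketch below, in Python, feeds pseudo-random 8-byte values into CRC32 until two different inputs share a checksum. The function name and the fixed seed are assumptions made for the example; with only 32 bits of key space, a collision usually shows up after something on the order of a hundred thousand inputs.

```python
import random
import zlib

def find_crc32_collision(seed=0):
    """Return (first, second, crc) where two different inputs share a CRC32.

    Hypothetical helper for illustration; the seed only makes runs repeatable.
    """
    rng = random.Random(seed)
    seen = {}  # CRC32 value -> first input that produced it
    while True:
        data = rng.getrandbits(64).to_bytes(8, "big")
        crc = zlib.crc32(data)
        earlier = seen.get(crc)
        if earlier is not None and earlier != data:
            return earlier, data, crc
        seen[crc] = data

first, second, crc = find_crc32_collision()
print(first.hex(), second.hex(), format(crc, "08x"))
```

This is why a CRC32 on its own is a weak candidate for a universal key: it was designed to detect transmission errors, not to identify data uniquely.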

If someone were to produce a device that can "tag" data going over the internet and store the data in compressed format with its key-tag, then patterns would be easier to spot, data mining would see a huge boost, and it may be possible to aggregate what used to be seen as dissimilar data under a single keyed entry. These keys could also be shared across environments - maybe this is a call to EII vendors who are sharing data over SOA and web services?
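As a rough sketch of what such a device would have to do, here is a tiny in-memory version in Python. The class name `TagStore` and its methods are hypothetical, and a real system would need persistent, distributed storage; the sketch only shows the key-tag, the compressed payload, and the lookup that stands in for "reversing" the key.

```python
import hashlib
import zlib

class TagStore:
    """Hypothetical in-memory store: MD5 key-tag -> zlib-compressed data."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Tag the data, store it compressed, and return the key-tag."""
        tag = hashlib.md5(data).hexdigest()
        # Identical data arriving from anywhere lands on the same key,
        # so it is stored only once.
        self._blobs.setdefault(tag, zlib.compress(data))
        return tag

    def get(self, tag: str) -> bytes:
        """Look the data up by key-tag; the hash itself cannot be reversed."""
        return zlib.decompress(self._blobs[tag])

    def __len__(self):
        return len(self._blobs)

store = TagStore()
tag_a = store.put(b"Data Unique Universal Keys")  # seen in one environment
tag_b = store.put(b"Data Unique Universal Keys")  # same data seen elsewhere
print(tag_a == tag_b, len(store))
```

Because the key depends only on the content, two environments that never talk to each other still arrive at the same tag for the same data, which is what would let the keys be shared.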

For now it's a pipe dream, but it may step into reality with DNA computing or nanotechnology. Just think: Data Unique Universal Keys (DUUK) - a fascinating idea. From compliance and monitoring perspectives it opens a ton of doors.

Thoughts?


Posted September 30, 2005 6:24 AM

1 Comment

I like it... I know there are some algorithms for building a Unique Universal Identifier in a distributed (i.e., non-communicating), implementation-agnostic way.

UPCs/BarCodes/ISBNs have meaning though... first two numbers represent country of origin, next two are industry, next two are company, etc.

Are you thinking this would be useful in a DUUK or are you just looking for a way to generate a pure Universal Unique ID?
