Business Intelligence Network business intelligence resources

Blog: Dan E. Linstedt


April 29, 2006

RFID Tracking and Nanotechnology for BI

Imagine a smart RFID (radio frequency identification) tag: not just one that reflects back a signal it receives from a transmitter, but one that emits a unique number, such as a RIN (RFID identification number), like a VIN but for RFID tags. I realize RTLS (real-time locating systems) already embed this technology, but imagine it at a smaller scale; RTLS tags are currently very large compared to RFID tags. How would this affect BI? What if it could use nanotechnology and an embedded power source (which nanotech reports suggest is possible) to power a unique signal? What would happen to the supply chain, for example?

This entry is just a thought experiment.

Well, I was thinking about my can of soda. Yes, I drink soda, and it's usually Pepsi, uhhh, I mean Coke... Oh well, I drink both. But anyhow, what if the can's paint could contain a RIN and a modified RFID that generated signals? What if Coke/Pepsi cared about the geographic location of the can? It is possible to send a satellite signal to each MRFID (modified RFID). This would have to be done using nanotech for an internal power source, and a transmitter, or an encoding device, would have to be embedded.

In other words, since the power source is usually too weak to respond to a satellite signal, the tag would have to record where it was (latitude and longitude). Every hour it would record its lat/long, DNA-computing style, by folding DNA elements.

Yeah, so what? What if Pepsi/Coke could track the can, and what difference would it make?
Well, from a vendor perspective they could start to discover where the cans went after they left the store. Perhaps a scary thought, perhaps not. In any case, it's bound to happen, and not just with drink manufacturers but with cars, clothing, artwork, and so on. In fact, with OnStar in GM cars it's already happening (only on a larger scale). Imagine what marketing power the cola company would have if they knew that on July 4th many of the cans were not only purchased at Wal-Mart, but driven to a remote location where they were subsequently consumed; in other words, a campground.

If the cola manufacturer could figure out how to open a store closer to that location, they might get a boost in sales; or they could place a dispensing machine there, or drive a truck up to the location to sell or give away their product as a promotion, all for the sake of loyalty.
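To make the July 4th example concrete: if readings like the hourly lat/long log described above could ever be collected, the BI side would be ordinary aggregation. This is a minimal sketch under heavy assumptions: `TagReading`, `grid_cell`, and `hot_spots` are hypothetical names, the 0.1-degree grid is an arbitrary choice, and there is no real MRFID data format to base it on. It counts distinct cans per grid cell for one day, so a campground full of cans outranks a store where a single can lingers.

```python
import math
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class TagReading(NamedTuple):
    """One hourly position fix read back from a hypothetical MRFID-tagged can."""
    tag_id: str
    day: date
    lat: float
    lon: float


def grid_cell(lat: float, lon: float, cell_deg: float = 0.1) -> Tuple[int, int]:
    """Snap a position onto a coarse lat/long grid (0.1 degree of latitude is roughly 11 km)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hot_spots(readings: Iterable[TagReading], day: date,
              top_n: int = 3) -> List[Tuple[Tuple[int, int], int]]:
    """Rank grid cells by the number of distinct cans seen there on the given day."""
    cans_per_cell: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for r in readings:
        if r.day == day:
            cans_per_cell[grid_cell(r.lat, r.lon)].add(r.tag_id)
    ranked = sorted(cans_per_cell.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(cell, len(tags)) for cell, tags in ranked[:top_n]]


july4 = date(2006, 7, 4)
readings = [
    TagReading("can-a", july4, 44.43, -110.59),  # campground
    TagReading("can-a", july4, 44.43, -110.58),  # same can, one hour later
    TagReading("can-b", july4, 44.44, -110.58),
    TagReading("can-c", july4, 44.42, -110.57),
    TagReading("can-d", july4, 44.05, -111.02),  # still near the store
]
print(hot_spots(readings, july4))
```

Counting distinct tag IDs rather than raw readings matters here: a can that logs its position every hour would otherwise make one camper look like a crowd.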

But hey, we're talking about tracking the products that we purchase. This raises serious invasion-of-privacy concerns. I may not want my cola / pants / T-shirt producer tracking my activities and locations. They'd quickly find out that I'm not worth tracking: I move around the country to present at IT meetings, and I work at home most of the time.

On the other hand, think of what law enforcement could do from a business intelligence perspective: a criminal purchases a pair of pants or a mask that's tagged with MRFID, and all of a sudden the FBI has a fix on their location... Hey, maybe it's good for tracking wanted individuals. But we'll leave that alone.

What I'm suggesting is the following:
* This technology will come to pass, like it or not; it will happen within 15 to 20 years (or sooner), because vendors would see a huge increase in revenue as a result.
* Nanotech is already here, and there are limitless uses for it.
* Privacy and ethics are a hot debate in the nanotech industry.
* There are some interesting applications for MRFID in the productized world.

Care to share some ideas? Thoughts?
Daniel Linstedt

  Posted by Dan Linstedt at 4:23 PM | | Comments (5)


April 28, 2006

Meta Integration and Capabilities

If you haven't heard of this company and you're looking for master metadata management, you'll want to check them out. They are a privately held firm based in Silicon Valley, and they OEM their core technology to just about every major vendor out there. That said, they also sell their complete package, with all kinds of cool features, at a reasonable cost. The business reasons to use their software? Read on; let's try to shed some light on this...

If you're involved in metadata implementation, master metadata initiatives or, in fact, data management, data administration, change management of process flows, and other such items, then this tool can fill the bill. Meta Integration (http://www.MetaIntegration.com) is:

a metadata repository that contains versioning and over 50 different bridges for BOTH import and export of technical metadata. But it goes deeper than that. It manages process metadata, data model metadata, logical metadata, ETL / ELT flow metadata, BI query and database view metadata, and so on. The bridges can connect to many different tools and pull in the metadata from all of them, and through a process called stitching they can overlap (link) the metadata together where appropriate.

From a business perspective, their tool provides drill-down and FULL HISTORY DATA LINEAGE from source to target, and can present this lineage across the business to business users in a web-based browser. The repository can also handle registries of business metadata and allow business users to maintain copies of the metadata (those with the right privileges and access may upload new metadata to Meta Integration as a new version).
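Stitching and source-to-target lineage can be pictured as a graph walk. This sketch is not Meta Integration's API or data model; it assumes that stitched metadata boils down to (source element, target element, process) edges, uses hypothetical element and process names, and walks backwards from a report column to the source-system columns that ultimately feed it.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source element, target element, process that moves the data)

# Hypothetical "stitched" metadata: edges harvested from an ETL tool and a BI tool,
# joined wherever the target of one edge is the source of another.
EDGES: List[Edge] = [
    ("crm.customer.cust_nm", "stage.customer.name", "etl_load_crm_customer"),
    ("erp.client.client_name", "stage.customer.name", "etl_load_erp_client"),
    ("stage.customer.name", "dw.dim_customer.customer_name", "etl_build_dim_customer"),
    ("dw.dim_customer.customer_name", "bi.sales_report.customer", "bi_sales_query"),
]


def upstream_lineage(target: str, edges: List[Edge]) -> List[Edge]:
    """Breadth-first walk from a report element back toward its sources."""
    feeds: Dict[str, List[Tuple[str, str]]] = {}
    for src, tgt, proc in edges:
        feeds.setdefault(tgt, []).append((src, proc))
    hops: List[Edge] = []
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for src, proc in feeds.get(node, []):
            hops.append((src, node, proc))
            if src not in seen:  # guard against cycles in the metadata
                seen.add(src)
                queue.append(src)
    return hops


def origins(target: str, edges: List[Edge]) -> List[str]:
    """Source-system elements: lineage nodes that nothing else feeds."""
    fed = {tgt for _, tgt, _ in edges}
    return sorted({src for src, _, _ in upstream_lineage(target, edges) if src not in fed})


for src, tgt, proc in upstream_lineage("bi.sales_report.customer", EDGES):
    print(f"{tgt} <- {src} via {proc}")
print(origins("bi.sales_report.customer", EDGES))
```

Keeping the process name on every edge is what turns a plain dependency graph into lineage a business user can read: each hop says not only where a value came from but which job moved it.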

It is quite a tool. It can save time and money by PROFILING the existing FLOW of data across the systems involved in data integration; that alone is a massive time-saver, even if the tool is not used for metadata management.

Here are a few of the connectivity options it has: ERwin, Informatica, Ascential, Teradata, Oracle, DB2 UDB, SAP, PeopleSoft, Oracle Financials, SAS, and so on.

We have built metadata management best practices with roles and responsibilities, business definitions, and execution strategies around their tool. We can really speed up or enhance your data management / data administration and metadata initiatives.

Thanks,
Dan Linstedt

  Posted by Dan Linstedt at 6:28 AM | | Comments (0)


April 26, 2006

SQLServer2005 and Integration Services

I recently had the chance to work with the beta program of SQLServer2005 and Integration Services. Let me tell you, it was amazing. There are several highlights in the technology that I'd like to share, and this entry will go through them. There are only a couple of things I'd like to see improved. My basic thought is this: it will give customers who are tight on budget, or who aren't moving huge volumes per load cycle, an opportunity to really begin building out a true data warehousing/BI solution.

SQLServer2005:

* Database/table partitioning. At long last, SQLServer has implemented range partitioning. This will be a HUGE boost to performance on larger data sets, as long as the partitions are architected properly across the existing disk sets. If you're not familiar with partitioning, now's your chance to read up on it; do all the reading you can before you upgrade. Partitioning is not something you simply want to "turn on". However, SQL queries that use clustered indexes can finally avoid table scans when partitions are invoked. Why? See the next feature... Other types of partitioning (like hash partitioning and sub-partitions) are said to be in the works for a future release.

* Query parallelism. Query parallelism has been implemented to take advantage of database and table partitioning. Queries can and will be broken into their respective components to take advantage of the range partitions. Queries also benefit from the new performance features of SQLServer2005.

* Performance. SQLServer2005 has extended its performance by optimizing for 8K block sizes, hooking into the Windows operating system, and sharing RAM caches with I/O buffers. They've streamlined the I/O from SQLServer2005 to the disk and, as I understand it, bypassed the page file to bring data into RAM (although the page file is still used for locking and swapping). The performance of SQLServer2005 is much higher than that of SQLServer2000, and it can process millions of rows per minute, even without clustering.

* Clustering. There are new mechanisms for clustering SQLServer2005 and, as I understand it, a deployment can take the shape of multiple clusters on a network of nodes (SMP under an MPP look and feel). This brings new meaning to low-cost, high-power computing. It is still challenging to set up and make happen with SQLServer2005, but it will get easier as Microsoft continues to release their product.
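Range partitioning, and the parallel queries that exploit it, can be illustrated with a toy in-memory table. This is a sketch, not how the SQL Server engine works internally: `RangePartitionedTable` and its methods are hypothetical, and the `range_right` flag only loosely mirrors the RANGE LEFT / RANGE RIGHT choice in SQL Server's `CREATE PARTITION FUNCTION`. It shows the two ideas from the bullets above: a range predicate lets the query skip partitions that cannot hold matching keys (partition elimination), and the surviving partitions are scanned as separate tasks whose partial results are then combined.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def partition_for(value, boundaries: Sequence, range_right: bool = True) -> int:
    """0-based partition number for a key, given sorted boundary values.

    range_right=True: a boundary value starts the partition to its right.
    range_right=False: a boundary value closes the partition to its left.
    """
    return bisect_right(boundaries, value) if range_right else bisect_left(boundaries, value)


class RangePartitionedTable:
    """Toy in-memory table split into len(boundaries) + 1 range partitions."""

    def __init__(self, boundaries: Sequence, range_right: bool = True):
        self.boundaries = sorted(boundaries)
        self.range_right = range_right
        self.partitions: List[List[Tuple[object, object]]] = [
            [] for _ in range(len(self.boundaries) + 1)
        ]

    def insert(self, key, row) -> None:
        self.partitions[partition_for(key, self.boundaries, self.range_right)].append((key, row))

    def partitions_for_range(self, lo, hi) -> List[int]:
        """Partition elimination: only partitions that can hold keys in [lo, hi]."""
        first = partition_for(lo, self.boundaries, self.range_right)
        last = partition_for(hi, self.boundaries, self.range_right)
        return list(range(first, last + 1))

    def parallel_sum(self, lo, hi, value_of: Callable, workers: int = 4) -> Tuple[float, List[int]]:
        """Scan each surviving partition as its own task, then combine the partial sums."""
        targets = self.partitions_for_range(lo, hi)

        def scan(p: int):
            return sum(value_of(row) for key, row in self.partitions[p] if lo <= key <= hi)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(scan, targets)), targets


# Monthly sales keyed by month number, partitioned by quarter (RANGE RIGHT style boundaries).
sales = RangePartitionedTable(boundaries=[4, 7, 10])
for month in range(1, 13):
    sales.insert(month, month * 10)

total, scanned = sales.parallel_sum(5, 8, value_of=lambda amount: amount)
print(total, scanned)  # partitions 0 and 3 are never scanned
```

Python threads share one interpreter lock, so this shows the shape of the plan rather than a real speed-up; a database engine would run the per-partition scans on separate workers. It also shows why partition design matters: a query whose range spans every partition gets no elimination at all.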

Integration Services:
* Fast, fast, fast is all I can say. I saw a Windows platform process 2 million rows in 2 minutes (roughly 16,700 rows per second) from a flat file into SQLServer2005 through Integration Services, passing through a data mining transformation that provided name and address cleansing (and learned as it went); it was based on a neural network that you could tune. This was done on a single-CPU laptop running all the software. It's been tuned to run with flat files and SQLServer connections in parallel. Finally, direct connections gain parallel utilization and catch up with other DBMS vendors.

* Tons of transformations. Lots of transformations come out of the box; as I mentioned a minute ago, one of them is a neural network. It also houses the parallel processing properties for splitting data (in parallel) down multiple transformation paths. There is also the notion of an error control and an error target, whose rules you can control.

* Workflow-style transformations. All transformations are workflow style, meaning they are "runnable designs". You can watch the row count increase as the data passes through the transformations along the way.
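The split-with-error-target idea can be sketched in a few lines. `ConditionalSplit` is a hypothetical class, not the Integration Services object model, and it processes rows sequentially rather than in parallel. Each row goes to the first path whose rule accepts it. A row whose rule raises an error (missing column, unparseable number) is redirected to an error target instead of failing the whole load. Per-path counters stand in for the row counts you watch in the designer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


@dataclass
class ConditionalSplit:
    """Route each row to the first path whose rule accepts it.

    Rows whose rules raise go to an "error" target instead of stopping the
    load; row_counts plays the role of the counters shown in a designer.
    """
    rules: Dict[str, Callable[[Row], bool]]
    default_path: str = "default"
    outputs: Dict[str, List[Row]] = field(default_factory=dict)
    row_counts: Dict[str, int] = field(default_factory=dict)

    def _emit(self, path: str, row: Row) -> None:
        self.outputs.setdefault(path, []).append(row)
        self.row_counts[path] = self.row_counts.get(path, 0) + 1

    def run(self, rows: Iterable[Row]) -> Dict[str, int]:
        for row in rows:
            try:
                path = next((name for name, accepts in self.rules.items() if accepts(row)),
                            self.default_path)
            except (KeyError, ValueError) as exc:
                # Error target: keep the bad row plus the reason, and carry on.
                self._emit("error", {**row, "error": repr(exc)})
                continue
            self._emit(path, row)
        return self.row_counts


split = ConditionalSplit(rules={
    "domestic": lambda r: r["country"] == "US",
    "high_value": lambda r: int(r["amount"]) > 1000,
})
counts = split.run([
    {"country": "US", "amount": "5"},
    {"country": "DE", "amount": "2000"},
    {"country": "DE", "amount": "abc"},   # int() fails -> error target
    {"amount": "1"},                      # no country column -> error target
    {"country": "FR", "amount": "10"},
])
print(counts)
```

Routing errors to their own output, rather than aborting, is what lets a load finish and leaves the rejected rows available for someone to inspect and correct.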

Improvements I'd like to see:
* Metadata. I've heard that they don't have a real "metadata repository"; apparently they are working something out with Meta Integration (you'll hear about them shortly). They need to get into the OPEN metadata game.

* Import/export of transformation logic via XML. There is apparently no import/export of transformation designs, so there is currently no way to share transformation logic across servers.

* Reusable transformation "workflows". They do not have the concept of reusability down yet. I've heard they are working on this.

* Biggest one: connectivity options. Today, if you want Oracle data, DB2 UDB, DB2 AS/400, Teradata, Sybase, MySQL, or Informix, you have three options: use BRIDGE software provided by the other vendor (which is slow); use ODBC and OLE DB drivers from Microsoft (some are fast, some are slow); or write your own native connectivity using their SDK.
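For reference, here is the kind of portable design file the XML import/export wish above has in mind. The element names (`transformation`, `step`, `param`) and both functions belong to a hypothetical schema invented for illustration; they are not the Integration Services package format. The point is only that a design serialized to text can be versioned, shared, and reloaded on another server.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Step = Tuple[str, str, Dict[str, str]]  # (step name, step type, parameters)


def export_design(name: str, steps: List[Step]) -> str:
    """Serialize a transformation design to a portable XML string."""
    root = ET.Element("transformation", name=name)
    for step_name, step_type, params in steps:
        step = ET.SubElement(root, "step", name=step_name, type=step_type)
        for key, value in params.items():
            ET.SubElement(step, "param", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def import_design(xml_text: str) -> Tuple[str, List[Step]]:
    """Rebuild the design (for example on another server) from the XML string."""
    root = ET.fromstring(xml_text)
    steps: List[Step] = []
    for step in root.findall("step"):
        params = {p.get("name"): p.text or "" for p in step.findall("param")}
        steps.append((step.get("name"), step.get("type"), params))
    return root.get("name"), steps


design = [
    ("read_customers", "flat_file_source", {"path": "customers.csv"}),
    ("cleanse_names", "name_address_cleanse", {"match_threshold": "0.8"}),
    ("load_dim", "table_destination", {"table": "dim_customer"}),
]
xml_text = export_design("load_customer_dim", design)
print(xml_text)
```

Parameters are kept as strings so that an export followed by an import returns exactly the design that went in.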

Bottom line: if you're a Microsoft-only shop, SQLServer2005 and Integration Services are a HUGE boost to productivity over DTS and SQLServer2000, and well worth the upgrade cost. If you're a small shop that deals mostly with flat-file transfers from other databases, it will help as well. If you're into volume and native connectivity, I suggest looking elsewhere and letting Microsoft grow into this space.

More to come. Did I miss something? Post a reply!

Cheers,
Dan Linstedt

  Posted by Dan Linstedt at 5:45 AM | | Comments (1)


April 21, 2006

MDM: Local Copies versus Localized Copies

What do I mean by local versus localized copies of master data? In this entry I will try to explain my definition of each. This is a short entry and, as always, comments are welcome.

I did not say that "local copies" should not exist; what I did suggest is that no "localized" copies should exist. If I actually said "no local copies", then shame on me, I made a mistake.

Local copies of the data
Master lists replicated across geographically dispersed regions for fast access times. This is fine.

Localized copies of the data
Master data that has been "changed, altered, or is different" in some way from the source of the master data set. In other words, someone not only changed the master metadata (the definition of how the data set is used in context, which is OK to override), but also changed the MASTER DATA itself to represent "their version" of the master data. This is NOT OK. It re-creates the silos and destroys the purpose of master data, which is to span the horizontal enterprise.
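One practical way to tell a local copy from a localized one is to fingerprint each master record and compare the copy against the master, key by key. This is a minimal sketch that assumes records are JSON-serializable dictionaries and that `customer_id` is the business key (both hypothetical). Only data values are compared. Because overriding master metadata is acceptable, definitions are deliberately left out of the check.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Record = Dict[str, object]


def fingerprint(record: Record) -> str:
    """Stable hash of one master record; sort_keys makes column order irrelevant."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def classify_copy(master: List[Record], copy: List[Record],
                  key: str = "customer_id") -> Tuple[str, List[object]]:
    """'local' if the copy is an exact replica of the master data, else 'localized' plus drifting keys."""
    master_fp = {r[key]: fingerprint(r) for r in master}
    copy_fp = {r[key]: fingerprint(r) for r in copy}
    drift = sorted(k for k in master_fp.keys() | copy_fp.keys()
                   if master_fp.get(k) != copy_fp.get(k))
    return ("localized", drift) if drift else ("local", [])


master = [{"customer_id": 1, "name": "Acme Corp"}, {"customer_id": 2, "name": "Globex"}]
replica = [dict(r) for r in reversed(master)]  # same data, different row order
regional = [{"customer_id": 1, "name": "Acme Corp"}, {"customer_id": 2, "name": "Globex EMEA"}]
print(classify_copy(master, replica))   # ('local', [])
print(classify_copy(master, regional))  # ('localized', [2])
```

Keys that exist only in the copy, or only in the master, also count as drift, since an extra or missing row is just as much "their version" of the data as an edited one.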

I hope this clears things up.
Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com

  Posted by Dan Linstedt at 8:55 AM | | Comments (0)


April 15, 2006

Welcome to BI Vendors

This is a new category for me. Typically I don't focus on "vendor-specific" items, but in this category I will begin doing so. I may blog on Microsoft SQLServer2005 and its intelligence services, or Teradata, Netezza, DATAllegro, Ipedo, MetaMatrix, Oracle, DB2 UDB, IBM Ascential, IBM WebSphere, or any number of other vendors that I think are doing something interesting in the market space as it pertains to business intelligence. All entries here will be vendor specific. Feel free to send me questions about specific vendors. Thank you! Hope to see you out here.

  Posted by Dan Linstedt at 7:35 AM | | Comments (0)


Demystifying SoR (System of Record) and MDM

When Claudia Imhoff, Shawn Rogers, and I got together for lunch the other day, we discussed this notion of SoR; it's a very interesting take. SoR has long been held to have a single definition: data residing in the source systems. Today there are multiple definitions (three, to be exact) of SoR, particularly since MDM evokes new notions of what SoR means to the business, as does a compliant and auditable enterprise warehouse. In this entry I'll walk through the multiple definitions of SoR. I'll be discussing many of these things in my MDM night course at TDWI in August (2006, San Diego).

Let's start with the three definitions. The first is the widely accepted definition, based on the origination point of data (source systems). The second is not so well known, but those of you with a normalized and AUDITABLE enterprise warehouse will be happy to see it. The third is based on the ever-present "single view of today's enterprise". Many people and vendors call this the "Single Version of the Truth", but truth is subjective to EACH individual user, so there's no possible way (in the purist sense) that such a single truth actually exists!!

SoR Definition 1: The data that exists in the source system; in other words, where the data is entered or originates for the first time. It contains a record of entry, creation, or origination for the information it houses. Hopefully this system is auditable and compliant. The data might not be "clean, quality checked, or integrated" unless it sits in some sort of master list (master data set). Most of this data is shipped across the enterprise to other systems, and to an enterprise data warehouse for integration.

SoR Definition 2: This data resides in a NORMALIZED enterprise data warehouse and is auditable and compliant. This data is RAW data that is INTEGRATED by business keys (see Data Vault data modeling). This data is NOT cleansed, altered, modified, or quality checked, but it is auditable, meaning that an auditor can trace the data back to the source system from which it came. The integration point is in fact the master key (business key) that horizontally integrates data across the enterprise. Duplicates reside within this data set, and so does dirty data. The data in the normalized warehouse is captured by type of data and rate of change, and this is the only place in the entire enterprise where this integrated version of uncleansed raw data exists. The business often uses this data to FIND broken business processes. If you don't have source system raw data in an integrated fashion, it will be harder to spot the trends, and the broken business processes will continue to go unnoticed. I've seen this save companies millions of dollars. This is a RAW system of record.

SoR Definition 3: Master data, or conformed dimensions: data that has been cleansed, quality checked, and de-duplicated, and is seen and used by the business as the "single version of today's truth". It is covered by master metadata and is understood by the organization to have meaning. In the master data set there is only a SINGLE copy of each item (customer / part / work order / supplier, etc.). This is an SoR by business standards because it provides value to the business by eliminating duplicates and showing how the business looks "TODAY". It's a snapshot of current, consistent, quality-cleansed information that feeds the rest of the source systems.
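The difference between definitions 2 and 3 is easiest to see side by side. In this sketch, all names are hypothetical, and so are the cleansing and survivorship rules. `integrate_raw` keeps every row, grouped by business key along with its record source and load date, loosely in the spirit of a Data Vault hub and its attributes. `build_master` derives one cleansed value per key. Letting the latest load survive is an assumption made for brevity; real survivorship rules are a business decision.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class RawRecord:
    business_key: str    # e.g. a customer number shared across source systems
    record_source: str   # which system sent the row, kept so an auditor can trace it back
    load_date: datetime  # when the warehouse received it
    name: str            # raw, uncleansed attribute exactly as sent


def integrate_raw(records: Iterable[RawRecord]) -> Dict[str, List[RawRecord]]:
    """Definition 2 style: integrate by business key, keep every row (duplicates and dirt included)."""
    by_key: Dict[str, List[RawRecord]] = {}
    for rec in sorted(records, key=lambda r: r.load_date):
        by_key.setdefault(rec.business_key, []).append(rec)
    return by_key


def tidy(name: str) -> str:
    """Hypothetical cleansing rule: collapse whitespace and normalize case."""
    return " ".join(name.split()).title()


def build_master(by_key: Dict[str, List[RawRecord]],
                 cleanse: Callable[[str], str] = tidy) -> Dict[str, str]:
    """Definition 3 style: one cleansed value per business key (latest load wins in this sketch)."""
    return {key: cleanse(rows[-1].name) for key, rows in by_key.items()}


raw = [
    RawRecord("C1", "crm", datetime(2006, 4, 1), "acme  corp"),
    RawRecord("C1", "erp", datetime(2006, 4, 2), "ACME CORP"),
    RawRecord("C2", "crm", datetime(2006, 4, 1), "globex"),
]
vault = integrate_raw(raw)
print({k: [r.record_source for r in rows] for k, rows in vault.items()})  # {'C1': ['crm', 'erp'], 'C2': ['crm']}
print(build_master(vault))                                                # {'C1': 'Acme Corp', 'C2': 'Globex'}
```

Because `build_master` only reads from the raw integration, the cleansed view can be rebuilt at any time with different rules, while the raw layer still shows exactly what each source sent.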

So, as you can see, there is value in each type of system of record. So what, exactly, is a system of record itself?

I would suggest that a system of record has the following characteristics:
1. It's a data origination point (the only place in the enterprise where this "vision" of data exists).
2. It begins to feed other systems, providing automatic feedback to source systems and becoming part of the operational LOOP in business decision making.
3. In some cases it's auditable and traceable, in other cases it's quality cleansed; but in all cases it provides business value in different formats and assists the business in DOING business on a daily basis.

If you have questions or comments I'd love to hear them, please post them below.

Thank you
Daniel Linstedt
CTO, Myers-Holum, Inc

  Posted by Dan Linstedt at 7:12 AM | | Comments (2)


April 14, 2006

MDM: Deciphering Vendor Hype

I've been blogging about MDM for a while now, and in my last entry I defined what master data and master metadata should be. By the way, both of those definitions, along with the entry, have been certified by Bill Inmon and Clive Finkelstein as the standard definitions for MDM. In a sense of adventure, I decided to take a look at 10 different vendors: what they claim MDM to be, how they define it (if they define it), and how they claim to implement it. What I discovered is not that shocking. MDM SOFTWARE: BUYER BEWARE!!

WARNING: THIS ENTRY IS NOT FOR THE FAINT OF HEART. IT IS MY BIASED OPINION ON WHAT MDM REALLY IS VERSUS WHAT VENDORS CLAIM IT TO BE. I'm not saying that vendors are all wrong or bad, quite the contrary. I'm saying that while master data vendors have good software and provide ROI, not all solutions are built to meet your needs, and the marketing hype would have you believe otherwise.

Are you a vendor? Please feel free to comment and counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment and share your experiences, anonymously if desired. I want to see where the tools have worked for you, and where they have not.

**** DISCLAIMER ******
I have not received vendor demonstrations from any of the vendors. All I have done is read through their web sites, looked at what they "claim", and looked for supporting information on their sites to see whether they've included people, business process, governance, compliance, integration, master data, and metadata management. Please take all this advice with a grain of salt. The purpose of this entry is to raise awareness in the customer base, and to have you ask questions of the vendors so that your business expectations can be set appropriately.

First off, MDM is a business process. What the vendors are really selling is a piece of master data management: the mechanical part of integrating, cleansing, and quality checking the master data. Most do not offer business rules integration, applied data mining, registries, web services (to meet SOA), EAI, EII, ETL/ELT, and RDBMS. In a comment on my last entry, I listed a slew of vendor URLs that I've looked at. For reference, I'll list them again here:

LIST OF VENDORS SELLING MDM SOFTWARE SOLUTIONS:
www.ibm.com
www.sas.com
www.DataMirror.com
www.i2.com/solutionareas/sixone/scos/mdm.cfm
www.kalido.com/products/mdm
www.hyperion.com/products/bi_platform/core_data_integration/mdm_index.cfm
www.Stratature.com
www.Gemstone.com
www.SilverCreekSystems.com
www.Purisma.com
www.Nimaya.com
www.Netics.com
www.datafoundations.com
www.ObjectRiver.net
https://www.sdn.sap.com/irj/sdn/developerareas/mdm
http://ibm.ascential.com/solutions/master_data_management.html
www.metamatrix.com/pages/solutions/master_data_management.htm

They are listed in no particular order. Now let's look at a few of their definitions to see just what MDM means to them:

SAS:

Creating a master data environment enables organizations to provide a single source of truth around which enterprise systems can be synchronized...Reusable business rules clean, standardize and enhance data as it moves into the master reference file so all information is accurate.

IBM:

IBM WebSphere Information Integration is the master data integration offering that delivers authoritative master data for any industry or business function…support the full master data lifecycle…IBM defines MDM as the set of disciplines, technologies, and solutions used to create and maintain consistent, complete, contextual and accurate business data for all stakeholders

Kalido:

KALIDO 8M is an enterprise-wide master data management software solution for harmonizing, storing and managing master data over time…The master data management software produces a master data warehouse from which "golden-copy" master data can be distributed to enterprise applications and business people throughout the organization

Now that I've listed a few vendors, let's talk about the pros and cons of each (drawing on additional information from their web sites).
SAS:
Pros:
* They have an embedded data mining capability, and are best of breed for data mining (separate module)
* Embedded ETL engine (if you purchase this module)
* GUI integration (separate module)
* Reporting Engine (separate module)
* They handle large scale data sets

Cons: (according to industry analyst groups)
* They are not best of breed when it comes to web-services
* They are not best of breed when it comes to answering SOA
* They are not best of breed when it comes to ETL
* They do not have EAI or EII embedded (or so it seems)
* They are a code driven solution
* They are not best of breed for Enterprise Master Metadata
* They do not appear to have a pure business rules processing/management engine like ILOG or JRULES
* Their web site does not provide enough surrounding information to describe their implementation methodology regarding the "data management" and "governance" processes needed to fully implement MDM.
* They want you to believe that "ETL" is Master Data Management; they are touting old tools under a new skin without including Data Mining, Information Quality, Business Process Management, Compliance, Governance, EAI, and EII as part of their solution.

IBM:
Pros:
* BIG COMPANY, lots of information (freely available) on how they handle Governance, Data Management, Metadata and Enterprise Metadata Management.
* Implementation methodology documented (in overview form) for Master Data Management
* Include EAI (WebSphere / MQSeries) as part of their solution
* Include Ascential QualityStage for data quality/scrubbing and consolidation
* Include Ascential MetaStage for Metadata
* Include Web Services (WebSphere) for production and handling of master data
* Include Registries (WebSphere) for production and handling of master data

Cons:
* In one diagram and implementation methodology they claim that WebSphere and EAI are the entire solution for MDM; in another diagram and description, they describe additional needs for QualityStage and MetaStage.
* They seem to be in conflict with themselves, no clear story as to the "Complete" vision of MDM. One document leads you to five or ten others that discuss governance, EII, EAI, Reference Tables, Information Quality, and so on. Different authors have different ideas as to what MDM really is.
* They would have you believe that Data mining, and business rules are not a part of Master Data Management (until you dig deeper into their implementation methodologies).
* Their solution seems to be: buy the entire SUITE of products from IBM, then buy all of their consulting to get the full and complete MDM solution. This is good if you have TONS of money to throw at the problem, and several years (3 to 5) to solve it.
* No single tool seems to be the shining star for helping tackle the Master Data issues.

Kalido 8M:
Pros:
* Easy integration
* Easy Logical Data Model Changes
* Contains an ETL tool
* Contains a data modeling tool
* Vendor says: it contains Data Quality, Profiling, ETL, EAI, EII, Web Services, and Business workflows.

Cons:
* It is not best of breed (according to industry analyst groups) in: EII, ETL, EAI, Web Services, and Business Process Management (ILOG, JRULES).
* The vendor makes it seem like they are the SINGLE tool for the entire suite of MDM, yet they don't document how the rest of Data Management takes place.
* They are missing methodology definition, compliance, governance procedures, implementation best practices, definition of scalability into the 50TB+ range.

No single tool can be the be-all and end-all for the MDM architecture. Again, master data is one thing; master data management encompasses master data and data management. MDM as a whole is an enterprise initiative involving people, process, compliance, governance, data, systems, and a variety of best-of-breed tool sets.
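The "no single tool" point can be checked mechanically once you have each vendor's claimed capabilities, for example from a scorecard. This sketch uses made-up capability sets for three hypothetical tools, not real vendors, and models only the tooling slice of MDM: people, process, and governance cannot be bought this way. `greedy_cover` uses the standard greedy set-cover heuristic, which finds a small covering combination but is not guaranteed to find the smallest one.

```python
from typing import Dict, List, Set, Tuple

# Tool-side MDM capabilities drawn from the discussion above; people, process,
# and governance are deliberately not modeled because no tool supplies them.
REQUIRED: Set[str] = {
    "data quality", "metadata", "business rules", "data mining", "registries",
    "web services", "EAI", "EII", "ETL/ELT", "RDBMS",
}


def gaps(tool_caps: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """What each tool, taken alone, leaves uncovered."""
    return {tool: sorted(REQUIRED - caps) for tool, caps in tool_caps.items()}


def greedy_cover(tool_caps: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Greedy set cover: keep adding the tool that closes the most remaining gaps."""
    missing = set(REQUIRED)
    chosen: List[str] = []
    while missing:
        tool, caps = max(tool_caps.items(), key=lambda kv: len(kv[1] & missing))
        if not caps & missing:
            break  # no tool can close what is left
        chosen.append(tool)
        missing -= caps
    return chosen, sorted(missing)


# Made-up capability claims for three hypothetical tools (not real vendors).
claims = {
    "tool_a": {"ETL/ELT", "data quality", "RDBMS"},
    "tool_b": {"EAI", "web services", "registries", "metadata"},
    "tool_c": {"business rules", "data mining", "EII"},
}
print(gaps(claims)["tool_b"])
print(greedy_cover(claims))
```

Whatever `greedy_cover` reports as still missing is a gap that has to be filled by another product or by process, which is exactly the question to put to each vendor.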

Buyer beware: do your homework and interview your vendors. Sponsor your MDM initiative at the executive level, apply best practices for project management and SEI/CMM, follow standards for including your data warehouse as part of your master data management initiative, and proceed. Soon I'll have an MDM vendor scorecard available on our web site at: http://www.MyersHolum.com

Are you a vendor? Please feel free to comment and counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment and share your experiences, anonymously if desired. I want to see where the tools have worked for you, and where they have not.

Thank you kindly for your time,
Daniel Linstedt, CTO, Myers-Holum, Inc
Daniel.Linstedt@MyersHolum.com

  Posted by Dan Linstedt at 6:12 AM | | Comments (2)


April 7, 2006

Clarifying MDM - Setting Standards

I've had a lot of great feedback on the MDM blogs I've been adding lately, and one kind individual sent me an email asking for a few things: a definition, practical criteria, a practical taxonomy, and a picture simple enough for organizations to use. In this entry I will do my best to offer my *opinion* on the subject, and I am open to comments, corrections, and thoughts from all of you. Again, this is only my opinion. Please note that my opinion is biased towards compliance, accountability of data, traceability, and accountability of business users, and arises from my experiences with SEI/CMM, PMP, Six Sigma, TQM, BPR, Lean initiatives, and cycle time reduction. I can't wait to hear back from you.

+ First, a definition of master data that "holds true" across all scenarios.

Ok, I'll take a crack at it. In my last blog entry I showed multiple definitions of master data management from various friends. Here are my thoughts on master data itself:

Master Data - Information housed in a single, consistent, quality-cleansed reference table, located at a single location. All elements except the surrogate key in the master data set are defined by Master Metadata at a global (enterprise, or sometimes industry-wide) level. Master Data is not localized (with multiple copies), does not contain duplicates, and may reside anywhere in the corporation, but it is exposed through SOA to the enterprise and possibly outside the enterprise. Master Data and Master Metadata are both delivered through a service in the SOA; Master Metadata is exposed to the service layer so that the meaning of the Master Data can be interrogated. In other words, Master Data are reference tables that exist in a single copy somewhere in the enterprise, but with enterprise-level visibility.

Master Metadata - Information describing the elements/attributes that make up the Master Data structure, and how those attributes are used. These metadata are agreed upon by the business users to be universal in definition. However, if localized utilization descriptions are required, then the executive staff must buy off on the changes and overrides, and be capable of defining the impact of the change on the bottom line (financial figures). Master Metadata can be edited and maintained by business users in accordance with governance policies created at the enterprise level. Master Metadata is also exposed through SOA to the enterprise, and possibly outside the enterprise. Master Metadata is delivered every time Master Data is delivered; the two are conjoined in order to increase the level of understanding and provide context at the time the Master Data set is interrogated.
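Two properties in these definitions translate directly into code: only one copy of each master record exists, and Master Metadata always travels with Master Data. The sketch below is a hypothetical in-process service (`MasterDataService` and `MasterPayload` are invented names). It leaves out the SOA transport, the governance workflow, and persistence. It rejects duplicate business keys, refuses elements that have no agreed definition, and returns every record together with the definitions of all its elements except the surrogate key.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MasterPayload:
    """What a consumer receives: the record and the definitions of its elements, together."""
    record: Dict[str, object]
    metadata: Dict[str, str]


class MasterDataService:
    """Single authoritative copy of one master data set, keyed by business key."""

    def __init__(self, metadata: Dict[str, str]):
        self._metadata = dict(metadata)  # element name -> enterprise-agreed definition
        self._records: Dict[str, Dict[str, object]] = {}
        self._next_surrogate = 1

    def register(self, business_key: str, attributes: Dict[str, object]) -> int:
        if business_key in self._records:
            raise ValueError(f"duplicate business key {business_key!r}; master data keeps one copy")
        # Every element except the surrogate key must be covered by master metadata.
        undefined = (set(attributes) | {"business_key"}) - set(self._metadata)
        if undefined:
            raise ValueError(f"elements without master metadata: {sorted(undefined)}")
        surrogate = self._next_surrogate
        self._next_surrogate += 1
        self._records[business_key] = {"surrogate_key": surrogate,
                                       "business_key": business_key, **attributes}
        return surrogate

    def get(self, business_key: str) -> MasterPayload:
        record = dict(self._records[business_key])
        # Metadata is delivered every time the data is delivered.
        meta = {name: self._metadata[name] for name in record if name != "surrogate_key"}
        return MasterPayload(record, meta)


customers = MasterDataService({
    "business_key": "Customer number assigned when the account is opened",
    "name": "Legal name of the customer organization",
})
customers.register("C-100", {"name": "Acme Corp"})
payload = customers.get("C-100")
print(payload.record)
print(payload.metadata)
```

Refusing undefined elements at registration time is one way to make "defined by Master Metadata" enforceable rather than aspirational; in practice the definitions themselves would be maintained by business users under the governance policies described above.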

* By the way, Bob Seiner and I had a wonderful discussion yesterday about this topic, and I owe quite a bit of this information to Bob's thought processes. He and I agree that Master Data and Master Metadata require governance policies and procedures, along with data management approaches and data administration roles and responsibilities. It will be nearly impossible to have a successful enterprise Master Data / Master Metadata management effort without governance embedded in the processes. You can see my first article on governance here, on B-Eye, Bill Inmon's newsletter.

+ Second, practical criteria for determining whether a piece of information "qualifies" as master data.

This one is a bit harder to qualify. My opinion here is based on my personal experience 12 years ago in a government organization where "master lists" were a huge part of lean initiatives, cycle time reduction, and business process understanding. In order to understand the determinants of master data, we should remember that there frequently exist multiple copies of this data across our source systems when we start our efforts. We should also remember that the bottom line is: it's up to the business users, NOT I.T., to fully approve master data sets and master "systems".

Determinants for Master Data Qualification:
* Data that is entered in the source system via an application, and is assigned a FULL business key within the organization.
* These business keys are UNIQUE within a given time-slice. If they are not unique (for all time), then there may be a broken business process (business is losing money), or it isn't the right starting point for master data.
* Data doesn't exist anywhere else in the organization, yes - as unfortunate as it sounds, even Excel Spreadsheets on users desktops that organize, consolidate, and aggregate / define can be considered master data sets, or perhaps master metadata. It is our job to get these loaded in raw-detail into a data warehouse, as they stand - so that the BI tools and integration tools can roll the data for them, and the business can have better visibility as to how their business works.
* Data related to non-composite business keys. Composite business keys often reflect relationships to other data, inter-dependencies. Data related to non-composite business keys are a dead give away (like VIN number for vehicles). Of course VIN number is a SMART business key, and can be broken down at the manufacturer into additional codes, to the outside world (you and I), a VIN number is a single key that indicates master data (descriptives about the vehicle). Some composite keys can be seen or used as Master Data sets, but more often than not - most composite keys when broken into their respective components reveal more than one "master data set". Remember, that the business keys are in fact the KEYS to the business processes, and reflect how the data travels through the business, what impacts it, how it changes. Without business keys, the data in the systems are useless (they can't be searched, located, edited, or managed). I discuss more about business keys in the Data Vault data modeling concepts (Hub Table Definitions).
* Keep in mind that while master data STARTS with the business key identification, it includes the descriptive data that goes with it. Look for hierarchical structures where multiple business keys are "listed" or housed; these should be broken apart to help create the taxonomy of separate master data sets.
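The uniqueness rule in the second bullet can be checked mechanically before a data set is nominated as master data. The following is a minimal sketch, assuming each extracted record carries a business key and a time-slice label; the field names `business_key` and `time_slice` and the sample part numbers are hypothetical.

```python
from collections import defaultdict

def find_duplicate_keys(records, key_field="business_key", slice_field="time_slice"):
    """Return {(time_slice, key): count} for every business key that
    appears more than once within the same time-slice."""
    counts = defaultdict(int)
    for rec in records:
        counts[(rec[slice_field], rec[key_field])] += 1
    return {k: n for k, n in counts.items() if n > 1}

# Hypothetical sample: part numbers loaded in monthly time-slices.
parts = [
    {"business_key": "P-100", "time_slice": "2006-03", "desc": "Bolt"},
    {"business_key": "P-101", "time_slice": "2006-03", "desc": "Nut"},
    {"business_key": "P-100", "time_slice": "2006-03", "desc": "Bolt, zinc"},
    {"business_key": "P-100", "time_slice": "2006-04", "desc": "Bolt"},
]

print(find_duplicate_keys(parts))  # {('2006-03', 'P-100'): 2}
```

Any hit is a question for the business users rather than an automatic rejection: it may point to a broken process, or to a data set that is not the right starting point for master data.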

If Master Data is left in hierarchical format with a number of business keys embedded, chances are that over time the master list will lose value (quickly, possibly on an exponential scale), because editing, defining the meaning of, and maintaining definitive "lists" of individual master data sets will become impossible. For example, a bill of material containing work-order numbers, contract numbers, and part numbers MUST be broken into separate components, each containing its own master data set: Work Orders Master, Contracts Master, Parts Master, etc.
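Breaking the bill of material apart can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed load process: the flat BOM layout, the column names, and the "keep the first description seen" rule are all hypothetical. Each business key gets its own master set, and the combinations that used to be embedded in the hierarchy are kept as a separate set of relationships.

```python
def split_bom(bom_rows):
    """Break a flat bill of material into one master set per business key,
    plus the set of relationships that tied those keys together."""
    work_orders, contracts, parts = {}, {}, {}
    links = set()
    for row in bom_rows:
        # Keep the first description seen for each key; a real load would
        # need an agreed rule for conflicting descriptions.
        work_orders.setdefault(row["work_order"], {"station": row["station"]})
        contracts.setdefault(row["contract"], {"customer": row["customer"]})
        parts.setdefault(row["part"], {"description": row["part_desc"]})
        links.add((row["contract"], row["work_order"], row["part"]))
    return work_orders, contracts, parts, links

# Hypothetical flat BOM extract.
bom = [
    {"contract": "C-1", "customer": "Acme", "work_order": "WO-7",
     "station": "S1", "part": "P-100", "part_desc": "Bolt"},
    {"contract": "C-1", "customer": "Acme", "work_order": "WO-7",
     "station": "S1", "part": "P-101", "part_desc": "Nut"},
    {"contract": "C-1", "customer": "Acme", "work_order": "WO-8",
     "station": "S2", "part": "P-100", "part_desc": "Bolt"},
]

wo, ct, pt, links = split_bom(bom)
print(sorted(wo), sorted(ct), sorted(pt), len(links))
# ['WO-7', 'WO-8'] ['C-1'] ['P-100', 'P-101'] 3
```

Keeping the relationships separate from the master sets is the design point: each master list can then be maintained on its own, without editing a hierarchy every time one key's description changes.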

Then you can re-examine where EACH of these master data sets comes from, where it needs to reside, and what parts of the business it affects, along with the individual definitions of Master Metadata. You can also examine the nature of the business process and really begin to understand where these "business keys" originate, how they change in the business processes, who uses which keys, and who changes the keys as they course through the business. Again, Master Data is all about BPR and Lean Initiatives (overhead reduction, quality improvement, cycle time reduction).

There is a lot more I can say about this, and I will. I have a book on the Data Vault data modeling architecture that will be published here on B-Eye Network soon. The first two chapters are dedicated to the business and all the notions I'm discussing here, along with the impact of Data Architecture and the value of clear definitions.

+ Third, a practical taxonomy (or matrix) of classes of data, for which "master data" is just one class...what are all the other classes of data...and are their inter-relations truly hierarchical, or are they matrixed (or "multi-matrixed")?

OK, we've come to the third of three questions. In the last section I began to discuss the normalization of hierarchical data structures according to true business keys. The business users are a great source of knowledge on the "keys" they use to track, file, and relate to the data, as are the applications they use on a daily basis. Business keys are NOT necessarily surrogate keys or sequence numbers (although in some cases, these sequences and surrogates have been presented to the business for utilization).

In order to understand the taxonomy or matrix of master data / master metadata, we need to first understand two things: WBS (work breakdown structure), and OBS (organizational breakdown structure). Each of these structures must then be cross-hatched to create a lattice of where the work overlays the organization. Below the lattice (in each cell), are the business processes - which must be outlined, highlighted, and understood. The business processes dictate the flow of data, the origination points, and essentially the "master systems", or what should be the master systems anyhow... The taxonomy of the data follows the taxonomy of the business, and the business processes.

For example: I have an organizational structure: CEO->(VP)->(Director)->Management->Line Manager->Worker
I have a work breakdown structure which defines the roles and responsibilities of each, but shows the matrix of work, and how the work flows. There may be several WBS overlays depending on the business task at hand.
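The WBS-by-OBS lattice described above can be pictured as a grid keyed by (work element, organizational unit), with the observed business processes sitting in each cell. The sketch below assumes hypothetical WBS element names and process labels; only the department names come from the example that follows.

```python
from itertools import product

obs = ["Sales", "Contracts", "Planning"]   # organizational breakdown (units)
wbs = ["Bid", "Estimate", "Ratify"]        # work breakdown (hypothetical elements)

# Business processes observed where a work element meets an org unit
# (hypothetical labels for illustration only).
processes = {
    ("Bid", "Sales"): ["sell contract"],
    ("Ratify", "Contracts"): ["ratify contract", "seal deal"],
    ("Estimate", "Planning"): ["estimate hours, parts, cost"],
}

def build_lattice(wbs, obs, processes):
    """Cross every WBS element with every OBS unit; each cell holds the
    business processes found there (an empty list if none were observed)."""
    return {(w, o): processes.get((w, o), []) for w, o in product(wbs, obs)}

lattice = build_lattice(wbs, obs, processes)
occupied = {cell: procs for cell, procs in lattice.items() if procs}
print(len(lattice), len(occupied))  # 9 3
```

The occupied cells are where to go looking for data origination points and candidate master systems; several WBS overlays would simply mean building several lattices against the same OBS.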

Let's say that I have the following parts of the organization: Sales, Contracts, Finance, Procurement, HR, Planning, and Manufacturing. Let's say that this organization sells very large widgets that cost billions of dollars. Let's say we discover that the business process flow is as follows:
1. Sales sells a contract.
2. Contracts ratifies the contract and passes it to Planning for estimation.
3. Planning makes an estimate of what it will take (how long, how many people, parts, cost, etc.) and passes it back to Contracts.
4. Contracts contacts the customer, ratifies it again, and seals the deal. Contracts passes the contract to Finance.
5. Finance begins the billing cycle for each phase, and expects payment in exchange for delivery. Finance sets up charge numbers for the employees, and passes the contract to HR. Finance continues to watch the cost, planning cycle, budgeting, etc.
6. HR authorizes specific departments and employees to work on the contract, tracks the hours against the estimate, and ensures (governs) which employees are actually working on the contract. HR passes the contract to Planning.
7. Planning re-plans the contract based on the skill set of the employees involved and the phases that Finance and Contracts have set up. Planning assigns part numbers, suppliers, and contractors to the planned bill of material. Planning produces a full planned bill of material for Manufacturing to build, and hands it off to Manufacturing.
8. Manufacturing then builds the widget, tracking actuals against planned hours, assigns work-orders to parts of the contract, work stations, and executes agreements with the suppliers (working with finance to ensure the right pricing and delivery grade is provided by the suppliers and contractors). Manufacturing completes the deliverable, and hands the phase I/ II / III deliverable back to Contracts and Finance to complete this portion of the transaction.
9. Contracts then needs to compare as-built to as-planned to as-contracted in order to understand if what was built and delivered was actually contracted.
10. Finance needs to compare as-sold to as-contracted to as-financed to as-planned to as-built (manufactured) in order to understand if the estimates match, if sales is selling beyond their means, if contracts has the right dollar figures, in other words: is the business losing or gaining money?
11. HR needs to compare as-assigned, to as-planned, to as-built (actuals versus estimated) for hours, employees, and variations to know how to better bid next time, and to review employee proficiency.
12. Manufacturing needs to compare as-planned to as-built to as-delivered, because parts break, are replaced, plans don't often match actuals, and the whole point is to narrow the gap between planned and built so that next time it costs less money, is easier to build, and what is delivered is what was built (less parts replacement).
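Every comparison in steps 9 through 12 has the same shape: two views of the same work, keyed by the same business key. A minimal sketch, assuming both views have already been keyed by a common part number (the sample hours are hypothetical):

```python
def variance(planned, actual):
    """Compare two views (e.g. as-planned vs as-built) keyed by the same
    business key. Returns (differences, keys only planned, keys only built)."""
    shared = planned.keys() & actual.keys()
    diffs = {k: actual[k] - planned[k] for k in shared if actual[k] != planned[k]}
    only_planned = sorted(planned.keys() - actual.keys())
    only_built = sorted(actual.keys() - planned.keys())
    return diffs, only_planned, only_built

as_planned = {"P-100": 40, "P-101": 12, "P-102": 5}   # hours per part (hypothetical)
as_built = {"P-100": 46, "P-101": 12, "P-103": 3}

print(variance(as_planned, as_built))
# ({'P-100': 6}, ['P-102'], ['P-103'])
```

Note that the comparison only works if both silos use the same key for the same thing; when each department mints its own version of the key, the "only planned" and "only built" buckets fill up with items that are really the same.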

This is a huge example of a business process re-engineering effort I was involved in. We created taxonomies of master data starting with "as-contracted, as-financed, as-manufactured, as-built, as-planned, etc."; each of these had different systems identified where the data flowed, was constructed, and was processed. One of the things the business discovered was that EACH business silo was using its "own" version of the contract number, which made it nearly impossible to trace the entire contract through the full cycle of the business. Our data warehouse integrated all the contract numbers, showed where they changed and what systems changed them, and demonstrated the "consistent patterns of change" across the business keys.
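Tracing one contract through every silo's renumbering can be sketched as walking a cross-reference upstream. This assumes the warehouse has captured, for each (system, local key), the upstream (system, key) it was derived from; the system names follow the example above, but the key values and the cross-reference layout are hypothetical.

```python
# Hypothetical cross-reference captured by the warehouse:
# (system, local key) -> (upstream system, upstream key) it was derived from.
xref = {
    ("Finance", "F-2006-17"): ("Contracts", "K-881"),
    ("Planning", "PL-881-A"): ("Contracts", "K-881"),
    ("Manufacturing", "M-5521"): ("Planning", "PL-881-A"),
    ("Contracts", "K-881"): ("Sales", "S-42"),
}

def trace_to_origin(system, key, xref):
    """Follow a business key upstream until no earlier mapping exists.
    Returns the path of (system, key) pairs, newest first."""
    path = [(system, key)]
    seen = {(system, key)}
    while path[-1] in xref:
        nxt = xref[path[-1]]
        if nxt in seen:  # guard against circular mappings
            raise ValueError(f"cycle detected at {nxt}")
        path.append(nxt)
        seen.add(nxt)
    return path

for sys_name, k in trace_to_origin("Manufacturing", "M-5521", xref):
    print(sys_name, k)
# Manufacturing M-5521
# Planning PL-881-A
# Contracts K-881
# Sales S-42
```

Each hop in the path is a point where a key changed, and which system changed it; repeating the trace over many contracts is one way to surface the "consistent patterns of change".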

Our taxonomy flowed from there - understanding the business processes (even across sectors, and across business units) is absolutely VITAL to creating taxonomies of Master Data. Discover the Business Keys, and the business key FLOW, and the taxonomy of Master Data will become clear quite quickly.

+ Trying to keep the picture simple enough, or at least organized enough, so everyone (top-to-bottom in the enterprise) can understand...and we can develop an effective information management strategy and set of approaches.

We ended up with a single PowerPoint presentation consisting of 10 slides for the executive staff and board of directors at this corporation (consisting at that time of 7 sectors, 23 companies and 150,000 employees) which showed just this one portion of business from top to bottom. Using the WBS, OBS, and BPS (business process flow), we showed the taxonomy of the business, and therefore the taxonomy of the Master Data origination points. We then organized the Master Data to "test" the hypothesis: that in fact, what were supposed to be the "master systems" actually were responsible for producing the master data.

In some cases we found that the planning system was actually producing new contract numbers when the business had told us "that is the contracts system's job; planning should never be producing new contract numbers." We found the data problem and were able to demonstrate when in the business process this was happening, and thus the business was able to fix the problem by issuing change requests to both systems, saving time and money in the long run.
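The hypothesis test described above ("do the designated master systems actually produce the master data?") reduces to comparing key-creation events against the business's declared master for each key type. A rough sketch, assuming key-creation events can be pulled from source-system logs; the event layout and the declared-master mapping are hypothetical.

```python
# The business's declared master system for each key type (hypothetical).
designated_master = {"contract": "Contracts", "part": "Planning",
                     "work_order": "Manufacturing"}

# Key-creation events observed in source systems (hypothetical sample).
creations = [
    {"key_type": "contract", "key": "K-881", "system": "Contracts"},
    {"key_type": "contract", "key": "K-902", "system": "Planning"},
    {"key_type": "part", "key": "P-100", "system": "Planning"},
]

def origination_violations(creations, designated_master):
    """Return creation events where a key was minted outside its declared
    master system. Key types with no declared master are flagged as well."""
    return [c for c in creations
            if designated_master.get(c["key_type"]) != c["system"]]

for v in origination_violations(creations, designated_master):
    print(v["key"], "created by", v["system"])
# K-902 created by Planning
```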

Whew, sorry about the long explanation, but I felt it necessary. If you have thoughts, or comments - I'd love to hear from you. This process works at ALL levels of data integration for master data, even government.

More questions? Send them in...
Thanks,
Dan L

  Posted by Dan Linstedt at 5:11 AM | Comments (4)


April 6, 2006

A tribute to my mentors

I wouldn't be here, blogging for Bill today, writing articles, and teaching at conferences if it weren't for my various mentors along the way, all of whom have helped me tremendously. I want to stop for a moment and give credit where credit is due. There are many, many individuals who have helped me get to where I am today; please forgive me if I forget to mention your name, and drop me a reply or a comment on what we did together, so that it is included as a part of this tribute.

I needed to stop and say: THANK-YOU to all those who are helping me on my journey.

First, thank-you to Claudia Imhoff. She has been a true inspiration throughout these years. She was the first to review the Data Vault Data Modeling architecture over 5 years ago, and stood up and said: "This is exactly what Bill and I have been looking for, for over 10 years." Of course she's helped me with my articles on www.TDAN.com, B-Eye-Network, she got me involved with TDWI, and encouraged me to write. She also is responsible for introducing me to Bill Inmon.

Bill Inmon. Back in 1999 I was unknown, just an upstart with a lot of green; of course, I still have a lot of green, and a lot to learn (always learning from all my mentors). Bill not only reviewed the Data Vault architecture, but helped me learn the true meaning of Data Warehousing. I've had the pleasure of visiting Bill at his house several times, and on numerous occasions he's graciously given me time to ask uninformed questions. Bill then let me write for his original Bill Inmon Newsletter, on Nanotechnology of all things; occasionally he also entertained my articles on Data Warehousing, Data Integration and Business Intelligence. Bill and I do not always agree on what is and what isn't in the world of DW, BI and data integration, but that doesn't stop a wonderful friendship.

Bob Terdeman. I saw Bob speak at a TDWI conference in 1999, and was fascinated by his attention to detail. Back then he was the CTO for EMC. Bob is a wonderful individual and always had time to help me with the ideas of what makes a very large data store (be it a warehouse or otherwise) successful. He reviewed my ideas, and helped me discover new things about the world of smart storage and the business of presenting.

Tammy Henderson and I worked together as employees of a very large government contracting corporation. Tammy and I butted heads all the time, but in a good way. She founded the center of excellence at this particular corporation, and gave me the resolve to really work on the compliance, accountability, project planning, tracking, and oversight portions of the business. Tammy showed me around the company and introduced me to the business users, giving me access and insight into parts of the business I would otherwise never have experienced. She challenged my ideas.

Steve Koons. Another individual at this particular organization, Steve showed me the ropes and handled all of the classified interactions with our data warehousing effort. He took our data and our integration processes to a whole new level, clearing the way for our data model and our best practices methodology to be utilized across the organization. Steve helped me formulate the mechanisms for turning our IT group into a profit center for our business users. He assisted us in getting more contracts than we could possibly fulfill.

Hans Hultgren. Hans asked for my help in formulating a Masters Degree program for DW/BI at the Daniels College of Business, University of Denver. He graciously asked me to sit on the academic advisory board, and to help him recruit many of my mentors and friends in the industry. He has given me the opportunity to give back to the community that has provided me with so much success. I thoroughly enjoy guest lecturing at DU, and our world-class board members show it: Bill Inmon, Lowell Fryman, Hans Hultgren, Claudia Imhoff, (me), Tammy Henderson, Maureen Clarry, Kent Graziano, and a few others.

Kent Graziano, speaking of which, has assisted me in numerous ways. He's currently assisting in the editing and co-authoring of my Data Vault Data Modeling book, due out on B-Eye Network in PDF format shortly. He's been a great guide and advocate of the Data Vault architecture since the beginning. He's also helped me with the Universal Data Models that he, Bill and Len Silverston have made famous.

Grady Booch. Grady and I have had several conversations about the world of object oriented programming, UML class design, architecture, and where the Data Vault data modeling fits in. Grady has assisted me with many different facets of understanding in the business requirements world.

Richard Hackathorn. Richard and I go way back. Claudia introduced me to Richard many years ago, and I had the opportunity to discuss cycle time reduction, business process "squeeze" in time, and a range of topics relating to business requirements, the market space and developing interests.

Stephen Brobst and Kim Stanick. I saw Stephen teach at TDWI on high performance data warehousing techniques. Soon after, I was asking him questions about parallelism, partitioning, and of course Teradata. He also spent some time reviewing the Data Vault data modeling architecture. Needless to say, he and Kim Stanick got me involved in the "Friends of Teradata Network", which I am most grateful to be a part of.

Clive Finkelstein, one of the originators with John Zachman of the Zachman Framework, helped me see clearly the nature of the framework, the differences between architecture and implementation, and what some of the components in the framework were. Clive has also reviewed the Data Vault Data Modeling architecture, and has given his approval for it as a good architecture for implementation of one of the cells within the Zachman framework.

Wayne Eckerson. Wayne first gave me the opportunity to teach at TDWI, and has repeatedly helped me with slide edits, comments, documentation, teaching styles and improvements. Wayne has been a true inspiration for me in the speaker’s circuit world. Wayne and Richard Winter gave me the opportunity to teach VLDW at TDWI in the beginning; needless to say, Richard Winter has also provided me with valuable feedback and good friendship on what makes a good VLDW and how they work. Lately I've been helped tremendously by Dave Wells; he's also provided wonderful feedback throughout the years, and continues to do so. All the folks at TDWI deserve a huge thank-you.

Bob Seiner - a wonderful friend and mentor. Bob has helped me with my publications along the way: edits, changes, reviews, and of course the articles on http://www.TDAN.com. He's always given me good feedback and great content; lately he's been assisting me with the Knowledge Management, Metadata Management, and Master Data Management components.

Calla Knopman - the one friend who stuck by me during the development phases of the Data Vault way back in 1995 through 2000. She pushed me to write about it, share it, finish the thought processes. She's always been a good sounding board for me, and continues to be one of the best friends I have.

Shawn Rogers, Ron Powell, Ketherine Drewek, and all the good people at B-Eye Network. They continue to provide me with world-class opportunity to publish, write, and espouse my opinions. They are a huge part of my publishing success today.

There are many, many more people who've helped me along the way, including: Dan Chatten, Jeff Hild, Barb Darrow, Bill Wesley Brown, James O'Bannon, Peter Aiken, David Marco, Dan Sullivan, Harm Van Der Lek, Chris Busch, Juan Jose Van Der Linden, Scott Crownover, Randy Law, Kevin Goodfellow, Ben Isenhour, Matt Duncan, Kim Dossey, Frank Sparacino, Henry Morris, Jill Dyche, Jeff Jonas, and all my friends at different vendors.

Last but not least, I must thank my current employer Myers Holum, Inc for supporting me - allowing me to continue my writing career, while earning a living.

Thank-you for allowing me to provide you with this tribute to my life journey.

  Posted by Dan Linstedt at 3:25 PM | Comments (0)


April 3, 2006

Understanding MDM: Rough Seas and Muddy Water

I recently blogged on this subject, and then I asked my friends in the industry what they thought MDM meant to them. Not surprisingly, I received a number of different answers. I'll post their answers in this entry, and then discuss what I believe MDM to be. David Loshin recently wrote a new article on MDM and the basics of MDM, some of which I agree with, other parts I disagree with - I'll post here what my thoughts are on this subject as well. In the next blog entry we'll dive a little deeper and work at defining specifics for Master Data.

Bill Inmon on MDM:

I have always thought of MDM as reference tables. You can extend the definition a little and include other kinds of metadata I suppose. But I think people are stretching MDM way out of proportion.

Stephen Brobst:

I would extend MDM beyond just dimensional data.

I like to think of data in three categories:

* Data that keeps track of events (e.g., sales detail, deposit/withdrawal txns, CDRs, etc.)
* Data that keeps track of the “state” of an entity (e.g., customer, account, product, etc.)
* Data that keeps track of “relationships” (e.g., the relationship between products and categories, persons and households, etc.)

MDM generally encompasses the second category and the third category as it applies to the second category. In other words, MDM keeps track of “state” entities and relationships between “state” entities. But not events or relationships to events.
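One way to read Stephen's scoping rule is as a filter over a catalog of entities: state entities are in scope, relationships are in scope only when everything they join is a state entity, and events are out. A small sketch under that reading; the entity names and the `Entity` structure are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str
    kind: str            # "event", "state", or "relationship"
    relates: tuple = ()  # for relationships: names of the entities joined

def in_mdm_scope(entity, catalog):
    """State entities are in scope; relationships only if every entity
    they join is itself a state entity; events never are."""
    if entity.kind == "state":
        return True
    if entity.kind == "relationship":
        return all(catalog[n].kind == "state" for n in entity.relates)
    return False

entities = [
    Entity("customer", "state"),
    Entity("product", "state"),
    Entity("category", "state"),
    Entity("sale", "event"),
    Entity("product_category", "relationship", ("product", "category")),
    Entity("customer_sale", "relationship", ("customer", "sale")),
]
catalog = {e.name: e for e in entities}
print([e.name for e in entities if in_mdm_scope(e, catalog)])
# ['customer', 'product', 'category', 'product_category']
```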

Claudia Imhoff:
Claudia and I had a discussion about MDM, I will have a quote for the next entry I make, and she and my good friend Colin White are currently writing a research study on MDM, due out soon. I expect this to be extremely helpful. However, in paraphrasing what Claudia and I discussed, here is the general gist:

Master Data Management should be thought of as a process, a business process, not as a tool set or something that a vendor sells. Master Data exists as reference tables throughout the enterprise, be it on source systems, data marts, data warehouses, staging areas, or operational data stores. It doesn't matter where the tables reside, just that they do reside within the organization as a centralized master list of information containing both the KEY and the DESCRIPTIVE components of what that key represents today.
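The "KEY plus what that key represents today" idea can be sketched as collapsing dated descriptive rows into one current row per business key. This assumes each row carries an ISO-format effective date (so string comparison orders dates correctly); the customer keys and attributes are hypothetical.

```python
def current_master_list(history):
    """Collapse dated descriptive rows into one row per business key,
    keeping the most recent description (what the key represents today)."""
    latest = {}
    for row in history:
        key = row["key"]
        # ISO dates (YYYY-MM-DD) compare correctly as strings.
        if key not in latest or row["effective"] > latest[key]["effective"]:
            latest[key] = row
    return {k: {a: v for a, v in r.items() if a not in ("key", "effective")}
            for k, r in latest.items()}

history = [
    {"key": "CUST-1", "effective": "2005-01-10", "name": "Acme Corp", "city": "Denver"},
    {"key": "CUST-1", "effective": "2006-02-01", "name": "Acme Corp", "city": "Boulder"},
    {"key": "CUST-2", "effective": "2004-07-15", "name": "Widget Co", "city": "Austin"},
]
print(current_master_list(history))
# {'CUST-1': {'name': 'Acme Corp', 'city': 'Boulder'}, 'CUST-2': {'name': 'Widget Co', 'city': 'Austin'}}
```

The history itself can still live in the warehouse; the master list is the current, agreed view that the rest of the organization references.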

David Loshin:

Master data objects are those core business objects that are used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies. Master data objects are those "things" that we care about - the things that are logged in our transaction systems, measured and reported on in our reporting systems, and analyzed in our analytical systems. Common examples of master data include:

* Customers
* Suppliers
* Parts
* Products
* Locations
* Contact mechanisms

Ok, I can see where all the confusion comes from. The father of data warehousing and my good friend Bill Inmon has said quite plainly that MDM is nothing more than REFERENCE TABLES, which can include references to anything you want. Centralized information, centralized metadata, and centralized definitions - all quality checked, integrated and providing singularity for data attributes by their corresponding key.

MDM is NOT:
* A data warehouse
* A data Mart
* An Operational Data Store
* A single source system
* A tool, or technology per se

MDM IS:
* A governance process, a business driven process, a consolidation of localized information used for reference by the business to better understand their business processes.

Yes, MDM includes Governance, Data Administration, and parts of Data Management. It also includes the data itself as the "master list" of reference material. We've had these concepts for years, especially back on the mainframes. I remember 12 to 15 years ago working with "master parts lists, master contracts lists, master work-orders lists, master employees lists", and of course "master systems" that were supposed to dictate the creation and management of the data set, dispersing it to "child systems". Of course, this is all part of what led to integration mass confusion (nightmarish overnight synchronization processes that were supposed to keep the master lists in sync across the different systems).

I just read a Knightsbridge paper that stated that "localized copies of master data are OK." I beg to differ. Today's networks are fast enough, and global definitions (metadata synchronization) are handled well enough these days, to create and locate a SINGLE MDM data source for reference information. The point is to avoid future "fights" over who's right and who's wrong with their "copies" of the data sets. Integration, NOT DIS-INTEGRATION, is what we are trying to achieve with MDM.
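The "fights" over whose copy is right show up as soon as localized copies are compared against a single master source. A minimal sketch, assuming the master and each copy are dictionaries keyed by the same business key; the plant names, part numbers, and attributes are hypothetical.

```python
def copy_conflicts(master, copies):
    """Compare each localized copy against the single master source.
    Returns (copy_name, key, attribute, master_value, copy_value) tuples."""
    conflicts = []
    for copy_name, rows in copies.items():
        for key, attrs in rows.items():
            ref = master.get(key)
            if ref is None:
                # The copy holds a key the master has never seen.
                conflicts.append((copy_name, key, None, None, "unknown key"))
                continue
            for attr, value in attrs.items():
                if ref.get(attr) != value:
                    conflicts.append((copy_name, key, attr, ref.get(attr), value))
    return conflicts

master = {"P-100": {"desc": "Bolt", "uom": "EA"}}
copies = {
    "plant_a": {"P-100": {"desc": "Bolt", "uom": "EA"}},
    "plant_b": {"P-100": {"desc": "Bolt", "uom": "BOX"}, "P-999": {"desc": "Washer"}},
}
for c in copy_conflicts(master, copies):
    print(c)
# ('plant_b', 'P-100', 'uom', 'EA', 'BOX')
# ('plant_b', 'P-999', None, None, 'unknown key')
```

Every tuple returned is a synchronization problem someone has to resolve; with a single referenced source instead of local copies, the list is empty by construction.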

If you need to warehouse your Master Lists, then do so, but fight the good fight to consolidate the metadata definitions, and consolidate the data sets across the same semantically defined layers (see Data Vault Data Modeling). At the end of your MDM initiative you should be able to identify true MASTER data stores that are not localized, that can be utilized horizontally and vertically across your organization without fear. Quality data and single copies of information are what we want. Utilizing the Data Vault Data Modeling architectures, you can easily encompass the MDM definition of all four individuals above.

In my next blog entry I'll go through a few things, like creating a single definition for MDM.

How are you implementing MDM? Which of these definitions matches your business?

Come to TDWI in San Diego this August and share your MDM experience in my MDM night course.

Thanks,
Dan L

  Posted by Dan Linstedt at 5:15 AM | Comments (3)