Blog: Dan E. Linstedt

Dan Linstedt

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

About this blog

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can’t wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

Recently in Service Oriented Architecture Category

Bill Inmon and I sat down the other day to discuss a system that we are building. We didn't have a good "name" for it, but what it amounts to is Operational Data Warehousing. If you can believe it, what we've done is take the operational specifics of the systems that capture data and place them on top of the Data Warehouse, as a single integrated historical and operational data store. We are currently using the Data Vault model for this componentry. Some folks have called this "Active Data Warehousing" in the past, but we feel that this is one step beyond, in that it actually IS the operational store at the same time as being the Data Warehouse. Convergence has arrived...

I've blogged about convergence in the past. It's no secret that the world is converging, and I.T. is no different. It is also no secret that EDW technology is converging with operational technology. If we look behind us (hindsight is always 20/20) we can see the divergence path of data warehousing and operational systems, and now the re-convergence of these systems. Active Data Warehousing (ADW) coupled with SOA, and real-time alerts coming back from the ADW, have begun to turn the tables.

We have closed the gap on this one. Using the principles of Data Vault modeling (http://www.DanLinstedt.com) we've constructed an Operational Data Warehouse. Right now, Bill and I do not have a better term for this; Bill also thought that this is a new approach.

What does Operational Data Warehouse do?
One way to describe it is as an Operational Data Store with history.

Another way to describe it is: as a data warehouse with operational (raw) data.

Why do it this way?
Well for one, it provides traceability in all the data. Bringing in the RAW operational data over a web-service (as generated by the upstream machines) provides us with accountability, auditability and pure traceability. By utilizing the notion of the HUB entity within the Data Vault structures, we achieve horizontal integration across the data sets. This operational data warehouse is front-ended by web-services, and has direct integration into the business processes. It is not fed by any sort of "batch" system; it is, however, pre-loaded with master data.
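To make the HUB idea a little more concrete, here is a minimal sketch of how a hub can integrate horizontally: one row per business key, no matter how many upstream systems deliver that key. The class names, the key format, and the source names (`billing_service`, `field_device_feed`) are assumptions made up for illustration; this is not the schema of the system described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HubRow:
    business_key: str     # key shared across source systems (hypothetical format)
    load_time: datetime   # when the warehouse first saw this key
    record_source: str    # which upstream system delivered it first


class Hub:
    """Holds one row per distinct business key, regardless of source system."""

    def __init__(self) -> None:
        self._rows = {}

    def register(self, business_key: str, record_source: str) -> HubRow:
        # Insert only when the key is new; existing rows are never modified,
        # so the first-seen source stays traceable.
        if business_key not in self._rows:
            self._rows[business_key] = HubRow(
                business_key, datetime.now(timezone.utc), record_source
            )
        return self._rows[business_key]

    def __len__(self) -> int:
        return len(self._rows)


customer_hub = Hub()
customer_hub.register("CUST-1001", "billing_service")
customer_hub.register("CUST-1001", "field_device_feed")  # same key, second source
customer_hub.register("CUST-2002", "field_device_feed")
```

Two sources delivered `CUST-1001`, but the hub holds it once, and that single row is the integration point for everything else known about the customer.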

The structures of the Data Vault have been set up within the databases in such a way as to allow tremendous scalability and flexibility. We have physically partitioned the machines for security and scalability purposes. We can join 800M rows to 300M rows to 100M rows, and bring back 10k rows in under 10 seconds when we know what we're looking for. This setup is housed on SQL Server 2005 on Windows 2003 R2: 32-bit, two dual-core CPUs at 2.8 GHz, 2GB RAM.

So what's this got to do with Operational Data Warehousing?
Plenty. Operational data warehouses (a very loose term today) must meet the following requirements:
* Must be accountable
* Must be auditable
* Must be a system-of-record
* Must interact with other operational systems
* Must house operational data
* Must house historical data
* Must NOT separate operational data from historical data in the data store.
* Must be the SOURCE for a major business function
* Must be real-time (can have batch feeds, but must be real-time in data streams)
* Must be part of the business process flows.

So what are the technical requirements?
* Must be scalable
* Must be flexible
* Must NOT break history when the business changes/data models change
* Must NOT break existing data feeds when the model changes
* Must provide FAST access, fast inserts, etc.

And of course it MUST follow the DW2.0 requirements:
* Must have historical data
* Must not be "updated" directly (would break auditability)
* Must maintain cross-functional relationships
* Must be GRANULAR (to the absolute lowest level of grain available)
* Must provide strategic and tactical value
* Must include indexes/pointers/links to unstructured information
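The "must have historical data" and "must not be updated directly" requirements can be sketched as an insert-only store: every change arrives as a new row, and nothing already stored is overwritten. This is a minimal illustration under my own assumptions; the `Satellite` class, its method names, and the choice to skip rows whose values have not changed are illustrative, not a description of the production system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SatelliteRow:
    business_key: str
    load_time: datetime
    record_source: str
    attributes: tuple  # sorted (name, value) pairs, immutable once stored


class Satellite:
    """Append-only descriptive history for a business key."""

    def __init__(self) -> None:
        self._rows = []

    def history(self, business_key):
        rows = [r for r in self._rows if r.business_key == business_key]
        return sorted(rows, key=lambda r: r.load_time)

    def current(self, business_key):
        rows = self.history(business_key)
        return rows[-1] if rows else None

    def record(self, business_key, attributes, load_time, record_source):
        """Append a new row; return False if nothing changed (assumed rule)."""
        new_values = tuple(sorted(attributes.items()))
        latest = self.current(business_key)
        if latest is not None and latest.attributes == new_values:
            return False
        self._rows.append(
            SatelliteRow(business_key, load_time, record_source, new_values)
        )
        return True


sat = Satellite()
sat.record("CUST-1001", {"status": "active"}, datetime(2008, 1, 1), "web_service")
sat.record("CUST-1001", {"status": "active"}, datetime(2008, 1, 2), "web_service")
sat.record("CUST-1001", {"status": "suspended"}, datetime(2008, 2, 1), "web_service")
```

The earlier "active" row is still there after the status changes, so an auditor can see both what the data says today and what it said before.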

So what? How do I get there?
We've used Data Vault data modeling to get there. It meets all these needs and has been blessed by Bill Inmon as the "optimal choice for DW2.0" data modeling. Because of the structures, along with the foundational approaches to loading the Data Vault, and what the data in the Data Vault represent - we've been able to construct the system described above. In fact, we have two of these up and running: one in our facilities in Denver, and one in Washington DC.

So you mean to say there "is no operational system"?
Partially. There are many "machines" that collect the information operationally and pass it back to our Operational Data Warehouse (Data Vault), but they do not house the information after they've released it to us. The ODW Data Vault actually stores all the operational information from around the country, and soon - around the world.

Next time we'll dive in a little deeper as to what it means to construct one of these, and how they work.

You might already have one of these, if you do - I'd love to hear about it. As always, thoughts, comments, corrections, are welcome.

Cheers,
Dan Linstedt


Posted February 25, 2008 8:26 AM

Governance is an industry buzzword these days. With all the SOA initiatives going on, one would think that Governance would be at the top of the list as well. If you're not governing your enterprise consolidation, you probably are not taking full advantage of the benefits and cost savings that could be coming your way. Sure, governance is an uphill battle in the beginning; sure, everyone fights standards, even agreed standards; and yes - absolutely - no one can seem to decide on how to define the common data sets (common data model). But if you're involved in, or working with, SOA it is imperative to engage governance at the enterprise level. However, it's not just governance that makes it work: a formal methodology should be utilized to assist with the governance as the organization organically grows its efforts. Such methodologies include ITIL, SEI/CMMI and a few others.

I've defined different kinds of governance in my articles here on B-Eye Network in the past. Just to reiterate, I'll define governance again:

Governance
n 1: the persons (or committees or departments etc.) who make up a body for the purpose of administering something; "he claims that the present administration is corrupt"; "the governance of an association is responsible to its members"; "he quickly became recognized as a member of the establishment" [syn: administration, governing body, establishment, brass, organization, organisation] 2: the act of governing; exercising authority; "regulations for the governing of state prisons"; "he had considerable experience of government" [syn: government, governing, government activity] Webster’s Definition of Governance, http://dictionary.reference.com/search?q=governance

IT Governance
IT governance ensures IT-related decisions match company-wide objectives by establishing mechanisms for linking objectives to measurable goals. IT governance is the decision rights and accountability framework for encouraging desirable behavior in the use of IT. IT Governance: http://www.faculty.de.gcsu.edu/~dyoung/MMIS-6393/Reading-IT-Governance-defined.htm

Data Governance
Data governance is the combination of people, process, and technology required to support the ongoing management of the enterprise-wide data that will be centralized. (my definition)

SOA Governance
SOA Governance is the ability to ensure that all of the independent efforts (whether in the design, development, deployment, or operations of a Service) come together to meet the enterprise SOA requirements. … Including SOA Policies, Auditing and Conformance, Management (track, review, improve), and integration. SOA Governance: http://www.weblayers.com/gcn/whitepapers/Introduction_to_SOA_Governance.pdf

So what is SEI/CMMI in the first place?
Capability Maturity Model Integration (CMMI) is a process improvement approach that provides organizations with the essential elements of effective processes. http://www.sei.cmu.edu/cmmi/general/general.html

And how does it affect my governance?
It can BE your governance guidelines. SEI/CMMI has (built-in) governance procedures. All governance requires that the efforts be estimated before they start, then monitored and measured. There are all kinds of estimations that must take place, ranging from RISK analysis, to implementation ability, to project lifecycle, and implementation time. If you aren't estimating and then measuring (something akin to Earned Value Management), then you are not exercising effective governance, and certainly - you cannot improve on what you cannot measure.
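For readers who have not worked with Earned Value Management, the estimate-then-measure loop comes down to a few standard ratios. The sketch below uses the usual EVM definitions (cost variance, schedule variance, CPI, SPI); the function name and the dollar figures are made up for illustration.

```python
def earned_value_metrics(planned_value, earned_value, actual_cost):
    """Standard EVM indicators for one reporting period.

    planned_value: budgeted cost of the work scheduled to date (the estimate)
    earned_value:  budgeted cost of the work actually completed to date
    actual_cost:   what the completed work actually cost
    """
    return {
        "cost_variance": earned_value - actual_cost,        # negative = over budget
        "schedule_variance": earned_value - planned_value,  # negative = behind schedule
        "cpi": earned_value / actual_cost,                  # below 1.0 = over budget
        "spi": earned_value / planned_value,                # below 1.0 = behind schedule
    }


# Hypothetical project: 100k of work planned, 80k worth completed, 90k spent.
metrics = earned_value_metrics(
    planned_value=100_000, earned_value=80_000, actual_cost=90_000
)
```

With these made-up numbers the project is both over budget and behind schedule. The point for governance is that neither fact is visible unless the estimate was recorded before the work started.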

What are some of the groups involved in governance?
[Figure: governanceImage.jpg]
Diagram adapted from: http://www-128.ibm.com/developerworks/webservices/library/ws-improvesoa/

MDM initiatives are like any other: they require governance to be executed properly. In fact, any initiative that “serves” at the enterprise level should fall under an over-laying governance initiative. Master Data and Master Metadata are highly visible, therefore: high risk = high return = high visibility = high pressure to do it right. This means that Governance cannot and should not be ignored when addressing the MDM initiative. Again, MASTER DATA and MASTER METADATA serve the entire enterprise.

Central governance:
Best for the enterprise. The governance council has representation from each service domain (more on this later) and from subject matter experts who can speak to the people who implement key technological components of the solution. The central governance council reviews any additions or deletions to the list of services, along with changes to existing services, before authorizing the implementation of such changes.

Distributed governance:
Best for distributed teams. Each business unit has control over how it provides the services within its own organization. This requires a functional service domain approach. A central committee can provide guidelines and standards to different teams.
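The central model described above can be pictured as a gate in front of the service list: additions, changes, and removals wait as proposals until the council approves them. This is only a sketch under my own assumptions; `ServiceRegistry`, its methods, and the service names are hypothetical, and a real council decision is a human process rather than a callback.

```python
class ServiceRegistry:
    """Service list where nothing changes until a review approves it."""

    VALID_ACTIONS = {"add", "change", "remove"}

    def __init__(self):
        self.services = {}  # name -> version, approved and live
        self.pending = []   # proposed changes awaiting council review

    def propose(self, action, name, version=None):
        if action not in self.VALID_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.pending.append((action, name, version))

    def review(self, approve):
        """Apply each pending change only if approve(change) returns True.

        Returns the list of rejected changes.
        """
        rejected = []
        for change in self.pending:
            action, name, version = change
            if not approve(change):
                rejected.append(change)
            elif action == "remove":
                self.services.pop(name, None)
            else:  # "add" or "change"
                self.services[name] = version
        self.pending = []
        return rejected


registry = ServiceRegistry()
registry.propose("add", "customer-lookup", "1.0")
registry.propose("add", "shadow-report", "0.1")

# Stand-in for the council's decision: reject the unsanctioned service.
rejected = registry.review(lambda change: change[1] != "shadow-report")
```

In the distributed model, each business unit would hold its own registry and review step, with the central committee supplying only the guidelines that `approve` is expected to follow.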

Guiding principles:


  • Reuse, granularity, modularity, composability, and componentization
  • Compliance to standards (both common and industry-specific)
  • Services identification and categorization, provisioning and delivery, and monitoring and tracking

Specific architectural principles:


  • Encapsulation
  • Separation of business logic from the underlying technology
  • Single implementation and enterprise-view of components
  • Leveraging existing assets wherever an opportunity exists
  • Life cycle management
  • Efficient use of system resources
  • Service maturity and performance

Why should I utilize SEI/CMMI as my methodology guide for Governance?


  • About 95% of companies have a formal IT strategy which in most cases is "reasonably" aligned to the business strategy (~80%) Source: The Compass World IT Strategy Census 2001
  • However, research concludes that there is no evidence that IT spending levels positively correlate to companies’ productivity (IT productivity paradox) Source: Information Productivity, P. Strassmann 1999, Information Economic Press
  • Paradoxically, 80% of strategic decisions related to IT are only based on “gut feeling” Source: Gartner Symposium News Preview 2002, Florence, Italy
  • Businesses typically waste 20% of corporate IT budgets on investments which fail to achieve their objectives Source: Gartner Symposium News Preview 2002, Florence, Italy
  • The companies that manage their IT most successfully generate returns as much as 40% higher than their competitors Source: Accenture project experience

CMMI helps reduce and control IT spending. In other words, CMMI is GOVERNANCE for IT in action!

Come see my TDWI presentation in Las Vegas, February 2007 on Governance, Compliance, and CMMI principles. I'm also teaching "Defining and Understanding MDM"

As always, I'd love to hear your experiences, positive and negative about governance principles. Please comment!

Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com


Posted December 19, 2006 6:53 AM

The question: what does the new business initiative really need to focus on?

Today's business initiatives seem to be headed in many different directions, from SOA to MDM to registries, and business processes. The issue is that when different initiatives take on different directions (rather than a consolidated view and set of drivers) they all end up at different destinations. The cost is heart-ache, silo'd solutions, and a maintenance nightmare. The bottom line is that there is convergence afoot. I've written about this over the past 5 years in my convergence articles on TDAN, B-Eye Network, and Teradata Magazine. In this entry we'll explore what business should do, and how they should approach these very different initiatives (all with a common goal).

MDM - Master Data Management
MMDM - Master Metadata Management
SOA - Service Oriented Architectures
Registries - well, registries of web-services, taxonomies and hierarchies of access points, names, and security access restrictions, I guess one could say more metadata...
BPEL - Business Process Execution Language
BPM - Business Process Management

And of course the tools of the trade:
EAI - Enterprise Application Integration
EII - Enterprise Information Integration
ETL/ELT - Extract, Transform, Load / Extract, Load, Transform
RDBMS - Relational Database Management System

Ok now that we got that out of the way... Businesses have been divesting their interests for years (at least when it comes to I.T. projects). It's time to get a little convergence back into the mix. Businesses who start separate initiatives for each of the categories above will quickly find that they end up with one or more of the following:

* Silo'd answer sets
* Silo'd information assets
* Argumentative Fiefdoms within the kingdom (arguing over who's right and who's wrong and who has the best answers).
* IT Constrained Business - disparate projects, tons of sunk cost, high maintenance overhead
* Inconsistent standards
* Missing best practices
* Holes in the I.T. security wall (all over the place)
* Lack of IT business initiative
* Poorly motivated IT employees

And so on... Executive staff should realize that the good things in life don't come cheap, or easy. After all, they've worked extremely hard to get where they are. IT is no different, and should be treated as a single operational business unit. IT's initiatives should be aligned, but in a way that allows IT to work together rather than against each other.

So you've heard this all before have you?
I'm sure you have - it's been printed in the magazines for years; lately it was called IT alignment. Let's get back to the issues, shall we?

What does this have to do with lining up: MDM, MMDM, SOA, and Registries?
Everything. Businesses today should establish an overriding IT umbrella, and that umbrella is, in fact, an SOA initiative. One way to think about it: IT is a service-based organization, and SOA is a service-based architecture from which automated services make business information, processes and descriptions available (on-demand) to the business. Let's just say SOA does for IT what JIT does for manufacturing and supply chains.

Underneath the SOA are Master Data, Master Metadata, Web Services, Registries, Auditability, EDW, OLTP, data marts, and Information Integration. All of these are the components necessary to make SOA a success. But remember, SOA is a journey not a destination - just like alignment of IT is a continuous process (it never ends).

So what do all of these have in common?
* Shared business insight
* Shared executive level sponsorship
* Shared information and data sets
* Shared asset base
* Shared security model
* Shared business processes
* Shared Metadata
* Common information dissemination model

From a project standpoint:
* Shared milestones
* Shared Risks
* Shared training
* Shared knowledge

There is also a certain dependency (order) in which these items must be executed. If one is left out of the process chain, then the business stands to suffer at the end of the day. Convergence is upon us, and real-time (active), metadata (descriptive), data sets (asset base), registries (organization of all data and metadata underneath), security and services (access layers) are all a part of the enterprise initiative to bring IT in to focus.

More to come on this topic - if you have questions, I'd like to try to answer them. Feel free to ask publicly or privately.

Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc


Posted May 15, 2006 5:26 AM

When Claudia Imhoff and Shawn Rogers and I got together for lunch the other day, we discussed this notion of SoR - it's a very interesting take. SoR has long been held as a single definition, and has been defined as residing in the source systems. Today, there are multiple definitions (3 to be exact) of SoR. Particularly since MDM evokes new notions of what SoR means to the business, as does a compliant and auditable enterprise warehouse. In this entry I'll walk through the multiple definitions of SoR. In my MDM night course in August at TDWI (2006, San Diego) I'll be discussing many of these things.

Let's start with the three types of definitions: The first definition is the widely accepted definition, based on the origination point of data (source systems). The second definition is not so well known, but for those of you with a normalized and AUDITABLE enterprise warehouse, you'll be happy to see the second definition. The third definition is based on the ever-present "single view of today's enterprise". Many people and vendors call this: "Single Version of the Truth", but truth is subjective to EACH individual user, so there's no possible way (in the purist sense) that truth actually exists!!

SoR Definition 1: The data that exists in the source system, in other words, where the data is entered, or originates for the first time. It contains a record of entry or creation or origination for the information it houses. Hopefully this system is auditable, and compliant. The data might not be "clean, quality checked, or integrated" unless it sits in some sort of master list (master data set). Most of this data is shipped across the enterprise to other systems, and to an enterprise data warehouse for integration.

SoR Definition 2: This data resides in a NORMALIZED enterprise data warehouse, and is auditable and compliant. This data is RAW data that is INTEGRATED by business keys (see Data Vault Data Modeling). This data is NOT cleansed, altered, modified, or quality checked - but it is auditable, meaning that an auditor can trace the data back to the source system from whence it came. The integration point is in fact the master-key (business key) that horizontally integrates data across the enterprise. Duplicates reside within this data set, and so does dirty data. The data in the normalized warehouse is captured by type of data and rate of change, and this is the only place in the entire enterprise where this integrated version of uncleansed raw data exists. This data is often used by business to FIND broken business processes. If you don't have source system raw data in an integrated fashion, it will be harder to spot the trends, and the broken business processes will continue to go unnoticed. I've seen this save companies millions of dollars. This is a RAW system of record: it is the only place in the enterprise where this integrated raw version exists.

SoR Definition 3: Master Data, or Conformed Dimensions - data that has been cleansed and quality checked, with duplicates removed - and is seen and used by the business as the "single version of today's truth". It is covered by master metadata, and is understood by the organization to have meaning. In the master data set there is only a SINGLE copy of each (customer / part / work order / supplier etc.) item. This is an SoR by business standards because it represents value to the business in eliminating duplicates and understanding how the business looks "TODAY". It's a snapshot of the current, consistent, quality-cleansed information that feeds the rest of the source systems.
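The difference between Definition 2 and Definition 3 is easier to see side by side. In this small sketch the raw system of record keeps every source row untouched and merely groups rows by business key, while the master set keeps exactly one row per key. The sample records and the survivorship rule (prefer one named source, else take the first row seen) are assumptions for illustration; real cleansing rules are far more involved.

```python
raw_records = [
    {"business_key": "CUST-1001", "name": "ACME Corp", "record_source": "crm"},
    {"business_key": "CUST-1001", "name": "Acme Corporation", "record_source": "billing"},
    {"business_key": "CUST-2002", "name": "Globex", "record_source": "crm"},
]


def integrate_raw(records):
    """Definition 2: integrate by business key, keep every row and its source."""
    by_key = {}
    for record in records:
        by_key.setdefault(record["business_key"], []).append(record)
    return by_key


def build_master(by_key, preferred_source):
    """Definition 3: one row per key, chosen by an assumed survivorship rule."""
    master = {}
    for key, rows in by_key.items():
        chosen = next(
            (r for r in rows if r["record_source"] == preferred_source), rows[0]
        )
        master[key] = {"business_key": key, "name": chosen["name"]}
    return master


integrated = integrate_raw(raw_records)
master = build_master(integrated, preferred_source="billing")
```

The raw side still shows that CRM and billing disagree about the customer's name, which is exactly the kind of evidence that points to a broken business process. The master side hides that disagreement on purpose.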

So, as you can see - there is value to each type of System-of-Record. So what, exactly, is a system of record?

I would tend to suggest that a System of record has the following characteristics:
1. It's a data origination point (the only place in the enterprise where this "vision" of data exists).
2. It begins to feed other systems, providing automatic feedback to source systems, and becoming a part of the operational LOOP in business decision making.
3. In some cases it's auditable and traceable, in other cases it's quality cleansed - but in all cases it provides business value in different formats and assists the business in DOING business on a daily basis.

If you have questions or comments I'd love to hear them, please post them below.

Thank-you
Daniel Linstedt
CTO, Myers-Holum, Inc


Posted April 15, 2006 7:12 AM

I've been blogging about MDM for a while now, and in my last entry I defined what Master Data and Master Metadata should be. By the way, both of those definitions, along with the entry, have been certified by Bill Inmon and Clive Finkelstein as the standard definitions for MDM. In my sense of adventure I decided to take a look at 10 different vendors: what they claim MDM to be, how they define it (if they define it), and how they claim to implement it. What I discovered is not that shocking: MDM SOFTWARE: BUYER BE-WARE!!

WARNING: THIS ENTRY IS NOT FOR THE FAINT OF HEART, IT IS MY BIASED OPINION ON WHAT MDM REALLY IS VERSUS WHAT VENDORS CLAIM IT TO BE. I'm not saying that vendors are all wrong or bad, quite the contrary - I'm saying that while Master data vendors have good software and provide ROI, not all solutions are built to meet your needs, and the marketing hype would have you believe otherwise.

Are you a vendor? Please feel free to comment, to counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment, share your experiences - anonymously if desired. I want to see where the tools have worked for you, and where they have not.

**** DISCLAIMER ******
I have not received demonstrations from any of the vendors. All I have done is read through their web-sites, looked at what they "claim", and looked for supporting information on their sites to see if they've included people, business process, governance, compliance, integration, master data, and metadata management. Please take all this advice with a grain of salt. The purpose of this entry is to raise awareness of the customer base, and to have you ask questions of the vendors so that your business expectations can be set appropriately.

First off, MDM is a business process. What the vendors are really selling is a piece of Master Data Management: the mechanical part of integrating, cleansing, and quality checking the master data. Most do not offer Business Rules Integration, applied Data Mining, registries, web services (to meet SOA), EAI, EII, ETL/ELT, and RDBMS. In a comment on my last entry I listed a slew of vendor URLs that I've looked at. For reference, I'll list them again here:

LIST OF VENDORS SELLING MDM SOFTWARE SOLUTIONS:
www.ibm.com
www.sas.com
www.DataMirror.com
www.i2.com/solutionareas/sixone/scos/mdm.cfm
www.kalido.com/products/mdm
www.hyperion.com/products/bi_platform/core_data_integration/mdm_index.cfm
www.Stratature.com
www.Gemstone.com
www.SilverCreekSystems.com
www.Purisma.com
www.Nimaya.com
www.Netics.com
www.datafoundations.com
www.ObjectRiver.net
https://www.sdn.sap.com/irj/sdn/developerareas/mdm
http://ibm.ascential.com/solutions/master_data_management.html
www.metamatrix.com/pages/solutions/master_data_management.htm

In no particular order. Now let's look at a few of their definitions to see just what they say MDM is to them:

SAS:

Creating a master data environment enables organizations to provide a single source of truth around which enterprise systems can be synchronized...Reusable business rules clean, standardize and enhance data as it moves into the master reference file so all information is accurate.

IBM:

IBM WebSphere Information Integration is the master data integration offering that delivers authoritative master data for any industry or business function…support the full master data lifecycle…IBM defines MDM as the set of disciplines, technologies, and solutions used to create and maintain consistent, complete, contextual and accurate business data for all stakeholders

Kalido:

KALIDO 8M is an enterprise-wide master data management software solution for harmonizing, storing and managing master data over time…The master data management software produces a master data warehouse from which "golden-copy" master data can be distributed to enterprise applications and business people throughout the organization

Now that I've listed a few vendors, let's talk about the pros and cons of each (taken from additional information on their web sites).
SAS:
Pros:
* They have an embedded data mining capability, and are best of breed for data mining (separate module)
* Embedded ETL engine (if you purchase this module)
* GUI integration (separate module)
* Reporting Engine (separate Module)
* They handle large scale data sets

Cons: (according to industry analyst groups)
* They are not best of breed when it comes to web-services
* They are not best of breed when it comes to answering SOA
* They are not best of breed when it comes to ETL
* They do not have EAI or EII embedded (or so it seems)
* They are a code driven solution
* They are not best of breed for Enterprise Master Metadata
* They do not appear to have a pure business rules processing/management engine like ILOG or JRULES
* Their web site does not provide enough surrounding information to describe their implementation methodology regarding the "data management" and "governance" processes needed to fully implement MDM.
* They want you to believe that "ETL" is Master Data Management, they are touting old-tools under a new skin without including Data Mining, Information Quality, Business Process Management, Compliance, Governance, EAI, and EII as a part of their solution.

IBM:
Pros:
* BIG COMPANY, Lots of information (freely available) on how they handle Governance, Data Management, Metadata and Enterprise Metadata Management.
* Implementation methodology documented (in overview form) for Master Data Management
* Include EAI (WebSphere / MQSeries) as their solution
* Include Ascential Quality Stage for data quality/scrubbing and consolidation
* Include Ascential Meta Stage for Metadata
* Include Web Services (WebSphere) for production and handling of master data
* Include Registries (WebSphere) for production and handling of master data

Cons:
* In one diagram and implementation methodology they claim that WebSphere and EAI are the entire solution for MDM; in another diagram and description, they describe additional needs for Quality Stage and Meta Stage
* They seem to be in conflict with themselves, no clear story as to the "Complete" vision of MDM. One document leads you to five or ten others that discuss governance, EII, EAI, Reference Tables, Information Quality, and so on. Different authors have different ideas as to what MDM really is.
* They would have you believe that Data mining, and business rules are not a part of Master Data Management (until you dig deeper into their implementation methodologies).
* Their solution seems to be: buy the entire SUITE of products from IBM, then buy all of their consulting to get the full and complete MDM solution. This is good if you have TONS of money to throw at the problem, and several years (3 to 5) to solve the problem.
* No single tool seems to be the shining star for helping tackle the Master Data issues.

Kalido 8M:
Pros:
* Easy integration
* Easy Logical Data Model Changes
* Contains an ETL tool
* Contains a data modeling tool
* Vendor says: it contains Data Quality, Profiling, ETL, EAI, EII, Web Services, and Business workflows.

Cons:
* It is not best of breed (according to industry analyst groups) in: EII, ETL, EAI, Web Services, and Business Process Management (ILOG, JRULES).
* The vendor makes it seem like they are the SINGLE tool for the entire suite of MDM - yet they don't document how the rest of Data Management takes place.
* They are missing methodology definition, compliance, governance procedures, implementation best practices, definition of scalability into the 50TB+ range.

No single tool can be the be-all and end-all for the MDM architecture. Again, Master Data is one thing; Master Data Management encompasses master data and data management. The entire MDM effort is an enterprise initiative involving people, process, compliance, governance, data, systems, and a variety of best of breed tool sets.

Buyer-be-ware, do your homework, interview your vendors. Sponsor your MDM initiative at the executive level, apply best practices for Project Management, SEI/CMM, follow standards for including your data warehouse as a part of your Master Data Management Initiative, and proceed. Soon, I'll have an MDM Vendor Scorecard available on our web-site at: http://www.MyersHolum.com

Are you a vendor? Please feel free to comment, to counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment, share your experiences - anonymously if desired. I want to see where the tools have worked for you, and where they have not.

Thank-you kindly for your time,
Daniel Linstedt, CTO, Myers-Holum, Inc
Daniel.Linstedt@MyersHolum.com


Posted April 14, 2006 6:12 AM

I've had a lot of great feedback on the MDM blogs that I've been adding lately, and one kind individual sent me an email asking for a couple of things, including a definition, practical criteria, a practical taxonomy, and a picture simple enough for organizations to use. In this entry I will do my best to offer my *opinion* on the subject. I am open to comments, corrections, and thoughts from all of you - again, this will be only my opinion. Please note that my opinion is biased towards compliance, accountability of data, traceability, and accountability of business users, and arises from my experiences with SEI/CMM, PMP, Six Sigma, TQM, BPR, Lean Initiatives and Cycle Time Reduction. I can't wait to hear back from you.

+ First, a definition of master data that "holds true" across all scenarios.

Ok, I'll take a crack at it - in my last blog entry I showed multiple definitions on Master Data Management from various friends. Here are my thoughts on Master Data itself:

Master Data - Information housed in a single, consistent, quality cleansed reference table, located at a single location. All elements except the surrogate key in the master data set are defined by Master Metadata at a global (enterprise, or sometimes industry wide) level. Master Data is not localized (with multiple copies), does not contain duplicates, and may reside anywhere in the corporation, but is exposed through SOA to the enterprise and possibly outside the enterprise. Master Data and Master Metadata are both delivered through a service in the SOA; Master Metadata is exposed to the service layer so that the meaning of the Master Data can be interrogated. In other words, Master Data are reference tables that exist in a single copy somewhere in the enterprise, but with enterprise level visibility.

Master Metadata - Information describing the elements / Attributes, utilization of those attributes, which make up the Master Data Structure. These metadata are agreed upon by the business users to be universal in definition. However if localized utilization descriptions are required, then the executive staff must buy-off on the changes and overrides, and be capable of defining the impact of the change to the bottom line (financial figures). Master Metadata is enabled to be edited and maintained by business users in accordance with Governance policies created at the enterprise level. Master Metadata is also exposed through SOA to the enterprise, and possibly outside the enterprise. Master Metadata is delivered every time Master Data is delivered - they are co-joined in order to increase the level of understanding and provide context at the time of interrogation of the Master Data Set.

* By the way, Bob Seiner and I had a wonderful discussion yesterday about this topic, I owe quite a bit of this information to Bob's thought processes. He and I agree that Master Data and Master Metadata require Governance policies and procedures, along with Data Management approaches, and Data Administration roles and responsibilities. It will be nearly impossible to have a successful Enterprise Master Data / Master Metadata Management effort without the Governance embedded in the processes. You can see my first article on Governance here, on B-Eye, Bill Inmon's Newsletter.

+ Second, practical criteria for determining whether a piece of information "qualifies" as master data.

This one is a bit harder to qualify. My opinion here is based on my personal experience 12 years ago in a government organization where "master lists" were a huge part of lean initiatives, cycle time reduction, and business process understanding. In order to understand the deterministics of Master Data we should remember that multiple copies of this data frequently exist across our source systems when we start our efforts. We should also remember that the bottom line is: it's up to the business users, NOT I.T., to fully approve master data sets and master "systems".

Deterministics for Master Data Qualification:
* Data that is entered in the source system via an application, and is assigned a FULL business key within the organization.
* These business keys are UNIQUE within a given time-slice. If they are not unique (for all time), then there may be a broken business process (business is losing money), or it isn't the right starting point for master data.
* Data that doesn't exist anywhere else in the organization. Yes, as unfortunate as it sounds, even Excel spreadsheets on users' desktops that organize, consolidate, and aggregate / define can be considered master data sets, or perhaps master metadata. It is our job to get these loaded in raw detail into a data warehouse, as they stand, so that the BI tools and integration tools can roll the data for them, and the business can have better visibility into how the business works.
* Data related to non-composite business keys. Composite business keys often reflect relationships to other data, inter-dependencies. Data related to non-composite business keys are a dead giveaway (like the VIN number for vehicles). Of course the VIN number is a SMART business key, and can be broken down at the manufacturer into additional codes, but to the outside world (you and me), a VIN number is a single key that indicates master data (descriptives about the vehicle). Some composite keys can be seen or used as Master Data sets, but more often than not, composite keys broken into their respective components reveal more than one "master data set". Remember that the business keys are in fact the KEYS to the business processes, and reflect how the data travels through the business, what impacts it, and how it changes. Without business keys, the data in the systems are useless (they can't be searched, located, edited, or managed). I discuss more about business keys in the Data Vault data modeling concepts (Hub Table Definitions).
* Keep in mind that while master data STARTS with the business key identification, it includes the descriptive data that goes with it. Look for hierarchical structures where multiple business keys are "listed" or housed, these should be broken apart to help create the taxonomy of separate master data sets.
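
As a rough illustration of the uniqueness test in the list above, here is a minimal Python sketch. The field names (time_slice, business_key) and the sample rows are my own assumptions for illustration, not taken from any particular source system. It simply counts how often a business key appears within one time-slice and reports the repeats, which are the candidates for a broken business process or a poor starting point for master data.

```python
from collections import defaultdict

def duplicate_business_keys(rows):
    """Return {(time_slice, business_key): count} for keys seen more than once in a slice.

    The 'time_slice' and 'business_key' field names are hypothetical.
    """
    counts = defaultdict(int)
    for row in rows:
        counts[(row["time_slice"], row["business_key"])] += 1
    # Keep only the keys that break the uniqueness rule within their slice
    return {key: n for key, n in counts.items() if n > 1}

# Made-up sample rows: C-1001 appears twice in the same slice
rows = [
    {"time_slice": "2006-Q1", "business_key": "C-1001"},
    {"time_slice": "2006-Q1", "business_key": "C-1002"},
    {"time_slice": "2006-Q1", "business_key": "C-1001"},
    {"time_slice": "2006-Q2", "business_key": "C-1001"},
]
dups = duplicate_business_keys(rows)
```

The same key showing up again in a later slice (2006-Q2 here) is not flagged by this sketch; whether it should be depends on whether the business expects the key to be unique for all time.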

If Master Data is left in hierarchical format with a number of business keys embedded, chances are that over time the master list will lose value (quickly, possibly on an exponential scale), because editing meaning and maintaining definitive "lists" of individual master data sets will become impossible. For example, a bill of material containing work-order numbers, contract numbers, and part numbers MUST be broken into separate components, each containing its own master data set: Work Orders Master, Contracts Master, Parts Master, etc.
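
To make the bill of material example a little more concrete, here is a small Python sketch of the split. Everything in it is assumed for illustration: the field names (contract_no, work_order_no, part_no, part_desc), the MASTER_SPECS table, and the sample lines. The idea is only that each business key becomes the key of its own master set and carries its own descriptive data with it.

```python
# Hypothetical spec: master set name -> (business key field, descriptive fields)
MASTER_SPECS = {
    "contracts_master": ("contract_no", []),
    "work_orders_master": ("work_order_no", []),
    "parts_master": ("part_no", ["part_desc"]),
}

def split_bill_of_material(bom_lines):
    """Break flat bill-of-material lines into one master set per business key."""
    masters = {name: {} for name in MASTER_SPECS}
    for line in bom_lines:
        for name, (key_field, desc_fields) in MASTER_SPECS.items():
            key = line[key_field]
            # The first occurrence of a key wins; later repeats are ignored
            masters[name].setdefault(key, {f: line[f] for f in desc_fields})
    return masters

# Made-up bill of material lines
bom_lines = [
    {"contract_no": "K-77", "work_order_no": "WO-1", "part_no": "P-9", "part_desc": "bracket"},
    {"contract_no": "K-77", "work_order_no": "WO-2", "part_no": "P-9", "part_desc": "bracket"},
    {"contract_no": "K-77", "work_order_no": "WO-2", "part_no": "P-12", "part_desc": "housing"},
]
masters = split_bill_of_material(bom_lines)
```

A real effort would also have to decide what to do when two lines disagree about the descriptive data for the same key; this sketch just keeps the first one it sees.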

Then you can re-examine the locality of where EACH of these master data sets comes from, where it needs to reside, and what parts of the business it affects - along with the individual definitions of Master Metadata. You can also then examine the nature of the business process and really begin to understand where these "business keys" originate, how they change in the business processes, who uses which keys, and who changes the keys as they course through the business. Again, Master Data is all about BPR and Lean Initiatives (overhead reduction, quality improvement, cycle time reduction).

There is a lot more I can say about this, and I will. I have a book on the Data Vault data modeling architecture that will be published here on B-Eye Network soon, the first two chapters are dedicated to the Business, and all the notions I'm discussing here, along with the impact of Data Architecture, and the value of clear definitions.

+ Third, a practical taxonomy (or matrix) of classes of data, for which "master data" is just one class...what are all the other classes of data...and are their inter-relations truly hierarchical, or are they matrixed (or "multi-matrixed")?

Ok we've come to the third of three questions. I began (in the last section) to discuss the normalization of hierarchical data structures according to true business keys. The business users are a great source of knowledge on the "keys" they use to track, file, and relate to the data, as are the applications they use on a daily basis. Business keys are NOT necessarily surrogate keys, or sequence numbers (although in some cases, these sequences and surrogates have been presented to the business for utilization).

In order to understand the taxonomy or matrix of master data / master metadata, we need to first understand two things: WBS (work breakdown structure), and OBS (organizational breakdown structure). Each of these structures must then be cross-hatched to create a lattice of where the work overlays the organization. Below the lattice (in each cell), are the business processes - which must be outlined, highlighted, and understood. The business processes dictate the flow of data, the origination points, and essentially the "master systems", or what should be the master systems anyhow... The taxonomy of the data follows the taxonomy of the business, and the business processes.

For example: I have an organizational structure: CEO->(VP)->(Director)->Management->Line Manager->Worker
I have a work breakdown structure which defines the roles and responsibilities of each, but shows the matrix of work, and how the work flows. There may be several WBS overlays depending on the business task at hand.

Let's say that I have the following parts of the organization: Sales, Contracts, Finance, Procurement, HR, Planning, and Manufacturing. Let's say that this organization sells very large widgets that cost billions of dollars. Let's say we discover that the business process flow is as follows:
1. Sales sells a contract.
2. Contracts ratifies the contract and passes it to Planning for estimation.
3. Planning makes an estimate of what it will take, how long, how many people, parts, cost, etc., and passes it back to Contracts.
4. Contracts contacts the customer, ratifies it again, and seals the deal. Contracts passes the contract to Finance
5. Finance begins the billing cycle for each phase, and expects payment in exchange for delivery. Finance sets up charge numbers for the employees, and passes the contract to HR. Finance continues to watch the cost, planning cycle, budgeting, etc.
6. HR authorizes specific departments and employees to work on the contract, tracks the hours to the estimate, and ensures (governs) which employees are actually working on the contract. HR passes the contract to Planning.
7. Planning re-plans the contract based on the skill set of the employees involved, and the phases that Finance and Contracts have set up. Planning assigns part numbers, suppliers, and contractors to the planned bill of material. Planning produces a full planned bill of material for Manufacturing to build, and hands it off to Manufacturing.
8. Manufacturing then builds the widget, tracking actuals against planned hours, assigns work-orders to parts of the contract, work stations, and executes agreements with the suppliers (working with finance to ensure the right pricing and delivery grade is provided by the suppliers and contractors). Manufacturing completes the deliverable, and hands the phase I/ II / III deliverable back to Contracts and Finance to complete this portion of the transaction.
9. Contracts then needs to compare as-built to as-planned to as-contracted in order to understand if what was built and delivered was actually contracted.
10. Finance needs to compare as-sold to as-contracted to as-financed to as-planned to as-built (manufactured) in order to understand if the estimates match, if sales is selling beyond their means, if contracts has the right dollar figures, in other words: is the business losing or gaining money?
11. HR needs to compare as-assigned, to as-planned, to as-built (actuals versus estimated) for hours, employees, and variations to know how to better bid next time, and to review employee proficiency.
12. Manufacturing needs to compare as-planned to as-built to as-delivered, because parts break, are replaced, plans don't often match actuals, and the whole point is to narrow the gap between planned and built so that next time it costs less money, is easier to build, and what is delivered is what was built (fewer parts replaced).

This is a huge example from a business process re-engineering effort I was involved in. We created taxonomies of master data starting with "as-contracted, as-financed, as-manufactured, as-built, as-planned, etc." Each of these had different systems identified where the data flowed, was constructed, and was processed. One of the things the business discovered was that EACH silo of the business was using its "own" version of the contract number, which made it nearly impossible to trace the entire contract through the full cycle of the business. Our data warehouse integrated all the contract numbers, and showed where they changed, what systems changed them, and demonstrated the "consistent patterns of change" across the business keys.
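
Here is a simplified Python sketch of that kind of contract number tracing. It is not the warehouse we built. It assumes each system keeps a hand-off record saying which upstream system it received the contract from, the number it arrived under, and the number the system uses locally; the record layout and sample values are invented. It also assumes a straight-line chain of hand-offs, which the real process above is not (contracts loop back to Planning and Contracts).

```python
def trace_contract(records, start_system, start_number):
    """Follow one contract from system to system, listing (system, number used)."""
    by_upstream = {(r["upstream"], r["received_as"]): r for r in records}
    trail = [(start_system, start_number)]
    seen = {start_system}
    system, number = start_system, start_number
    while (system, number) in by_upstream:
        rec = by_upstream[(system, number)]
        if rec["system"] in seen:  # guard: this sketch does not follow loops
            break
        seen.add(rec["system"])
        system, number = rec["system"], rec["local_number"]
        trail.append((system, number))
    return trail

def renumbering_points(trail):
    """Return the (system, new number) pairs where the contract number changed."""
    return [(sys, new) for (_, old), (sys, new) in zip(trail, trail[1:]) if new != old]

# Made-up hand-off records: planning issues its own number
records = [
    {"system": "contracts", "upstream": "sales", "received_as": "S-100", "local_number": "S-100"},
    {"system": "planning", "upstream": "contracts", "received_as": "S-100", "local_number": "PL-7"},
    {"system": "finance", "upstream": "planning", "received_as": "PL-7", "local_number": "PL-7"},
]
trail = trace_contract(records, "sales", "S-100")
changes = renumbering_points(trail)
```

In this made-up data the only renumbering point is the planning system, which is exactly the sort of finding you then take back to the business for confirmation.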

Our taxonomy flowed from there - understanding the business processes (even across sectors, and across business units) is absolutely VITAL to creating taxonomies of Master Data. Discover the Business Keys, and the business key FLOW, and the taxonomy of Master Data will become clear quite quickly.

+ Trying to keep the picture simple enough, or at least organized enough, so everyone (top-to-bottom in the enterprise) can understand...and we can develop an effective information management strategy and set of approaches.

We ended up with a single PowerPoint presentation consisting of 10 slides for the executive staff and board of directors at this corporation (consisting at that time of 7 sectors, 23 companies and 150,000 employees) which showed just this one portion of business from top to bottom. Using the WBS, OBS, and BPS (business process flow), we showed the taxonomy of the business, and therefore the taxonomy of the Master Data origination points. We then organized the Master Data to "test" the hypothesis: that in fact, what were supposed to be the "master systems" actually were responsible for producing the master data.

In some cases we found that the planning system was actually producing new contract numbers when the business had told us "that is the contracts system's job, planning should never be producing new contract numbers." We found the data problem, were able to demonstrate when in the business process this was happening, and thus the business was able to fix the problem by issuing change requests to both systems - saving time and money in the long run.

Whew, sorry about the long explanation, but I felt it necessary. If you have thoughts, or comments - I'd love to hear from you. This process works at ALL levels of data integration for master data, even government.

More questions? Send them in...
Thanks,
Dan L


Posted April 7, 2006 5:11 AM
Permalink | 5 Comments |

Recently I've been asked about Active Data Warehousing, and (Real-time) Right Time Data Warehousing, what do these mean to the enterprise? In this short blog entry, I offer my opinion on the definition of each. In future entries I will define the basics of building one, the questions to ask, and potential value to the enterprise. I lead an effort in Active Data Warehousing, and Right Time Data Warehousing for Myers-Holum, Inc. We have best practices surrounding these efforts, and will soon offer tips and tricks for free on our site.

Too often, we are confused by marketing literature and vendor hype. I'm going to set the line and offer my opinion in DEFINING just what an Active Data Warehouse is, and just what a Right Time Data Warehouse is. Are they different? Yes, why? Well, we'll get to that in a minute. For now, this is the way I define each:

Active Data Warehousing (ADW)
The technical ability to capture transactions when they change, and integrate them into the warehouse - along with maintaining batch or scheduled cycle refreshes.

Right-Time Data Warehouse (RTDW) (Not REAL-TIME)
The ability to answer a specific justifiable business question at the time in which it is asked. In other words: a pre-designed business question that requires heaps of pre-integrated data (from the warehouse) in order to answer the question. The answer to the question drives a competitive business decision or a decision that already has a cost / benefit analysis tied to the answer (in other words: quantifiable answer).

RTDW: If the business needs to answer a question at the end of the day, every day - then a RTDW would refresh on a daily basis. If the business needs to answer a question every 30 minutes, then an RTDW would refresh every 30 minutes - assuming the data is available.

Is an RTDW an Active Data Warehouse?
In my opinion: not always. The way I see it: ADW refreshes WHEN TRANSACTIONS CHANGE (they also combine pre-scheduled batch cycles). An RTDW is an ADW when the two are in sync - in other words, if transactions change every 10 minutes, and are captured and integrated into the warehouse as they change, and I have a business question that must be answered every 10 minutes, then I have an ADW and an RTDW.
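
One way to read the definitions so far, sketched in Python. The function and its parameters are my own shorthand, not a standard: a warehouse counts as ADW if it captures transactions as they change, and as RTDW if there is a defined business question and the data is refreshed at least as often as that question must be answered.

```python
def classify(captures_changes_as_they_occur, refresh_minutes, question_minutes=None):
    """Label a warehouse as ADW and/or RTDW under the definitions in this entry.

    refresh_minutes:  how often the warehouse data is refreshed
    question_minutes: how often the justified business question must be
                      answered, or None if no such question has been defined
    """
    labels = set()
    if captures_changes_as_they_occur:
        labels.add("ADW")
    if question_minutes is not None and refresh_minutes <= question_minutes:
        labels.add("RTDW")
    return labels

# Transactions captured as they change every 10 minutes, question every 10 minutes
in_sync = classify(True, 10, 10)
# Nightly batch only, question asked at the end of every day
daily = classify(False, 1440, 1440)
```

The nightly batch case comes out as RTDW but not ADW, which is the "not always" in my answer above.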

Is an ADW also an RTDW?
Not necessarily, although 99% of the time, yes - it should be. Why? Because it costs a lot of money to build and feed an ADW, so an ADW shouldn't be constructed without solid, quantifiable business questions; and once those questions exist, the ADW meets the RTDW criteria.

So then, what is a Real-Time Data Warehouse?
I've blogged before on my interpretation of Real-Time data warehousing. I still maintain that it does not exist, and that timing just can't be fast enough (due to the laws of physics) to make Real-Time decisions. That, of course, is the technical definition.

Is the business definition of Real-Time different than Right-Time?
Yes, there is a distinction between the two. Real-Time is more geared towards what I've defined as ADW (here), than it is Right-Time (as defined here).

Please don't mince words when going forward. Develop best-practices, metadata definitions, and terminology standards (business metadata). Too often our businesses are confused by all the vendor hype and marketing material that "throws a term in" just because it sounds cool.

I'd love to hear your thoughts on this topic, even if you disagree. How do you define Real-Time, Right-Time, and Active DW?

Thanks,
Dan L


Posted January 30, 2006 12:47 PM
Permalink | 4 Comments |

In the interest of SOA, and on my search for governance lately, I've been looking at System Of Registry (SoR) and what it means. If you've got an SOA project, or would like to build one, or maybe you're looking at Master Data Management (MDM) or metadata stewardship, or data stewardship then you might be interested in understanding basic registries and systems of registries.

In the SOA/EII world there has been a lot of buzz about SoR and what it can provide, some vendors offer software to answer this call, and state that their SoR software helps you build a complete solution when it's integrated with other efforts. What does this mean? How can an SoR help you? Why would you want one?

SoR (System of Registry) can be likened to a Taxonomy or classification of information. There is a good reference definition for UDDI and Web Services SoR here. For example, given a Ferrari, or a Mustang, or a Chevy Malibu, these are all in a class called "cars." If you were to construct a location (search) service, you might set up a class for searching cars, trucks, motorcycles, and so on. Each of these classes is its own SoR, and when combined under one label: Search for Motorized Vehicles, you've got the start of a master taxonomy (master classification).

By adding specific makes and models, and production years you can break each class into respective and locatable physical items. This helps complete the taxonomy (aka: System of Registry). It's very much like the Windows "Registry" that keeps all of your programs running and organized in such a manner that Microsoft Windows can reference them.
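
A toy Python sketch of the vehicle taxonomy, just to show the shape. The nesting (master label, then class, then item with production years) follows the description above; the production years and the locate function are invented for illustration and have nothing to do with UDDI or any registry product.

```python
# Master label -> class -> item -> production years (years are made up)
taxonomy = {
    "Motorized Vehicles": {
        "cars": {"Mustang": [1965, 2006], "Chevy Malibu": [2006]},
        "trucks": {},
        "motorcycles": {},
    }
}

def locate(taxonomy, item_name):
    """Return every (master label, class, item) path under which the item is registered."""
    paths = []
    for master, classes in taxonomy.items():
        for cls, items in classes.items():
            if item_name in items:
                paths.append((master, cls, item_name))
    return paths

found = locate(taxonomy, "Mustang")
missing = locate(taxonomy, "Vespa")
```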

What does a SoR do for EII and SOA, and ETL, and EAI or any integration platform for that matter?
The SoR is responsible for housing (at a minimum) the following elements:
* Name
* Class
* Business Description
* Date/Time of registration
* Active / Inactive
* Accessibility (Security/control/allowed access)
* Encrypted
* Keyed
* Defined Inputs
* Defined Outputs
* Availability (when is it available)
* Last time it was updated
* Version

These are just a few of the basic components embedded in an SoR (specifically for SERVICES). Integration Services should provide these basic components no matter what; it's all metadata, but important metadata. Some of these elements are "protected" metadata and can only be accessed by authorized parties. Of course there's always the HISTORY of the SoR - when it was accessed, who accessed it, what the inputs and outputs were.
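
For those who think better in code, here is a minimal Python sketch of one SoR entry for a service, carrying the elements listed above plus an access history. The class name, field names, and the choice of which fields count as "protected" are all my assumptions; a real registry product will have its own schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime

# Assumption: which elements are "protected" metadata is a local policy decision
PROTECTED_FIELDS = {"accessibility", "encrypted", "keyed"}

@dataclass
class ServiceEntry:
    name: str
    service_class: str
    business_description: str
    registered_at: datetime
    active: bool
    accessibility: str
    encrypted: bool
    keyed: bool
    defined_inputs: list
    defined_outputs: list
    availability: str
    last_updated: datetime
    version: str
    history: list = field(default_factory=list)

    def record_access(self, who, inputs, outputs):
        """Append who accessed the service, when, and with what inputs and outputs."""
        self.history.append(
            {"who": who, "when": datetime.now(), "inputs": inputs, "outputs": outputs}
        )

    def view(self, authorized):
        """Return the entry's metadata, hiding protected elements from unauthorized parties."""
        data = asdict(self)
        data.pop("history")
        if not authorized:
            for name in PROTECTED_FIELDS:
                data.pop(name)
        return data

# Made-up entry for a hypothetical vehicle search service
entry = ServiceEntry(
    name="VehicleSearch", service_class="Search", business_description="Locate vehicles",
    registered_at=datetime(2006, 1, 30), active=True, accessibility="internal",
    encrypted=False, keyed=True, defined_inputs=["make", "model"],
    defined_outputs=["vehicle list"], availability="24x7",
    last_updated=datetime(2006, 1, 30), version="1.0",
)
entry.record_access("dan", {"make": "Ford"}, {"count": 3})
public_view = entry.view(authorized=False)
```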

So how can an SoR help you? Why would you want one?
An SoR can help you organize, classify, and manage the metadata across your organization. By the way, the ultimate working Requirements Document is in fact - a System of Registry!! Interesting how that works out, it just so happens that in order to meet governance and compliance - an SoR of business requirements can be of tremendous help, see my upcoming series on Governance in the Enterprise for more information. The SoR helps manage a multitude of access points and data elements from within an easy to use classification system. It saves on the bottom line of implementation costs, rework costs, management and risk reduction, along with a few others.

So what about an SoR for SOA/Web-Services?
Well, SOA is architecture - an abstraction, not a product. There are products within the SOA definition, but a product is NOT an SOA. See my article here for more info. SoR for Web-Services will track the above list and more. It is vital (if not critical) to combine the SoR implementation with compliance and governance initiatives, this will provide longevity to the SoR and keep it from "going out of date" or falling out of interest with the executive staff of the organization.

As web-services are vast and numerous, and the grain can vary from service to service, it is wise to develop a classification (taxonomy) to manage, organize and maintain the web-services, the SoR can help with that. A central SoR should be:
* Web Enabled (management, updates, and additions)
* Flexible (based on security for management)
* Autonomic in Discovery of undocumented web-services across the enterprise
* Visible and understandable all the way to the board of directors (they may choose to look at only the classifications)
* GUI Driven with Icons, and definitions - single click drill down
* BONUS: Visualization of the classification, number of elements registered, and number of elements "active" with frequency graphs.

Whether you're starting an SOA "project", building an MDM strategy, creating compliance and governance initiatives or integrating your Enterprise Warehouse with your active data warehousing strategies, it is considered a best practice to build an SoR along the way. My company specializes in bringing best-practices surrounding these types of efforts, and can help you kick start your projects. Avoid the pitfalls, and don't re-invent the wheels.

What are your thoughts about SoR? Do you have any comments about specific software in this area? What do you like / dislike about the software?

See you next time,
Dan L


Posted January 26, 2006 5:33 AM
Permalink | No Comments |

It is vital in any EII implementation to MANAGE YOUR METADATA. Well, what the heck does that mean? That's a big definition, but it certainly encompasses the ability to manage your services from a GUI perspective, manage the interaction of the API's under the covers, and the accessibility of the EII queries. At a process level it may mean to handle your web-services with ease.

Systinet has been doing this for a while now, and they've gotten good at it. There are a number of software resources out there in this "young" market for managing registries, and Systinet is well known among them. In particular they've been utilized by a number of EII vendors in the market space. As with any advancing technology it is important to have a plan, an implementation strategy, and a set of best practices which utilize the best of breed tools going forward.

Well, the good news is that Systinet provides this kind of thing. The not-so-good news (for EII vendors who partnered with them) is that Systinet has been purchased by Mercury Interactive for $105M.

http://www.newratings.com/analyst_news/article_1175005.html

Good for Mercury Interactive, bad for EII vendors who use their tool set. Once upon a time there were lots of ETL vendors, all these vendors and several other data movement players were using Striva to access their data. Striva got HOT, so hot that Informatica purchased them, and thus ends the story - the other vendors had been "Striva'd"... if you can turn that into a verb.

The last thing any EII vendor needs today is to have this scenario play out again, but it just has. In order to make EII a better business proposition, a system of registries is recommended. I would suggest that any EII vendor out there who's listening take heed: it's time to roll your own, this is product functionality that will add to the bottom line valuation of your company, along with the business proposition - and to have an integrated GUI from which to manage it all would be wondrous. Of course, hold the horses a bit - because if an enterprise already has a System of Registries package, they'll want to integrate. If you roll your own - be sure to include an API that can exchange the information bi-directionally.

If you are NOT a vendor, and are looking at implementing an EII solution, I would strongly urge you to take a look at the success stories spelled out in CIO magazine, most of these recommend a system-of-registries component be in place as a part of the critical architecture.

Do you have a "story" about a system-of-registries and EII interaction? Let us hear it!

Cheers,
Dan L


Posted January 24, 2006 1:40 PM
Permalink | No Comments |

In this entry I will explore some futuristic capabilities (a wish list) of features that I would like to see EII work towards. The real questions are beginning to surface about EII and ETL / ETLT and EAI, there are other questions about web-services, security, standardization, and the best practices needed for implementation of SOA around the enterprise. Let's take a look at the feature set that may be needed via an EII tool in the near future.

What are some of the business problems that EII solves compared to ETL and EAI?
* Access to "now data", current view of transactions across multiple disparate source systems
* Management of Metadata (currently mostly meta-models) for "conforming" of the data model across the enterprise. In this manner, it may actually assist in the development of the data warehouse of the future.
* Dynamic integration of unstructured and semi-structured data
* Real-Time / Right-Time reporting

Technical Problems that EII solves
* Access to XML data and documents, including through XQuery and XPath.
* Access to Web Services
* Access to semi-structured and unstructured data sets
* Control over publication of Web-Services
* Definition of consistent enterprise metadata
* GUI Development Interface for web-services

What we need is a single tool, a single interface to handle a much more broad set of requirements. EII has such a narrow scope right now (because most EII tools are just now coming into the second generation), that additional functionality is necessary to really take a chunk of the market space. For instance, a huge potential exists for a very strong single GUI in an EII tool to manage, maintain, and help define UDDI registries (in other words manage the web-services through metadata). Today, there appear to be partnerships between EII vendors and "Registry" vendors. This is good, but won't remain a differentiator for long.

Wish list of features
* Virtual Tables
* Registry (UDDI) management and integration
* Automated Query Tuning
* Two-Phase commit across sources that allow write-back
* Management of Security Policies
* Business Metadata and Ontology Support
* Additional bi-directional metadata interface, particularly to work with MetaIntegration


The next generation of EII tool will have to extend its metadata reach, into business metadata, across process metadata, and down into Web Services Management and maintenance. It will have to add Version Control of Registries, web-services as a whole, security policies, and so on. Why? The EII space will need to continue to take chunks of "very hard domain problems" and show enterprise information integration with their solution. They will need to focus LESS on transformation (although that will remain a key function), and focus MORE on additional TYPES of information integration.

Their ability to truly integrate the enterprise and ALL of its data (not necessarily in volume, but remaining true to the notions of currency) will have a huge impact IF this information can also be managed. Reaching into new domains of information integration will help EII grow into a major player in the implementation space.

SOA is growing, best practices are being developed, and web-services and EII are major players in the success of SOA, particularly when EII can provide the management of the Web-Services and their metadata. It's a domain that is a natural fit for EII. The EII vendor of the future will "purchase" a registry solution as their own, and will begin to differentiate beyond other vendors in this area and in what they can do with the metadata. One of the largest keys to success will be: how does the EII tool tackle the problem of "bringing that management to the end-user?"

In other words, can the tool provide enough of an end-user or business user interface to entice metadata management to take place as a natural function of business? The GUI interface and codeless solutions will become more and more important, tying the metadata to a master integrated meta-model (single view of the enterprise) will also become paramount to success. Finally, the EII tool that can communicate bi-directionally with a metadata solution will have tremendous success, as business users see added leverage for utilizing a single GUI interface to assist with true EII.

Do you agree / disagree? I'd love to hear your thoughts on the matter.

Thanks,
Dan L


Posted December 13, 2005 6:35 AM
Permalink | No Comments |