Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

Recently in Service Oriented Architecture Category

Bill Inmon and I sat down the other day to discuss a system that we are building. We didn't have a good "name" for it, but what it amounts to is Operational Data Warehousing. If you can believe it, what we've done is take the operational specifics of systems capturing data and place them on top of the Data Warehouse as a single integrated historical and operational data store. We are currently using the Data Vault model for this componentry. Some folks have called this "Active Data Warehousing" in the past, but we feel that this is one step beyond, in that it actually IS the operational store at the same time as being the Data Warehouse. Convergence has arrived...

I've blogged about convergence in the past. It's no secret that the world is converging, and I.T. is no different. It is also no secret that EDW technology is converging with operational technology. If we look behind us (hindsight is always 20/20) we can see the divergence of data warehousing and operational systems, and now their re-convergence. Active Data Warehousing coupled with SOA, and real-time alerts coming back from the ADW, have begun to turn the tables.

We have closed the gap on this one. Using the principles of Data Vault modeling (http://www.DanLinstedt.com) we've constructed an Operational Data Warehouse (right now, Bill and I do not have a better term for this; Bill also thought that this is a new approach).

What does an Operational Data Warehouse do?
One way to describe it is as an Operational Data Store with history.

Another way to describe it is: as a data warehouse with operational (raw) data.

Why do it this way?
Well, for one, it provides traceability in all the data. Bringing in the RAW operational data over a web-service (as generated by the upstream machines) provides us with accountability, auditability and pure traceability. By utilizing the notion of the HUB entity within the Data Vault structures, we achieve horizontal integration across the data sets. This operational data warehouse is front-ended by web-services and has direct integration into the business processes. It is not fed by any sort of "batch" system; it is, however, pre-loaded with master data.
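To make the HUB idea a bit more concrete, here is a minimal sketch in Python. It is not our production code; the names `Hub`, `HubRow`, `register`, and the sample keys and source names are all hypothetical, chosen only for illustration. It assumes a hub is simply a list of unique business keys, each stamped with the source that first delivered it and the time it arrived.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HubRow:
    business_key: str    # the key the business itself uses, e.g. a customer number
    record_source: str   # which upstream machine or service first delivered the key
    load_ts: datetime    # when the warehouse first saw the key


class Hub:
    """Hypothetical hub: one row per distinct business key, whatever the source."""

    def __init__(self) -> None:
        self._rows: dict[str, HubRow] = {}

    def register(self, business_key: str, record_source: str) -> HubRow:
        # Insert only when the key is new. An existing row is never overwritten,
        # so the first-seen source and timestamp stay traceable.
        if business_key not in self._rows:
            self._rows[business_key] = HubRow(
                business_key, record_source, datetime.now(timezone.utc)
            )
        return self._rows[business_key]

    def __len__(self) -> int:
        return len(self._rows)


hub = Hub()
first = hub.register("CUST-1001", "kiosk-denver")
again = hub.register("CUST-1001", "kiosk-dc")  # same key arriving from a second source
```

Both calls resolve to the same hub row, which is the horizontal integration point: data from either source hangs off one business key, and the record source says where the key was first seen.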

The structures of the Data Vault have been set up within the databases in such a way as to allow tremendous scalability and flexibility. We have physically partitioned the machines for security and scalability purposes. We can join 800M rows to 300M rows to 100M rows, and bring back 10k rows in under 10 seconds when we know what we're looking for. This setup is housed on SQL Server 2005 on Windows 2003 R2, 32-bit, with 2 dual-core CPUs at 2.8 GHz and 2GB RAM.

So what's this got to do with Operational Data Warehousing?
Plenty. Operational data warehouses (a very loose term today) must meet the following requirements:
* Must be accountable
* Must be auditable
* Must be a system-of-record
* Must interact with other operational systems
* Must house operational data
* Must house historical data
* Must NOT separate operational data from historical data in the data store.
* Must be the SOURCE for a major business function
* Must be real-time (can have batch feeds, but must be real-time in data streams)
* Must be part of the business process flows.

So what are the technical requirements?
* Must be scalable
* Must be flexible
* Must NOT break history when the business changes/data models change
* Must NOT break existing data feeds when the model changes
* Must be FAST access, fast insert, etc...

And of course it MUST follow the DW2.0 requirements:
* Must have historical data
* Must not be "updated" directly (would break auditability)
* Must maintain cross-functional relationships
* Must be GRANULAR (to the absolute lowest level of grain available)
* Must provide strategic and tactical value
* Must include indexes/pointers/links to unstructured information
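The "must not be updated directly" and "must have historical data" requirements can be sketched together. The Python below is an illustration under stated assumptions, not the actual implementation: `Satellite`, `SatelliteRow`, `load`, `history`, and `current` are hypothetical names, and the change-detection rule (skip a delivery whose attributes match the latest row) is my simplification for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SatelliteRow:
    business_key: str
    load_ts: datetime
    record_source: str
    attributes: tuple  # sorted (name, value) pairs, stored exactly as received


class Satellite:
    """Hypothetical append-only store: rows are added, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, business_key, load_ts, record_source, attributes):
        new = tuple(sorted(attributes.items()))
        latest = self.current(business_key)
        if latest is not None and latest.attributes == new:
            return False  # nothing changed, so no new row is written
        self._rows.append(SatelliteRow(business_key, load_ts, record_source, new))
        return True

    def history(self, business_key):
        rows = [r for r in self._rows if r.business_key == business_key]
        return sorted(rows, key=lambda r: r.load_ts)

    def current(self, business_key):
        rows = self.history(business_key)
        return rows[-1] if rows else None


sat = Satellite()
sat.load("CUST-1001", datetime(2008, 2, 1), "kiosk-denver", {"status": "open"})
sat.load("CUST-1001", datetime(2008, 2, 2), "kiosk-denver", {"status": "open"})  # unchanged
sat.load("CUST-1001", datetime(2008, 2, 3), "kiosk-dc", {"status": "closed"})
```

The operational view is just the latest row per key, and the historical view is every row, both in the same store. Because a change arrives as a new row rather than an update, the earlier "open" row is still there for an auditor.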

So what? How do I get there?
We've used the Data Vault data modeling to get there. It meets all these needs and has been blessed by Bill Inmon as the "optimal choice for DW2.0" data modeling. Because of the structures, along with the foundational approaches to loading the Data Vault, and what the data in the Data Vault represent - we've been able to construct the system described above. In fact, we have two of these up and running. One in our facilities in Denver, and one in Washington DC.

So you mean to say there "is no operational system"?
Partially. There are many "machines" that collect the information operationally and pass it back to our Operational Data Warehouse (Data Vault), but they do not house the information after they've released it to us. The ODW Data Vault actually stores all the operational information from around the country, and soon - around the world.

Next time we'll dive in a little deeper as to what it means to construct one of these, and how they work.

You might already have one of these, if you do - I'd love to hear about it. As always, thoughts, comments, corrections, are welcome.

Dan Linstedt

Posted February 25, 2008 8:26 AM

Governance is an industry buzzword these days. With all the SOA initiatives going on, one would think that governance would be at the top of the list as well. If you're not governing your enterprise consolidation, you probably are not taking full advantage of the benefits and cost savings that could be coming your way. Sure, governance is an uphill battle in the beginning; sure, everyone fights standards, even agreed standards; and yes - absolutely - no one can seem to decide on how to define the common data sets (common data model). But if you're involved in, or working with, SOA, it is imperative to engage governance at the enterprise level. However, it's not just governance that makes it work: a formal methodology should be utilized to assist with the governance as the organization organically grows its efforts. Such methodologies include ITIL, SEI/CMMI and a few others.

I've defined different kinds of governance in my articles here on B-Eye Network in the past. To reiterate, I'll define governance again:

n 1: the persons (or committees or departments etc.) who make up a body for the purpose of administering something; "he claims that the present administration is corrupt"; "the governance of an association is responsible to its members"; "he quickly became recognized as a member of the establishment" [syn: administration, governing body, establishment, brass, organization, organisation] 2: the act of governing; exercising authority; "regulations for the governing of state prisons"; "he had considerable experience of government" [syn: government, governing, government activity] Webster’s Definition of Governance, http://dictionary.reference.com/search?q=governance

IT Governance
IT governance ensures IT-related decisions match company-wide objectives by establishing mechanisms for linking objectives to measurable goals. IT governance is the decision rights and accountability framework for encouraging desirable behavior in the use of IT. IT Governance: http://www.faculty.de.gcsu.edu/~dyoung/MMIS-6393/Reading-IT-Governance-defined.htm

Data Governance
Data governance is a combination of the people, process, and technology required to support the ongoing management of the enterprise-wide data that will be centralized. (my definition)

SOA Governance
SOA Governance is the ability to ensure that all of the independent efforts (whether in the design, development, deployment, or operations of a Service) come together to meet the enterprise SOA requirements. … Including SOA Policies, Auditing and Conformance, Management (track, review, improve), and integration. SOA Governance: http://www.weblayers.com/gcn/whitepapers/Introduction_to_SOA_Governance.pdf

So what is SEI/CMMI in the first place?
Capability Maturity Model Integration (CMMI) is a process improvement approach that provides organizations with the essential elements of effective processes. http://www.sei.cmu.edu/cmmi/general/general.html

And how does it affect my governance?
It can BE your governance guidelines. SEI/CMMI has (built-in) governance procedures. All governance requires that the efforts be monitored, measured, and of course estimated before they start. There are all kinds of estimations that must take place, ranging from RISK analysis, to implementation ability, to project lifecycle, and implementation time. If you aren't estimating, then measuring (something akin to Earned Value Management), then you are not exercising effective governance, and certainly - you cannot improve on what you cannot measure.
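For readers who haven't worked with Earned Value Management, the "estimate, then measure" loop boils down to a few standard ratios. The sketch below uses the usual EVM definitions (cost variance = EV - AC, schedule variance = EV - PV, CPI = EV / AC, SPI = EV / PV); the function name and the dollar figures are made up for illustration.

```python
def earned_value_metrics(planned_value, earned_value, actual_cost):
    """Standard Earned Value Management indicators for one reporting period.

    planned_value: budgeted cost of the work scheduled so far (the estimate)
    earned_value:  budgeted cost of the work actually completed (the measurement)
    actual_cost:   what the completed work actually cost
    """
    return {
        "cost_variance": earned_value - actual_cost,        # negative means over budget
        "schedule_variance": earned_value - planned_value,  # negative means behind schedule
        "cpi": earned_value / actual_cost,                  # cost performance index
        "spi": earned_value / planned_value,                # schedule performance index
    }


# Hypothetical project: 100,000 of work planned, 80,000 completed, 90,000 spent.
metrics = earned_value_metrics(
    planned_value=100_000, earned_value=80_000, actual_cost=90_000
)
```

With these made-up numbers both indexes come out below 1.0, meaning the project is over budget and behind schedule. Without the up-front estimate (the planned value) there would be nothing to measure against, which is the point of the paragraph above.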

What are some of the groups involved in governance?
Diagram adapted from: http://www-128.ibm.com/developerworks/webservices/library/ws-improvesoa/

MDM initiatives are like any other: they require governance to be executed properly. In fact, any initiative that “serves” at the enterprise level should fall under an overlaying governance initiative. Master Data and Master Metadata are highly visible, therefore: high risk = high return = high visibility = high pressure to do it right. This means that Governance cannot and should not be ignored when addressing the MDM initiative. Again, MASTER DATA and MASTER METADATA serve the entire enterprise.

Central governance:
Best for the enterprise. The governance council has representation from each service domain (more on this later) and from subject matter experts who can speak to the people who implement key technological components of the solution. The central governance council reviews any additions or deletions to the list of services, along with changes to existing services, before authorizing the implementation of such changes.

Distributed governance:
Best for distributed teams. Each business unit has control over how it provides the services within its own organization. This requires a functional service domain approach. A central committee can provide guidelines and standards to different teams.

Guiding principles:

  • Reuse, granularity, modularity, composability, and componentization
  • Compliance to standards (both common and industry-specific)
  • Services identification and categorization, provisioning and delivery, and monitoring and tracking

Specific architectural principles:

  • Encapsulation
  • Separation of business logic from the underlying technology
  • Single implementation and enterprise-view of components
  • Leveraging existing assets wherever an opportunity exists
  • Life cycle management
  • Efficient use of system resources
  • Service maturity and performance

Why should I utilize SEI/CMMI as my methodology guide for Governance?

  • About 95% of companies have a formal IT strategy which in most cases is “reasonably” aligned to the business strategy (~80%) Source: The Compass World IT Strategy Census 2001
  • However, research concludes that there is no evidence that IT spending levels positively correlate to companies’ productivity (IT productivity paradox) Source: Information Productivity, P. Strassmann 1999, Information Economic Press
  • Paradoxically, 80% of strategic decisions related to IT are only based on “gut feeling” Source: Gartner Symposium News Preview 2002, Florence, Italy
  • Businesses typically waste 20% of corporate IT budgets on investments which fail to achieve their objectives Source: Gartner Symposium News Preview 2002, Florence, Italy
  • The companies that manage their IT most successfully generate returns as much as 40% higher than their competitors Source: Accenture project experience

CMMI helps reduce and control IT spending. In other words, CMMI is GOVERNANCE for IT in action!

Come see my TDWI presentation in Las Vegas, February 2007 on Governance, Compliance, and CMMI principles. I'm also teaching "Defining and Understanding MDM"

As always, I'd love to hear your experiences, positive and negative about governance principles. Please comment!

Dan Linstedt
CTO, Myers-Holum, Inc

Posted December 19, 2006 6:53 AM

The question: what does the new business initiative really need to focus on?

Today's business initiatives seem to be headed in many different directions, from SOA to MDM to registries, and business processes. The issue is that when different initiatives take on different directions (rather than a consolidated view and set of drivers) they all end up at different destinations. The cost is heart-ache, silo'd solutions, and a maintenance nightmare. The bottom line is that there is convergence afoot. I've written about this over the past 5 years in my convergence articles on TDAN, B-Eye Network, and Teradata Magazine. In this entry we'll explore what business should do, and how they should approach these very different initiatives (all with a common goal).

MDM - Master Data Management
MMDM - Master Metadata Management
SOA - Service Oriented Architectures
Registries - well, registries of web-services, taxonomies and hierarchies of access points, names, and security access restrictions, I guess one could say more metadata...
BPEL - Business Process Execution Language
BPM - Business Process Management

And of course the tools of the trade:
EAI - Enterprise Application Integration
EII - Enterprise Information Integration
ETL/ELT - Extract, Transform, Load / Extract, Load, Transform
RDBMS - Relational Database Management System

Ok now that we got that out of the way... Businesses have been divesting their interests for years (at least when it comes to I.T. projects). It's time to get a little convergence back into the mix. Businesses who start separate initiatives for each of the categories above will quickly find that they end up with one or more of the following:

* Silo'd answer sets
* Silo'd information assets
* Argumentative Fiefdoms within the kingdom (arguing over who's right and who's wrong and who has the best answers).
* IT Constrained Business - disparate projects, tons of sunk cost, high maintenance overhead
* Inconsistent standards
* Missing best practices
* Holes in the I.T. security wall (all over the place)
* Lack of IT business initiative
* Poorly motivated IT employees

And so on... Executive staff should realize that the good things in life don't come cheap, or easy. After all, they've worked extremely hard to get where they are. IT is no different, and should be treated as a single operational business unit. IT's initiatives should be aligned, but in a way that allows IT to work together rather than against each other.

So you've heard this all before have you?
I'm sure you have - it's been printed in the magazines for years, lately it was called IT alignment. Let's get back to the issues shall we?

What does this have to do with lining up: MDM, MMDM, SOA, and Registries?
Everything. Businesses today should establish an overriding IT umbrella, and that umbrella is, in fact, an SOA initiative. One way to think about it is: IT is a service-based organization, and SOA is a service-based architecture from which automated services make business information, processes and descriptions available (on-demand) to the business. Let's just say SOA does for IT what JIT does for manufacturing and supply chains.

Underneath the SOA are Master Data, Master Metadata, Web Services, Registries, Auditability, EDW, OLTP, data marts, and Information Integration. All of these are the components necessary to make SOA a success. But remember, SOA is a journey not a destination - just like alignment of IT is a continuous process (it never ends).

So what do all of these have in common?
* Shared business insight
* Shared executive level sponsorship
* Shared information and data sets
* Shared asset base
* Shared security model
* Shared business processes
* Shared Metadata
* Common information dissemination model

From a project standpoint:
* Shared milestones
* Shared Risks
* Shared training
* Shared knowledge

There is also a certain dependency (order) in which these items must be executed. If one is left out of the process chain, then the business stands to suffer at the end of the day. Convergence is upon us, and real-time (active), metadata (descriptive), data sets (asset base), registries (organization of all data and metadata underneath), security and services (access layers) are all a part of the enterprise initiative to bring IT into focus.

More to come on this topic - if you have questions, I'd like to try to answer them. Feel free to ask publicly or privately.

Dan Linstedt
CTO, Myers-Holum, Inc

Posted May 15, 2006 5:26 AM

When Claudia Imhoff and Shawn Rogers and I got together for lunch the other day, we discussed this notion of SoR (System of Record) - it's a very interesting take. SoR has long been held to a single definition, and has been defined as residing in the source systems. Today, there are multiple definitions (3 to be exact) of SoR, particularly since MDM evokes new notions of what SoR means to the business, as does a compliant and auditable enterprise warehouse. In this entry I'll walk through the multiple definitions of SoR. In my MDM night course in August at TDWI (2006, San Diego) I'll be discussing many of these things.

Let's start with the three types of definitions: The first definition is the widely accepted definition, based on the origination point of data (source systems). The second definition is not so well known, but for those of you with a normalized and AUDITABLE enterprise warehouse, you'll be happy to see the second definition. The third definition is based on the ever-present "single view of today's enterprise". Many people and vendors call this: "Single Version of the Truth", but truth is subjective to EACH individual user, so there's no possible way (in the purist sense) that truth actually exists!!

SoR Definition 1: The data that exists in the source system, in other words, where the data is entered, or originates for the first time. It contains a record of entry or creation or origination for the information it houses. Hopefully this system is auditable, and compliant. The data might not be "clean, quality checked, or integrated" unless it sits in some sort of master list (master data set). Most of this data is shipped across the enterprise to other systems, and to an enterprise data warehouse for integration.

SoR Definition 2: This data resides in a NORMALIZED enterprise data warehouse, and is auditable and compliant. This data is RAW Data that is INTEGRATED by business keys (See Data Vault Data Modeling). This data is NOT cleansed, altered, modified, or quality checked - but is auditable, meaning that an auditor can trace the data back to the source system from whence it came. The integration point is in fact the master-key (business key) that horizontally integrates data across the enterprise. Duplicates reside within this data set, and so does dirty data. The data that resides in the normalized warehouse is captured by type of data and rate of change, and is the only place in the entire enterprise where this integrated version of uncleansed raw data exists. This data is often used by business to FIND broken business processes. If you don't have source system raw data in an integrated fashion, it will be harder to spot the trends, and the broken business processes will continue to go un-noticed. I've seen this save companies millions of dollars. This is a RAW system of record; it is the only place in the enterprise where this integrated raw version exists.

SoR Definition 3: Master Data, or Conformed Dimensions - data that has been cleansed and quality checked, with duplicates removed - and is seen and used by the business as "single version of today's truth". It is covered by master metadata, and is understood by the organization to have meaning. In the master data set there is only a SINGLE copy of each (customer / part / work order / supplier etc.) item. This is an SoR by business standards because it represents value to the business in eliminating duplicates and understanding how the business looks "TODAY". It's a snapshot of the current, consistent, and quality cleansed information that feeds the rest of the source systems.
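The difference between Definition 2 and Definition 3 is easier to see side by side. In this rough Python sketch, the raw rows stand in for the integrated but uncleansed warehouse data (duplicates and blanks included), and `build_master` derives a single copy per business key. The field names, the sample records, and the survivorship rule ("the latest non-empty value wins") are all assumptions made for the example; real master data rules are decided by the business.

```python
# Raw, integrated data (Definition 2): kept as delivered, duplicates and all.
raw_rows = [
    {"business_key": "CUST-1001", "source": "billing", "loaded": 1,
     "name": "ACME Corp", "phone": ""},
    {"business_key": "CUST-1001", "source": "crm", "loaded": 2,
     "name": "Acme Corporation", "phone": "303-555-0100"},
    {"business_key": "CUST-2002", "source": "crm", "loaded": 1,
     "name": "Globex", "phone": ""},
]

TRACKING_FIELDS = ("business_key", "source", "loaded")


def build_master(rows):
    """Derive one 'golden' record per business key (Definition 3).

    Assumed survivorship rule: walk the rows in load order and let each
    non-empty value replace the previous one.
    """
    master = {}
    for row in sorted(rows, key=lambda r: r["loaded"]):
        golden = master.setdefault(row["business_key"], {})
        for field, value in row.items():
            if field in TRACKING_FIELDS:
                continue
            if value:  # empty strings never overwrite a known value
                golden[field] = value
    return master


master = build_master(raw_rows)
```

The raw rows are left untouched, so an auditor can still trace every value back to its source, while the derived master holds exactly one entry per customer for the business to use "TODAY".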

So, as you can see - there is value to each type of System-of-Record. So what, exactly, is a system of record?

I would tend to suggest that a System of record has the following characteristics:
1. It's a data origination point (the only place in the enterprise where this "vision" of data exists).
2. It begins to feed other systems, providing automatic feedback to source systems, and becoming a part of the operational LOOP in business decision making.
3. In some cases it's auditable and traceable, in other cases it's quality cleansed - but in all cases it provides business value in different formats and assists the business in DOING business on a daily basis.

If you have questions or comments I'd love to hear them, please post them below.

Daniel Linstedt
CTO, Myers-Holum, Inc

Posted April 15, 2006 7:12 AM

I've been blogging about MDM for a while now, and in my last entry I defined what Master Data and Master Metadata should be. By the way, both of those definitions, along with the entry, have been certified by Bill Inmon and Clive Finkelstein as the standard definitions for MDM. In my sense of adventure I decided to take a look at 10 different vendors: what they claim MDM to be, how they define it (if they define it), and how they claim to implement it. What I discovered is not that shocking. MDM SOFTWARE: BUYER BEWARE!!

WARNING: THIS ENTRY IS NOT FOR THE FAINT OF HEART, IT IS MY BIASED OPINION ON WHAT MDM REALLY IS VERSUS WHAT VENDORS CLAIM IT TO BE. I'm not saying that vendors are all wrong or bad, quite the contrary - I'm saying that while Master data vendors have good software and provide ROI, not all solutions are built to meet your needs, and the marketing hype would have you believe otherwise.

Are you a vendor? Please feel free to comment, to counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment, share your experiences - anonymously if desired. I want to see where the tools have worked for you, and where they have not.

**** DISCLAIMER ******
I have not received demonstrations from any of the vendors. All I have done is read through their web-sites, looked at what they "claim", and looked for supporting information on their sites to see if they've included people, business process, governance, compliance, integration, master data, and metadata management. Please take all this advice with a grain of salt. The purpose of this entry is to raise awareness of the customer base, and to have you ask questions of the vendors so that your business expectations can be set appropriately.

First off, MDM is a business process. What the vendors are really selling is a piece of Master Data Management: the mechanical part of integrating, cleansing, and quality checking the master data. Most do not offer Business Rules Integration, applied Data Mining, registries, web services (to meet SOA), EAI, EII, ETL/ELT, and RDBMS. In a comment on my last entry, I listed a slew of vendor URLs that I've looked at. For reference, I'll list them again here:

www.hyperion.com/products/bi_platform/core_data_integration/mdm_index.cfm

In no particular order. Now let's look at a few of their definitions to see just what they say MDM is to them:


Creating a master data environment enables organizations to provide a single source of truth around which enterprise systems can be synchronized...Reusable business rules clean, standardize and enhance data as it moves into the master reference file so all information is accurate.


IBM WebSphere Information Integration is the master data integration offering that delivers authoritative master data for any industry or business function…support the full master data lifecycle…IBM defines MDM as the set of disciplines, technologies, and solutions used to create and maintain consistent, complete, contextual and accurate business data for all stakeholders


KALIDO 8M is an enterprise-wide master data management software solution for harmonizing, storing and managing master data over time…The master data management software produces a master data warehouse from which "golden-copy" master data can be distributed to enterprise applications and business people throughout the organization

Now that I've listed a few vendors, let's talk about the pros and cons of each vendor (taken from additional information on their web sites).
* They have an embedded data mining capability, and are best of breed for data mining (separate module)
* Embedded ETL engine (if you purchase this module)
* GUI integration (separate module)
* Reporting Engine (separate Module)
* They handle large scale data sets

Cons: (according to industry analyst groups)
* They are not best of breed when it comes to web-services
* They are not best of breed when it comes to answering SOA
* They are not best of breed when it comes to ETL
* They do not have EAI or EII embedded (or so it seems)
* They are a code driven solution
* They are not best of breed for Enterprise Master Metadata
* They do not appear to have a pure business rules processing/management engine like ILOG or JRULES
* Their web site does not provide enough surrounding information to describe their implementation methodology regarding the "data management" and "governance" processes needed to fully implement MDM.
* They want you to believe that "ETL" is Master Data Management, they are touting old-tools under a new skin without including Data Mining, Information Quality, Business Process Management, Compliance, Governance, EAI, and EII as a part of their solution.

IBM:
* BIG COMPANY, lots of information (freely available) on how they handle Governance, Data Management, Metadata and Enterprise Metadata Management.
* Implementation methodology documented (in overview form) for Master Data Management
* Include EAI (websphere / MQ series) as their solution
* Include Ascential Quality Stage for data quality/scrubbing and consolidation
* Include Ascential Meta Stage for Metadata
* Include Web Services (websphere) for Production and handling of master data
* Include Registries (websphere) for production and handling of master data

Cons:
* In one diagram and implementation methodology they claim that WebSphere and EAI are the entire solution for MDM; in another diagram and description, they describe additional needs for Quality Stage and Meta Stage
* They seem to be in conflict with themselves, no clear story as to the "Complete" vision of MDM. One document leads you to five or ten others that discuss governance, EII, EAI, Reference Tables, Information Quality, and so on. Different authors have different ideas as to what MDM really is.
* They would have you believe that Data mining, and business rules are not a part of Master Data Management (until you dig deeper into their implementation methodologies).
* Their solution seems to be: buy the entire SUITE of products from IBM, then buy all of their consulting to get the full and complete MDM solution. This is good if you have TONS of money to throw at the problem, and several years (3 to 5) to solve the problem.
* No single tool seems to be the shining star for helping tackle the Master Data issues.

Kalido 8M:
* Easy integration
* Easy Logical Data Model Changes
* Contains an ETL tool
* Contains a data modeling tool
* Vendor says: it contains Data Quality, Profiling, ETL, EAI, EII, Web Services, and Business workflows.

Cons:
* It is not best of breed (according to industry analyst groups) in: EII, ETL, EAI, Web Services, and Business Process Management (ILOG, JRULES).
* The vendor makes it seem like they are the SINGLE tool for the entire suite of MDM - yet they don't document how the rest of Data Management takes place.
* They are missing methodology definition, compliance, governance procedures, implementation best practices, definition of scalability into the 50TB+ range.

No single tool can be the be-all and end-all for the MDM architecture. Again, Master Data is one thing; Master Data Management encompasses master data and data management. The entire MDM is an enterprise initiative involving people, process, compliance, governance, data, systems, and a variety of best of breed tool sets.

Buyer beware: do your homework and interview your vendors. Sponsor your MDM initiative at the executive level, apply best practices for Project Management and SEI/CMM, follow standards for including your data warehouse as a part of your Master Data Management Initiative, and proceed. Soon, I'll have an MDM Vendor Scorecard available on our web-site at: http://www.MyersHolum.com

Thank-you kindly for your time,
Daniel Linstedt, CTO, Myers-Holum, Inc

Posted April 14, 2006 6:12 AM