Blog: Dan E. Linstedt

April 29, 2006
RFID Tracking and Nanotechnology for BI

Imagine a smart RFID (radio frequency identification) tag - in other words, not just one that bounces back a signal received from a transmitter, but one that emits a unique number: a RIN (RFID identification number), like a VIN only for RFIDs. I realize they already have RTLS (real-time locator systems) with this technology embedded, but imagine it at a smaller scale. RTLS devices are currently very large compared to RFID tags. How would this affect BI? What if it could use Nanotechnology and an embedded power source (as Nanotech reports suggest is possible) to power a unique signal? What would happen to the supply chain, for example? This entry is just a thought experiment.

Well, I was thinking about my can of soda; yes I drink soda, and it's usually Pepsi, uhhh I mean Coke... Oh well, I drink both - but anyhow, what if the can's paint could contain a RIN and a modified RFID that generated signals? What if Coke/Pepsi cared about the geographic location of the can? It is possible to send a satellite signal to each MRFID (modified RFID). This would have to be done using Nanotech for an internal power source, and a transmitter or an encoding device would have to be embedded. In other words, since the power source is usually too weak to respond to a satellite signal, the tag would have to record where it was (latitude and longitude). Every hour it would record its lat/long in a DNA computing style by folding DNA elements.

Yeah, so what? What if Pepsi/Coke could track the can; what difference would it make? If the cola manufacturer could open a store closer to that location, they might get a boost in sales. They could even place a dispenser machine there, or drive a truck up to the location to sell or give away their product as a promotion, all for the sake of loyalty. But hey, we're talking about tracking the products that we purchase.
This raises serious invasion of privacy concerns. I may not want my cola / pants / T-shirt producer tracking my activities and locations. They'd quickly find out that I'm not worth tracking - moving around the country to present at IT meetings, and working at home most of the time. On the other hand, think of what law enforcement could do from a business intelligence perspective - a criminal purchases a set of pants or a mask that's tagged with MRFID, and all of a sudden the FBI has a fix on their location... Hey, maybe it's good for tracking wanted individuals. But we'll leave that alone. What I'm suggesting is the following:

Care to share some ideas? Thoughts?

April 28, 2006
Meta Integration and Capabilities

If you haven't heard of this company, and you're looking for Master Metadata Management, you'll want to check them out. They are a privately held firm based in Silicon Valley, and are OEM'ing their core technology to just about every major vendor out there. That said, they do sell their complete package with all kinds of cool features for a reasonable cost. The business reasons to use their software? Read on - let's try to shed some light on this...

If you're involved in metadata implementation, master metadata initiatives, or in fact data management, data administration, change management of process flows, and other such items, then this tool can fill the bill. Meta Integration (http://www.MetaIntegration.com) is a metadata repository that contains versioning, and over 50 different bridges for BOTH import and export of technical metadata. But it goes deeper than that. It manages process metadata, data model metadata, logical metadata, ETL / ELT flow metadata, BI query and database view metadata, and so on. The bridges can connect to many different tools and pull in the metadata from all of these - and through a process called stitching can overlap the metadata together (where appropriate).
From a business perspective, their tool provides drill down and FULL HISTORY DATA LINEAGE from the source to the target, and can provide this lineage across the business, in a web-based browser, to the business users. The repository can also handle registries of business metadata information, and allow the business users to maintain copies of the metadata (those who have privileges and access may upload the new metadata to Meta Integration as a new version). It is quite a tool. It can save time and money by providing PROFILING of the existing FLOW of the data set across the systems involved in data integration. This is a massive time-saving device, even if the tool is not used for metadata management.

Here are a few of the connectivity options it has: ER-Win, Informatica, Ascential, Teradata, Oracle, DB2 UDB, SAP, PeopleSoft, Oracle Financials, SAS, and so on. We have built metadata management best practices with roles and responsibilities, business definitions, and execution strategies around their tool. We can really speed up or enhance your data management / data administration and metadata initiatives.

Thanks,

April 26, 2006
SQLServer2005 and Integration Services

I recently had the chance to work with the Beta program of SQLServer2005 and Integration Services. Let me tell you, it was amazing. There are several highlights in the technology that I'd like to share. This entry will go through these items. There are only a couple of things that I'd like to see improved.

My basic thoughts are: this will give customers who are tight on budget, or not moving huge volumes per load-cycle, an opportunity to really begin building out a true data warehousing/BI solution.

SQLServer2005:

* Database/Table partitioning. At long last, SQLServer has implemented range-partitioning. This will be a HUGE boost to performance on larger data sets, as long as the partitions are architected properly across the existing disk sets.
If you're not familiar with partitioning, now's your chance to read up on it - do all the reading you can before you upgrade. Partitioning is not something you simply want to "turn on". However, SQL queries that use clustered indexes can finally avoid table scans when partitions are invoked. Why? See the next feature... Other types of partitioning (like hash, and sub-partitions) are said to be in the works for a future release.

* Query Parallelism. Query parallelism has been implemented to take advantage of the database and table partitioning. The queries can and will be broken into their respective components to take advantage of the range partitions. Queries also take advantage of the new performance of SQLServer2005.

* Performance. SQLServer2005 has extended its performance by optimizing to 8k block sizes, using Windows operating system hooks, and sharing RAM caches with I/O buffers. They've streamlined the I/O from SQLServer2005 to the disk, and as I understand it, bypassed the page-file to bring data into RAM (although the page-file is still used for locking and swapping). The performance of SQLServer2005 is much higher than 2000, and it can process millions of rows per minute, even without clustering.

* Clustering. There are new mechanisms for clustering SQLServer2005, and as I understand it, it can take the shape of multiple clusters on a network of nodes (SMP under an MPP look and feel). This brings new meaning to low-cost, high-power computing. This notion is still challenging to set up and make happen with SQLServer2005, but will get easier as Microsoft continues to release their product.

Integration Services:

* Tons of transformations. It comes with lots of transformations out of the box; one of them is a neural network. It also houses the parallel processing properties for splitting data (in parallel) down multiple transformation paths. They also have the notion of an error control and error target, for which you can control the rules.
* Workflow style transformation - all transformations are workflow style, meaning they are "runnable designs". You can see the row count increase as the data passes through the transformations along the way.

Improvements I'd like to see:

* Import/Export of transformation logic via XML. There apparently is no import/export of the transformation designs, therefore there is no current capability for sharing the transformation logic across servers.

* Reusable transformation "workflows" - they do not have the concept of reusability down yet. I've heard they are working on this.

* Biggest one: Connectivity options. Today, if you want Oracle, DB2 UDB, DB2 AS/400, Teradata, Sybase, MySQL, or Informix data you have three options: use BRIDGE software provided by the other vendor (which is slow); use ODBC and OLE DB drivers from Microsoft (some are fast, some are slow); or write your own native connectivity using their SDK.

Bottom line: if you're a Microsoft-only shop, SQLServer2005 and Integration Services is a HUGE boost to productivity over DTS and SQLServer2000 - well worth the upgrade cost. If you're a small shop that deals with mostly flat-file transfers from other databases, it will help as well. If you’re into volume and native connectivity, I suggest looking elsewhere, and letting Microsoft grow into this space.

More to come, did I miss something? Post a reply!

Cheers,

April 21, 2006
MDM: Local Copies versus Localized Copies

What do I mean by Local versus Localized copies of Master Data? In this entry I will try to explain my definition for each. This is a short entry, and as always, comments are welcome.

I did not say "local copies" should not exist; what I did suggest is that no "localized" copies should exist. If I did actually refer to "no local copies" then shame on me, I made a mistake.

Local copies of the data: replicated Master Lists across geographically dispersed regions for fast access times. This is fine.
Localized copies of the data

I hope this clears things up.

April 15, 2006
Welcome to BI Vendors

This is a new category for me. Typically I don't focus on "vendor specific" items, but in this category I will begin doing so. I may blog on Microsoft SQLServer2005 and its intelligence services, or Teradata, Netezza, DatAllegro, Ipedo, MetaMatrix, Oracle, DB2 UDB, IBM Ascential, IBM Websphere or any number of other vendors that I think are doing something interesting in the market space as it pertains to Business Intelligence. All entries here will be vendor specific. Feel free to send me questions about specific vendors.

Thank-you! Hope to see you out here.

Demystifying SoR (System of Record) and MDM

When Claudia Imhoff, Shawn Rogers and I got together for lunch the other day, we discussed this notion of SoR - it's a very interesting take. SoR has long been held to have a single definition, and has been defined as residing in the source systems. Today, there are multiple definitions (3 to be exact) of SoR, particularly since MDM evokes new notions of what SoR means to the business, as does a compliant and auditable enterprise warehouse. In this entry I'll walk through the multiple definitions of SoR. In my MDM night course in August at TDWI (2006, San Diego) I'll be discussing many of these things.

Let's start with the three types of definitions. The first definition is the widely accepted one, based on the origination point of data (source systems). The second definition is not so well known, but those of you with a normalized and AUDITABLE enterprise warehouse will be happy to see it. The third definition is based on the ever-present "single view of today's enterprise". Many people and vendors call this "Single Version of the Truth", but truth is subjective to EACH individual user, so there's no possible way (in the purist sense) that truth actually exists!!
SoR Definition 1: The data that exists in the source system, in other words, where the data is entered or originates for the first time. It contains a record of entry, creation or origination for the information it houses. Hopefully this system is auditable and compliant. The data might not be "clean, quality checked, or integrated" unless it sits in some sort of master list (master data set). Most of this data is shipped across the enterprise to other systems, and to an enterprise data warehouse for integration.

SoR Definition 2: This data resides in a NORMALIZED enterprise data warehouse, and is auditable and compliant. This data is RAW data that is INTEGRATED by business keys (see Data Vault Data Modeling). This data is NOT cleansed, altered, modified, or quality checked - but it is auditable, meaning that an auditor can trace the data back to the source system from whence it came. The integration point is in fact the master key (business key) that horizontally integrates data across the enterprise. Duplicates reside within this data set, and so does dirty data. The data that resides in the normalized warehouse is captured by type of data and rate of change. This data is often used by the business to FIND broken business processes. If you don't have source system raw data in an integrated fashion, it will be harder to spot the trends, and the broken business processes will continue to go unnoticed. I've seen this save companies millions of dollars. This is a RAW system of record; it is the only place in the enterprise where this integrated raw version exists.

SoR Definition 3: Master Data, or Conformed Dimensions - data that has been cleansed and quality checked, with duplicates removed - and is seen and used by the business as the "single version of today's truth". It is covered by master metadata, and is understood by the organization to have meaning.
In the master data set there is only a SINGLE copy of each (customer / part / work order / supplier etc.) item. This is an SoR by business standards because it represents value to the business in eliminating duplicates and understanding how the business looks "TODAY". It's a snapshot of the current, consistent, and quality cleansed information that feeds the rest of the source systems.

So, as you can see, there is value to each type of System of Record. So what, then, exactly is a system of record? I would tend to suggest that a System of Record has the following characteristics:

If you have questions or comments I'd love to hear them, please post them below.

Thank-you

April 14, 2006
MDM: Deciphering Vendor Hype

I've been blogging about MDM for a while now, and in my last entry I defined what Master Data and Master Metadata should be. By the way, both of those definitions, along with the entry, have been certified by Bill Inmon and Clive Finkelstein as the standard definitions for MDM. In my sense of adventure I decided to take a look at 10 different vendors: what they claim MDM to be, how they define it (if they define it), and how they claim to implement it. What I discovered is not that shocking. MDM SOFTWARE: BUYER BEWARE!!

WARNING: THIS ENTRY IS NOT FOR THE FAINT OF HEART, IT IS MY BIASED OPINION ON WHAT MDM REALLY IS VERSUS WHAT VENDORS CLAIM IT TO BE.

I'm not saying that vendors are all wrong or bad, quite the contrary - I'm saying that while Master Data vendors have good software and provide ROI, not all solutions are built to meet your needs, and the marketing hype would have you believe otherwise.

Are you a vendor? Please feel free to comment, to counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment, share your experiences - anonymously if desired. I want to see where the tools have worked for you, and where they have not.
**** DISCLAIMER ******

First off, MDM is a business process. What the vendors are really selling is a piece of Master Data Management: the mechanical part of integrating, cleansing, and quality checking the master data. Most do not offer Business Rules Integration, applied Data Mining, registries, web services (to meet SOA), EAI, EII, ETL/ELT, and RDBMS.

I've listed, as a comment to my last entry, a slew of vendor URLs that I've looked at. For reference, I'll list them again here:

LIST OF VENDORS SELLING MDM SOFTWARE SOLUTIONS: In no particular order.

Now let's look at a few of their definitions to see just what they say MDM is to them:

SAS: Creating a master data environment enables organizations to provide a single source of truth around which enterprise systems can be synchronized...Reusable business rules clean, standardize and enhance data as it moves into the master reference file so all information is accurate.

IBM: IBM WebSphere Information Integration is the master data integration offering that delivers authoritative master data for any industry or business function…support the full master data lifecycle…IBM defines MDM as the set of disciplines, technologies, and solutions used to create and maintain consistent, complete, contextual and accurate business data for all stakeholders

Kalido: KALIDO 8M is an enterprise-wide master data management software solution for harmonizing, storing and managing master data over time…The master data management software produces a master data warehouse from which "golden-copy" master data can be distributed to enterprise applications and business people throughout the organization

Now that I've listed a few vendors, let's talk about the pros and cons of each (taking from additional information on their web sites).

Cons: (according to industry analyst groups)

IBM:
Cons:

Kalido 8M:
Cons:

No single tool can be the be-all and end-all for the MDM architecture.
Again, Master Data is one thing; Master Data Management encompasses both master data and data management. The entire MDM effort is an enterprise initiative involving people, process, compliance, governance, data, systems, and a variety of best-of-breed tool sets. Buyer beware: do your homework and interview your vendors. Sponsor your MDM initiative at the executive level, apply best practices for Project Management and SEI/CMM, follow standards for including your data warehouse as a part of your Master Data Management initiative, and proceed.

Soon, I'll have an MDM Vendor Scorecard available on our web-site at: http://www.MyersHolum.com

Are you a vendor? Please feel free to comment, to counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment, share your experiences - anonymously if desired. I want to see where the tools have worked for you, and where they have not.

Thank-you kindly for your time,

April 7, 2006
Clarifying MDM - Setting Standards

I've had a lot of great feedback on the MDM blogs that I've been adding lately, and one kind individual sent me an email asking for a couple of things: a definition, practical criteria, a practical taxonomy, and a picture simple enough for organizations to use. In this entry I will do my best to offer my *opinion* on the subject. I am open to comments, corrections, and thoughts from all of you - again, this will be only my opinion.

Please note that my opinion is biased towards compliance, accountability of data, traceability, and accountability of business users, and arises from my experiences with SEI/CMM, PMP, Six Sigma, TQM, BPR, Lean Initiatives and Cycle Time Reduction. I can't wait to hear back from you.

+ First, a definition of master data that "holds true" across all scenarios.

Ok, I'll take a crack at it - in my last blog entry I showed multiple definitions of Master Data Management from various friends.
Here are my thoughts on Master Data itself:

Master Data - Information housed in a single, consistent, quality cleansed reference table, located at a single location. All elements except the surrogate key in the master data set are defined by Master Metadata at a global (enterprise, or sometimes industry wide) level. Master Data is not localized (with multiple copies), does not contain duplicates, and may reside anywhere in the corporation, but is exposed through SOA to the enterprise and possibly outside the enterprise. Master Data and Master Metadata are both delivered through a service in the SOA; Master Metadata is exposed to the service layer so that the meaning of the Master Data can be interrogated. In other words, Master Data are reference tables that exist in a single copy somewhere in the enterprise, but with enterprise level visibility.

Master Metadata - Information describing the elements / attributes, and the utilization of those attributes, which make up the Master Data structure. These metadata are agreed upon by the business users to be universal in definition. However, if localized utilization descriptions are required, then the executive staff must buy off on the changes and overrides, and be capable of defining the impact of the change to the bottom line (financial figures). Master Metadata can be edited and maintained by business users in accordance with Governance policies created at the enterprise level. Master Metadata is also exposed through SOA to the enterprise, and possibly outside the enterprise. Master Metadata is delivered every time Master Data is delivered - they are joined together in order to increase the level of understanding and provide context at the time of interrogation of the Master Data set.

* By the way, Bob Seiner and I had a wonderful discussion yesterday about this topic, and I owe quite a bit of this information to Bob's thought processes.
He and I agree that Master Data and Master Metadata require Governance policies and procedures, along with Data Management approaches, and Data Administration roles and responsibilities. It will be nearly impossible to have a successful Enterprise Master Data / Master Metadata Management effort without the Governance embedded in the processes. You can see my first article on Governance here, on B-Eye, Bill Inmon's Newsletter.

+ Second, practical criteria for determining whether a piece of information "qualifies" as master data.

This one is a bit harder to qualify. My opinion here is based on my personal experience 12 years ago in a government organization where "master lists" were a huge part of lean initiatives, cycle time reduction, and business process understanding. In order to understand the deterministics of Master Data we should remember that frequently there exist multiple copies of this data across our source systems when we start our efforts. We should also remember that the bottom line is: it's up to the business users, NOT I.T., to fully approve master data sets and master "systems".

Deterministics for Master Data Qualification:

If Master Data is left in hierarchical format with a number of business keys embedded, chances are that over time the master list will lose value (quickly, possibly on an exponential scale), because editing meaning and maintaining definitive "lists" of individual master data sets will become impossible. For example, a bill of material containing work-order numbers, contract numbers, and parts numbers MUST be broken into separate components, each containing its own master data set: Work Orders Master, Contracts Master, Parts Master, etc. Then you can re-examine the locality of where EACH of these master data sets comes from, where it needs to reside, and what parts of the business it affects - along with the individual definitions of Master Metadata.
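As a rough illustration of the bill of material example above, here is a minimal Python sketch of breaking one hierarchical record set into separate master lists, one per business key. The names `BomLine` and `split_into_masters`, the field names, and the sample keys are all hypothetical, made up for this sketch; a real effort would also carry the Master Metadata for each list and would be approved by the business users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BomLine:
    """One row of a bill of material with three embedded business keys (hypothetical layout)."""
    work_order_no: str
    contract_no: str
    part_no: str
    quantity: int


def split_into_masters(bom_lines):
    """Return one de-duplicated, sorted list of business keys per master data set."""
    masters = {"work_orders": set(), "contracts": set(), "parts": set()}
    for line in bom_lines:
        masters["work_orders"].add(line.work_order_no)
        masters["contracts"].add(line.contract_no)
        masters["parts"].add(line.part_no)
    # Sets remove duplicates; sorting gives each master list a stable order.
    return {name: sorted(keys) for name, keys in masters.items()}


# Made-up sample data: three BOM rows sharing keys.
bom = [
    BomLine("WO-100", "C-7", "P-1", 4),
    BomLine("WO-100", "C-7", "P-2", 1),
    BomLine("WO-101", "C-7", "P-1", 2),
]
masters = split_into_masters(bom)
```

Each resulting list holds a single copy of every key, which is the point of the separation: once the keys are apart, the origin and ownership of each master list can be examined on its own.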
You can also then examine the nature of the business process and really begin to understand where these "business keys" originate, how they change in the business processes, who uses which keys, and who changes the keys as they course through the business. Again, Master Data is all about BPR and Lean Initiatives (overhead reduction, quality improvement, cycle time reduction).

There is a lot more I can say about this, and I will. I have a book on the Data Vault data modeling architecture that will be published here on B-Eye Network soon. The first two chapters are dedicated to the business, and to all the notions I'm discussing here, along with the impact of Data Architecture and the value of clear definitions.

+ Third, a practical taxonomy (or matrix) of classes of data, for which "master data" is just one class...what are all the other classes of data...and are their inter-relations truly hierarchical, or are they matrixed (or "multi-matrixed")?

Ok, we've come to the third of three questions. I began (in the last section) to discuss the normalization of hierarchical data structures according to true business keys. The business users are a great source of knowledge on the "keys" they use to track, file, and relate to the data, as are the applications they use on a daily basis. Business keys are NOT necessarily surrogate keys or sequence numbers (although in some cases, these sequences and surrogates have been presented to the business for utilization).

In order to understand the taxonomy or matrix of master data / master metadata, we need to first understand two things: WBS (work breakdown structure), and OBS (organizational breakdown structure). These structures must then be cross-hatched to create a lattice of where the work overlays the organization. Below the lattice (in each cell) are the business processes - which must be outlined, highlighted, and understood.
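The cross-hatching described above can be sketched in a few lines of Python: every WBS element is paired with every OBS unit to form a cell, and business processes are then filed under the cell they belong to. The WBS entries, OBS units, process names, and the helpers `add_process` and `uncovered_cells` are all hypothetical placeholders for this sketch, not taken from a real project.

```python
from itertools import product

# Hypothetical work breakdown and organizational breakdown structures.
wbs = ["Win contract", "Plan build", "Manufacture"]
obs = ["Sales", "Contracts", "Planning", "Manufacturing"]

# The lattice: one cell per (work element, organizational unit) pair.
# Each cell starts empty and later holds the business processes found there.
lattice = {(work, org): [] for work, org in product(wbs, obs)}


def add_process(lattice, work, org, process):
    """File a business process under the cell where the work overlays the organization."""
    if (work, org) not in lattice:
        raise KeyError(f"No lattice cell for {work!r} / {org!r}")
    lattice[(work, org)].append(process)


def uncovered_cells(lattice):
    """Cells with no documented process yet: candidates for follow-up with the business."""
    return [cell for cell, processes in lattice.items() if not processes]


add_process(lattice, "Win contract", "Sales", "Negotiate terms")
add_process(lattice, "Win contract", "Contracts", "Issue contract number")
add_process(lattice, "Plan build", "Planning", "Schedule work orders")
```

Listing the empty cells is one simple way to see which overlaps of work and organization still need their business processes outlined.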
The business processes dictate the flow of data, the origination points, and essentially the "master systems", or what should be the master systems anyhow... The taxonomy of the data follows the taxonomy of the business and the business processes.

For example, I have an organizational structure: CEO->(VP)->(Director)->Management->Line Manager->Worker

Let's say that I have the following parts of the organization: Sales, Contracts, Finance, Procurement, HR, Planning, and Manufacturing. Let's say that this organization sells very large widgets that cost billions of dollars. Let's say we discover that the business process flow is as follows:

This is a huge example from a business process re-engineering effort I was involved in. We created taxonomies of master data starting with "as-contracted, as-financed, as-manufactured, as-built, as-planned, etc." Each of these had different systems identified where the data flowed, was constructed and was processed. One of the things the business discovered was that EACH silo of the business was using its "own" version of the contract number, which made it nearly impossible to trace the entire contract through the full cycle of the business. Our data warehouse integrated all the contract numbers, showed where they changed and what systems changed them, and demonstrated the "consistent patterns of change" across the business keys.

Our taxonomy flowed from there - understanding the business processes (even across sectors, and across business units) is absolutely VITAL to creating taxonomies of Master Data. Discover the business keys and the business key FLOW, and the taxonomy of Master Data will become clear quite quickly.

+ Trying to keep the picture simple enough, or at least organized enough, so everyone (top-to-bottom in the enterprise) can understand...and we can develop an effective information management strategy and set of approaches.
We ended up with a single PowerPoint presentation consisting of 10 slides for the executive staff and board of directors at this corporation (consisting at that time of 7 sectors, 23 companies and 150,000 employees) which showed just this one portion of the business from top to bottom. Using the WBS, OBS, and BPS (business process flow), we showed the taxonomy of the business, and therefore the taxonomy of the Master Data origination points. We then organized the Master Data to "test" the hypothesis: that what were supposed to be the "master systems" actually were responsible for producing the master data.

In some cases we found that the planning system was actually producing new contract numbers when the business had told us "that is the contracts system's job, planning should never be producing new contract numbers." We found the data problem and were able to demonstrate when in the business process this was happening. The business was then able to fix the problem by issuing change requests to both systems - saving time and money in the long run.

Whew, sorry about the long explanation, but I felt it necessary. If you have thoughts or comments, I'd love to hear from you. This process works at ALL levels of data integration for master data, even government.

More questions? Send them in...

April 6, 2006
A tribute to my mentors

I wouldn't be here, blogging for Bill today, writing articles, and teaching at conferences, if it weren't for my various mentors along the way, all of whom have helped me tremendously. I want to stop for a moment and give credit where credit is due. There are many, many individuals who have helped me get to where I am today. Please forgive me if I forget to mention your name; drop me a reply or a comment on what we did together, so that it is included as a part of this tribute.

I needed to stop and say: THANK-YOU to all those who are helping me on my journey.

First, thank-you to Claudia Imhoff.
She has been a true inspiration throughout these years. She was the first to review the Data Vault Data Modeling architecture over 5 years ago, and stood up and said: "This is exactly what Bill and I have been looking for, for over 10 years." Of course she's helped me with my articles on www.TDAN.com and B-Eye-Network, she got me involved with TDWI, and she encouraged me to write. She also is responsible for introducing me to Bill Inmon.

Bill Inmon. Back in 1999 I was unknown - just an upstart with a lot of green - of course I still have a lot of green, and a lot to learn (always learning from all my mentors). Bill not only reviewed the Data Vault architecture, but helped me learn the true meaning of Data Warehousing. I've had the pleasure of visiting Bill at his house several times, and on numerous occasions he's graciously given me time to ask un-informed questions. Bill then let me write for his original Bill Inmon Newsletter, on Nanotechnology of all things; occasionally he entertained my articles on Data Warehousing, Data Integration and Business Intelligence. Bill and I do not always agree on what is and what isn't in the world of DW, BI and data integration - but that doesn't stop a wonderful friendship.

Bob Terdeman. I saw Bob speak at a TDWI conference in 1999, and was fascinated by his attention to detail. Back then he was the CTO for EMC. Bob is a wonderful individual and always had time to help me with the ideas of what makes a very large data store (be it a warehouse or otherwise) successful. He reviewed my ideas, and helped me discover new things about the world of smart storage and the business of presenting.

Tammy Henderson and I worked together as employees of a very large government contracting corporation.
Tammy and I butt heads all the time, but in a good way - she founded the center of excellence at this particular corporation, and gave me resolve to really work on the compliance, accountability, project planning, tracking and oversight portions of the business. Tammy showed me around the company and introduced me to the business users - giving me access and insight to the business which I would otherwise never have experienced. She challenged my ideas.

Steve Koons. Another individual at this particular organization, Steve showed me the ropes and handled all of the classified interactions with our data warehousing effort. He took our data and our integration processes to a whole new level, clearing the way for our data model and our best practices methodology to be utilized across the organization. Steve helped me formulate the mechanisms for turning our IT group into a profit center for our business users. He assisted us in getting more contracts than we could possibly fulfill.

Hans Hultgren. Hans asked for my help in formulating a Masters Degree program for DW/BI at Daniels College of Business, Denver University. He graciously asked me to sit on the academic advisory board, and help him recruit many of my mentors and friends in the industry. He has given me the opportunity to give back to the community which has provided me with so much success. I thoroughly enjoy guest lecturing at DU, and our world-class board shows it: Bill Inmon, Lowell Fryman, Hans Hultgren, Claudia Imhoff, (me), Tammy Henderson, Maureen Clarry, Kent Graziano, and a few others.

Kent Graziano, speaking of which, has assisted me in numerous ways. He's currently assisting in the editing and co-authoring of my Data Vault Data Modeling book, due out on B-Eye Network in PDF format shortly. He's been a great guide, and an advocate of the Data Vault architecture since the beginning. He's also helped me with the Universal Data Models that he, Bill and Len Silverston have made famous.

Grady Booch.
Grady and I have had several conversations about the world of object oriented programming, UML class design, architecture, and where the Data Vault data modeling fits in. Grady has assisted me with many different facets of understanding in the business requirements world.

Richard Hackathorn. Richard and I go way back. Claudia introduced me to Richard many years ago, and I had the opportunity to discuss cycle time reduction, business process "squeeze" in time, and a range of topics relating to business requirements, the market space and developing interests.

Stephen Brobst and Kim Stanick. I saw Stephen teach at TDWI on high performance data warehousing techniques. Soon after I was asking him questions about parallelism, partitioning, and of course Teradata. He also spent some time reviewing the Data Vault data modeling architecture. Needless to say, he and Kim Stanick got me involved in the "Friends of Teradata Network", of which I am most grateful to be a part.

Clive Finkelstein, one of the originators with John Zachman of the Zachman Framework, helped me see clearly the nature of the framework, the differences between architecture and implementation, and what some of the components in the framework were. Clive has also reviewed the Data Vault Data Modeling architecture, and has given his approval for it as a good architecture for implementation of one of the cells within the Zachman framework.

Wayne Eckerson. Wayne first gave me the opportunity to teach at TDWI, and has repeatedly helped me with slide edits, comments, documentation, teaching styles and improvements. Wayne has been a true inspiration for me in the speaker’s circuit world. Wayne and Richard Winter gave me the opportunity to teach VLDW at TDWI in the beginning. Needless to say, Richard Winter has also provided me with valuable feedback and good friendship on what makes a good VLDW - and how they work.
Lately I've been helped tremendously by Dave Wells - he's also provided wonderful feedback throughout the years, and continues to do so. All the folks at TDWI deserve a huge thank-you.

Bob Seiner - a wonderful friend and mentor. Bob has helped me with my publications along the way - edits, changes, reviews, and of course the articles on http://www.TDAN.com. He's always given me good feedback and great content, and lately he's been assisting me with the Knowledge Management, Metadata Management, and Master Data Management components.

Calla Knopman - the one friend who stuck by me during the development phases of the Data Vault way back in 1995 through 2000. She pushed me to write about it, share it, and finish the thought processes. She's always been a good sounding board for me, and continues to be one of the best friends I have.

Shawn Rogers, Ron Powell, Ketherine Drewek, and all the good people at B-Eye Network. They continue to provide me with world-class opportunities to publish, write, and espouse my opinions. They are a huge part of my publishing success today.

There are many, many more people who've helped me along the way, including: Dan Chatten, Jeff Hild, Barb Darrow, Bill Wesley Brown, James O'Bannon, Peter Aiken, David Marco, Dan Sullivan, Harm Van Der Lek, Chris Busch, Juan Jose Van Der Linden, Scott Crownover, Randy Law, Kevin Goodfellow, Ben Isenhour, Matt Duncan, Kim Dossey, Frank Sparacino, Henry Morris, Jill Dyche, Jeff Jonas, and all my friends at different vendors.

Last but not least, I must thank my current employer Myers Holum, Inc for supporting me - allowing me to continue my writing career, while earning a living. Thank-you for allowing me to provide you with this tribute to my life journey.

April 3, 2006

Understanding MDM: Rough Seas and Muddy Water

I recently blogged on this subject, and then I asked my friends in the industry what MDM meant to them. Not surprisingly, I received a number of different answers.
I'll post their answers in this entry, and then discuss what I believe MDM to be. David Loshin recently wrote a new article on MDM and the basics of MDM; I agree with some of it and disagree with other parts - I'll post here what my thoughts are on this subject as well. In the next blog entry we'll dive a little deeper and work at defining specifics for Master Data.

Bill Inmon on MDM: I have always thought of MDM as reference tables. You can extend the definition a little and include other kinds of metadata I suppose. But I think people are stretching MDM way out of proportion.

Stephen Brobst: I would extend MDM beyond just dimensional data.

Claudia Imhoff: Master Data Management should be thought of as a process, a business process, not as a tool set or something that a vendor sells. Master Data exists as reference tables throughout the enterprise, be it on source systems, data marts, data warehouses, staging areas, or operational data stores. It doesn't matter where the tables reside, just that they do reside within the organization as a centralized master list of information containing both the KEY and the DESCRIPTIVE components of what that key represents today.

David Loshin: Master data objects are those core business objects that are used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies. Master data objects are those "things" that we care about - the things that are logged in our transaction systems, measured and reported on in our reporting systems, and analyzed in our analytical systems. Common examples of master data include:

Ok, I can see where all the confusion comes from. The father of data warehousing and my good friend Bill Inmon has said quite plainly that MDM is nothing more than REFERENCE TABLES, which can include references to anything you want.
Centralized information, centralized metadata, and centralized definitions - all quality checked, integrated and providing singularity for data attributes by their corresponding key.

MDM is NOT:

MDM IS:

Yes, MDM includes Governance, Data Administration, and parts of Data Management. It also includes the data itself as the "master list" of reference material. We've had these concepts for years, especially back on the mainframes. I remember 12 to 15 years ago working with "master parts lists, master contracts lists, master work-orders lists, master employees lists", and of course "master systems" that were supposed to dictate the creation and management of the data set, dispersing it to "child systems" - of course this is all part of what led to integration mass confusion (nightmarish overnight synchronization processes that were supposed to keep the master lists in sync across the different systems).

I just read a Knightsbridge paper that stated that "localized copies of master data are OK." I beg to differ. Today's networks are fast enough, and global definitions (metadata synchronization) are handled well enough these days, to create and locate a SINGLE MDM data source for reference information. The point is to avoid the future "fights" over who's right and who's wrong with their "copies" of the data sets. Integration NOT DIS-INTEGRATION is what we are trying to achieve with MDM. If you need to warehouse your Master Lists, then do so, but fight the good fight to consolidate the metadata definitions, and consolidate the data sets across the same semantically defined layers (see Data Vault Data Modeling). At the end of your MDM initiative you should be able to identify true MASTER data stores that are not localized, and that can be utilized horizontally and vertically across your organization without fear. Quality data and single copies of information are what we want.
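
To make the "single master list versus localized copies" point a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names MasterRecord, MasterList and find_disagreements, and the part numbers, are hypothetical and invented for this example. They do not come from any MDM product or from the Data Vault architecture. The sketch holds one reference list of KEY plus DESCRIPTIVE component, and shows how a copy held by a "child system" quietly goes stale once the master changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterRecord:
    key: str          # the business KEY, e.g. a part number (hypothetical)
    description: str  # the DESCRIPTIVE component for that key, as of today


class MasterList:
    """A single shared reference list that every system reads from."""

    def __init__(self):
        self._records = {}

    def upsert(self, key, description):
        # Insert a new key or replace the description of an existing one.
        self._records[key] = MasterRecord(key, description)

    def lookup(self, key):
        return self._records[key]

    def __contains__(self, key):
        return key in self._records


def find_disagreements(master, local_copy):
    """Return the keys where a localized copy no longer matches the master."""
    return sorted(
        key
        for key, description in local_copy.items()
        if key not in master or master.lookup(key).description != description
    )


parts = MasterList()
parts.upsert("P-100", "Hex bolt, 10mm")
parts.upsert("P-200", "Washer, 10mm")

# A hypothetical "child system" takes its own copy of the list...
child_copy = {"P-100": "Hex bolt, 10mm", "P-200": "Washer, 10mm"}

# ...and then the master is corrected. The copy is now out of date.
parts.upsert("P-200", "Washer, 10mm, zinc plated")

stale_keys = find_disagreements(parts, child_copy)
print(stale_keys)
```

A system that calls parts.lookup("P-200") directly always sees the current description, while the copy needs a synchronization job just to find out it is wrong. That is the "fight" over copies in miniature.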
Utilizing the Data Vault Data Modeling architecture, you can easily encompass the MDM definitions of all four individuals above. In my next blog entry I'll go through a few things, like creating a single definition for MDM.

How are you implementing MDM? Which of these definitions matches your business? Come to TDWI in August in San Diego and share your MDM experience in my MDM night course.

Thanks,