Blog: Claudia Imhoff

March 22, 2007

Data Integration Does NOT Equal Data Warehouse

I have read with interest many blogs and articles, and listened to a number of presentations, about data integration technologies. It seems to me that we need to clear the air regarding the role of data integration and which projects use it in our enterprises.

Granted, data integration technologies have been around longer than data warehousing, but they first really got our attention with the advent of data warehouse and data mart construction. At that time, we called the technologies that helped integrate disparate pieces of data and create snapshots of that data "ETL", for Extract, Transform, and Load. Perhaps we should have called them EIL (Extract, Integrate, and Load), but that's another story.

For good or bad, these projects were also the first to recognize a number of other, tangentially related data problems and initiatives: improving the quality of the data being integrated, creating sets of integrated master or reference data (MDM), implementing repositories of current data for management and operational purposes (ODS), and so on.

I think this is where the wheels began to fall off. Because data warehouse projects first recognized the need for solutions to these other problems, implementers and vendors alike assumed that the solutions must belong to the data warehouse environment, and the data warehouse team took on responsibility for a much broader, more diverse set of projects.

My response to this expansion of the data warehouse team's responsibilities is "Back off!" None of these three initiatives has anything to do with strategic, tactical, or operational BI implementations. Don't get me wrong: they certainly make building a data warehouse or mart easier, but you should not confuse easier construction with total control. Taking on total control is a no-win situation.
Data quality is a massive, enterprise-wide effort to bring quality processes and metrics into the entire IT environment, not just the BI one. If its function is buried within the data warehouse function, it will never garner the enterprise-level support that successful quality initiatives require.

Now let's look at MDM. Again, many people come at MDM from a data warehouse perspective. They believe it's just another data integration project using many of the same techniques and technologies used to build our BI environments, so they make the leap of logic to put it under the domain of the data warehouse function. Wrongo! Just like data quality, MDM is a massive, enterprise-wide initiative that affects all aspects of IT, not just the data warehouse. It has its own set of policies, procedures, applications, and technologies specific to integrating reference data. Once again, the BI environment benefits greatly from having such an environment in place, but so does the business transaction environment.

Finally, there is the operational data store (ODS). Given that the need for it, too, first surfaced in data warehouse projects, it's no wonder people don't understand it or, worse, define it as a staging area for the data warehouse. It is not a staging area; it is a repository of current operational data. If you have a mature MDM environment in place, the ODS could contain only integrated current transactional data (an operational transaction data store, or OTDS) and use the current master data from the MDM environment.

None of these is a data warehouse project. Each should stand on its own two feet as an independent initiative that just happens to make data warehouse construction, and BI in general, much easier to implement. Just don't make the mistake of assuming that if a project deals with data integration in some fashion, it must be a data warehouse project. Data integration has expanded WAY beyond data warehousing!

Yours in BI success,
Claudia