Blog: Dan E. Linstedt« September 2006 | Main | November 2006 » October 28, 2006MDaaS - Master Data as a ServiceHave you ever thought about Master Data as a Service? Well, some companies are thinking this way. If this happens, a major paradigm shift will occur. This entry looks at MDaaS - and it's possibilities for changing the way we do business entirely. Who knows, maybe EII vendors could play in this space very very well. After all, they are the ones with Niche technology that really fits this space to begin with. I'll blog on Master Data, the hype - the shenanigans, and the fears in my next entry. For now, realize that master data is important to the enterprise for many reasons. Master Data means a number of things to a number of people, I'm no exception - master data to me are literally the keys to standardized kingdom. The cycle repeats itself in everything we build, first there's a new idea, then everyone implements their interpretation of this new idea as a gain, a benefit, the fact that they are different seemingly gives them "an-edge." Then some of these edges fail, best practices and lessons learned emerge, and then all the smart industry implementers begin to follow what really works - common ground, standardization, convergence of thoughts - then the real players emerge. This is what is happening with Master Data Management solutions. However I think there are a couple of companies who are thought leaders in this space who are making a difference today. One (of course) is my company, Myers-Holum, Inc. Another is IBM Professional Services, another is Cap-Gemini, Intelligent Solutions, Object River, Kalido, and of course my respective peers (here on B-Eye Network) like David Loshin who write about MDM implementations. But something caught my eye the other day, Cap-Gemini was saying that as a best practice, they take their customers' master data and house it off-site, so that the customer is not impacted by the machines, hardware, extra support for master data. They enable the master-data set with web-services for their customer, and they surround it with Enterprise Metadata (or my term: Master Metadata). When I first saw this, I thought: no, not possible that a company would release their intellectual capital (which master data really is like golden keys to a kingdom when implemented properly), and allow it to be stored off-site. Then I started thinking about differentiation and then about standardization. I realized very quickly the same thing applies to master data that applies to SaaS - standardization of particular parts, geographical locations, customers, and so on - as long as the data can be "secured", treated with integrity, delivered on time, standardized and made available - why not put it out as a service? Data Warehouses as a Service never really took off, and I'm not sure it ever will (maybe one day), but MD as a service, that's different - why? It's operational data when we look at it, we deal with transactional based information, now information - small numbers of rows going through a web-service request. What a gold mine! Now imagine you get common data from Dunn & Bradstreet, you clean it up, and you standardize it over a web-service request, then you get common local census data (like the post-office does), and address data, and you intermix these as master data sets, then release them as MDaaS, you've got an interesting solution for the industry. Suppose you load company profiles, SIC codes, and other public information - what happens? You can serve many different customers at the same time with the same data (master data that is standardized). A "virtually compressed" image of the data, because you don't have to store different copies for each implementation that is built. Voilla - keeping costs down for the customers of the service, the master data is updated, and pushed when changed to the customers who have signed an SLA with you. I think Cap-Gemini takes this one-step further, by offering MDaaS for ALL the data sets the customer has, in agreement to keep certain company information confidential. Of course if Cap-Gemini or any other MDaaS system is compromised there will be a lot of stirr in the ethics community, and compliance will become an issue. Cap-Gemini must abide by in-sourcing, and different country rules, particularly with a global enterprise. I think transactional Master Data as a Service is one wave of the future that I would ride. It's potentially a huge wave if it can be implemented properly, and security concerns can be addressed with encryption, compression, and proper data access. After all, the true nature of SOA is services, regardless of whether or not they are in-house or out-of-house, the true nature of Master Data is consolidation and standardization, regardless of company utilizing that information. If you have any thoughts on why this would work, or wouldn't work - or what you think it would take to make it successful, I'd love to hear from you. Cheers for now, October 19, 2006Oracle and Sunopsis - my Technical Half CentEveryone's blogged on this and quite a few more have offered their two cents on the topic. After reading about all of this, I figured maybe my voice could add a half-cent value to the noise out there. This is my opinion of what this acquisition means, and what Oracle really needs to do to solve their ailing sales issues in the EDW / ADW space. I recently wrote an article on Bill's newsletter about Oracle and Clustering that explains this core issue in a round-about manner. This entry is more direct. Many companies are finding out that volume is a huge deal for them, as is real-time loads into an already overloaded and overwhelmed RDBMS engine. The problem is that companies are left with several key decisions to make: 1. Should we invest in a hardware solution? Is a single big box better than many small boxes? we want to consolidate... lower cost, etc... Big boxes are expensive. Then there's the everlasting questions: It's all a shell-game, pushing the volume off legacy or mainframe, or collecting the volume off the web or gaining the volume from customer growth are all problems that lead to the same conclusion: Upgrade and change the architecture. But which part? What changes? And what does Oracle & Sunopsis have to do with this? Ok - so that's what I've been hearing in the market space (amongst other things). Oracle has been under-fire from Teradata, Netezza, IBM, DatAllegro, Sand, and now Microsoft with SQLServer2005. Oracle in the context of OLTP (transaction processing) is an awesome database, and clustering for OLTP works wonders, especially in large clustered systems - but it drives up cost of support. Oracle has had problem after problem with their Oracle Warehouse Builder (so many have discussed this, I don't feel I should give it time) - ranging from Metadata management, to user interfaces, to lack of complex transformation ability. But the real-killer to using Oracle as an EDW / ADW has been Volume issues. Processing Volume in the context of history (already with volume in the database - I'm not talking about loading an empty DB here), coupled with complex Transformation requirements has basically put Oracle out on a limb. In my opinion, the problem stems from their clustering, rather than an MPP solution. Now, it's been said that they never had a "world-class" data integration / data migration ETL / ELT (ETLT) engine - so what did they do? They bought an up-and-comer who had a strong relationship with Teradata. Sunopsis metadata leaves a lot to be desired, but being a young engine in the market place, they effectively leveraged their ability to generate highly specific Teradata SQL code to run transformations in the database (as ELT) to gain market share and visibility; particularly among the volume crowd in big systems. If the same customer were to put that level of volume on Oracle, then try to run ELT through Sunopsis on it, it would be Oracle's engine that would choke. Now keep this in mind: Oracle EDW on a SINGLE BIG IRON BOX (without clustering) hums along just fine at a multi-terabyte level, and probably would meet this need just fine. Also, big iron keeps getting bigger and faster (Single SMP, or Mainframe with LPAR) we're talking 64 to 128 CPU's with 64 GB to 248 GB of RAM... I'm not talking about 500GB of data here, I'm talking about 50 TB or more - of historical, and incoming information at 2TB a day - something that ELT is necessary for due to volume constraints. Bottom line: I'm not going to speculate as to why Oracle thought buying Sunopsis would save it's EDW market, or gain market share for it... But I will say: Sunopsis is a life-preserver the size of an individual, when the whole ocean liner appears to be sinking. Oracle needs to wake up! They MUST REWRITE THEIR DATA WAREHOUSE CORE ENGINE to be an MPP enabled solution, drop this cluster stuff (except for OLTP) - they need two separate core engines, one for data warehousing and one for transaction processing. Again, I'm talking enterprise class engines where this makes a difference - then and only then will the engine begin to support the volumes needed (at the performance needed) to make the "T" in ELT work properly, at that point they need to add a host of OLAP functions to operate within SQL (some they already have), and everything needs to run in parallel ALL the time on the enterprise platform. When this has happened, they could be a formidable force to be reckoned with - and they just might be able to regain a foothold by producing a single suite of products (like IBM). They need to add metadata foundations to the Sunopsis product, add more MPP like functionality and parallel processing ability to the core-engine, and so on. But I ramble. I realize this is fairly one-sided, but after years of Very Large Data Warehousing experience at executive level projects, I feel it had to be said. I'd love to hear any counter-view points, or other thoughts you might have, If you work for Oracle - and are willing to share, let me know... Thanks, October 18, 2006Between Inmon and Kimball - Data Modeling Part 2This is a long overdue entry from way back when. I've recently received some good questions on my original entry for this discussion on a data modeling technique that sits squarely between Inmon and Kimball, or should I say, squarely between 3NF and star schema. This technique is called the Data Vault Data Modeling architecture. In this entry we'll dive a little deeper and answer some of the questions posted on the last entry. You can find my first entry (and the questions) here. You can find my web site on the Data Vault - here. Are there healthcare and/or insurance companies that have successfully implemented an enterprise or patient care data vault? (Question from a reader). Yes. Blue-Cross Blue-Shield (Wellpoint in Saint Louis, MO) is just such a company, and they have (last I'd heard) 18TB or more in their Data Vault on an IBM DB2 UDB platform, using Informatica as an ETL tool, and IBM Websphere for EAI purposes. What are some of the Best Practices around a hybrid hub-and-spoke approach (i.e. data vault with supporting data marts)? Please keep in mind that standard enterprise data integration / data warehousing best practices always apply, no matter which data modeling technique you're choosing. Some of the best practices include the following: Is there a matrix for when do you which data warehouse modeling technique? I know there is a lot around Inmon vs Kimball but nothing which includes data vault. What business questions need to be answered to determine the technique? Business / Technical questions that will lead you to the Data Vault: There are quite a few more, this is just a start. If you have some of these issues, and wish to share anonymously, that would be wonderful. I'm looking for alternative and opposing views as well. Thank-you kindly, Feel free to ask more questions, I'll try to post the answers. October 17, 2006ELT Part 2 - Don't lose sight of MetadataI've just blogged on ETL and ELT - the basics of some of the differences. While it starts with volumes of data within the system, it doesn't end there. While the power of SQL in the DBMS hardware can really show it's muscle, it's the metadata at the end of the day that makes all the difference in the maintenance world down the road. Knowing SQL is critical, understanding each RDBMS specific set of Functionality is also critical, but more than that - building the solution within a tool that can capture the metadata is vital to the long-term success of handling huge volumes. Why is metadata so important? Because coding in SQL for ELT can quickly get out of hand, especially on large scale projects where hundreds of jobs are necessary to perform the proper transformation. Also, to help in speeding time to delivery, as well as taking advantage of best practices and lessons learned in the industry. If we aren't allowed to leverage our knowledge in such a manner that is fitting to the metadata, then why use a tool at all? ETLT (as I've described in articles 3 years ago) is a requirement - the ability to do both ELT and ETL depending on the needs of the design. However, designing the transformation or data flow in a single stream, then pushing that into SQL within the database, or even within the RDBMS engine, as SSIS (from Microsoft, SQLServer2005 does) is a necessary task for future use. See my articles on Teradata Magazine: A Symphony of Two, Additional thoughts #1, Additional Thoughts Part 2, In order for ELT to work, the power and functionality must be there within the RDBMS. As hardware becomes stronger, and is bundled with RDBMS software to become an "appliance", it becomes critical that these vendors begin to cross-implement functionality. We (as an enterprise) will have multiple levels of appliances from different vendors across the playing field in order to support different requirements. The "T" or transformations that are available should follow some standard like ANSI-SQL in their functions. Today, each vendor offers individual functions specific to their own database environments which makes it nearly impossible to construct ELT in all environments from a single metadata source. The vendor that begins to implement cross-functionality will have a jump on the other vendors, because their "appliance" will fit in with the rest of the enterprise and can be better leveraged by existing Data Flow Diagramming tools (ETL/ELT) that exist. Remember, metadata is king - proper use of metadata (IN ANY TOOL) allows me to build systems 3 times faster than hand-coding. Lastly, don't be fooled by RDBMS vendors, some would have you believe they are powerful enough to handle Transformation (ELT) within the database when joining 100M rows to 100M rows to 100M rows, unless the vendor can prove it, AND they can do it while returning a number of queries, and running a batch load sequence - then they are not worth their weight. Today, all these things are happening within the RDBMS while transformation is taking place. The larger the data set, the more active the systems. Thoughts? |