Blog: Dan E. Linstedt

January 30, 2006
Active and Right-Time Data Warehousing Defined

Recently I've been asked about Active Data Warehousing and Right-Time (Real-Time) Data Warehousing: what do these mean to the enterprise? In this short blog entry, I offer my opinion on the definition of each. In future entries I will cover the basics of building one, the questions to ask, and the potential value to the enterprise.

I lead an effort in Active Data Warehousing and Right-Time Data Warehousing for Myers-Holum, Inc. We have best practices surrounding these efforts, and will soon offer tips and tricks for free on our site.

Too often, we are confused by marketing literature and vendor hype. I'm going to take a stand and offer my opinion in DEFINING just what an Active Data Warehouse is, and just what a Right-Time Data Warehouse is. Are they different? Yes. Why? Well, we'll get to that in a minute. For now, this is the way I define each:

Active Data Warehousing (ADW)

Right-Time Data Warehouse (RTDW) (Not REAL-TIME)

RTDW: If the business needs to answer a question at the end of the day, every day, then an RTDW would refresh on a daily basis. If the business needs to answer a question every 30 minutes, then an RTDW would refresh every 30 minutes, assuming the data is available.

Is an RTDW an Active Data Warehouse? Is an ADW also an RTDW? So then, what is a Real-Time Data Warehouse? Is the business definition of Real-Time different from Right-Time?

Please don't mince words when going forward. Develop best practices, metadata definitions, and terminology standards (business metadata). Too often our businesses are confused by all the vendor hype and marketing material that "throws a term in" just because it sounds cool.

I'd love to hear your thoughts on this topic, even if you disagree. How do you define Real-Time, Right-Time, and Active DW?
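To make the Right-Time rule concrete, here is a minimal Python sketch. It assumes (an assumption of this sketch, not part of the definition) that "assuming the data is available" can be expressed as how often the source system delivers new data: the refresh interval follows the cadence of the business question, but can never be shorter than the source's delivery cadence. The names refresh_interval and refresh_schedule are hypothetical.

```python
from datetime import datetime, timedelta


def refresh_interval(business_need: timedelta, source_cadence: timedelta) -> timedelta:
    """Refresh as often as the business asks, but never faster than data arrives.

    source_cadence is a hypothetical stand-in for "the data is available".
    """
    return max(business_need, source_cadence)


def refresh_schedule(start: datetime, end: datetime,
                     business_need: timedelta, source_cadence: timedelta) -> list:
    """List the refresh times between start and end (inclusive)."""
    step = refresh_interval(business_need, source_cadence)
    if step <= timedelta(0):
        raise ValueError("refresh interval must be positive")
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += step
    return times


# Daily question, hourly source feed: refresh once a day.
daily = refresh_interval(timedelta(days=1), timedelta(hours=1))
# 30-minute question, but the source only delivers hourly: refresh hourly.
limited = refresh_interval(timedelta(minutes=30), timedelta(hours=1))
```

The point of the max() is that a Right-Time warehouse is driven by the business question, not by the fastest technically possible load; a faster refresh than the source can feed buys nothing.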
Thanks,

January 29, 2006
Step closer to nanotech hardware

I've written several articles here in the past about nanotech, the timelines, and nanohousing(tm). About a year or two ago I wrote that IC chip manufacturers needed to get on board. Well, it looks like they've done so. In this brief entry I'll discuss their foray into nanoscale transistors and logic gates on computer chips. It is all very interesting, and I'll speculate on what it might mean going forward.

Here's the news story: Computer World

They have produced a "fingernail sized memory chip, about 45 nanometers wide -- about 1,000 times smaller than a red blood cell." What makes this interesting is how much memory they can put inside a memory stick. I've read other nanotech-based articles recently that discuss advances in memory that (theoretically) will make "disk drives" obsolete. The nanomemory being experimented with can actually hold state, and be supplied by an internal power source. The announcement of this particular Intel memory doesn't give its specifications, but it does say that power consumption is greatly reduced, which means computers will run cooler.

The first questions that come to mind are: What happens to my computer? What happens to my RDBMS? (These are my predictions, opinion only.)

Remember the article on the DoD (Department of Defense) and DNA computing? Nanotech-based computing with carbon nanotubes and other such devices is catching up; manufacturers will be able to store hundreds of terabytes on what is the equivalent of a memory stick today. In fact, I predict that manufacturers will completely remove "storage" from their internal offerings and produce a "plug and play" storage device interface that is highly parallel and will scale to access the terabytes of nanomemory. You'll be able to "take your information with you" in your shirt pocket. "Laptops" will become stationary, and these memory devices will plug and play with the next-generation "iPod" or "Windows CE devices".
In fact, audio and video equipment will be adapted: just plug in the memory (all your "hard-drive information"), select your functionality, and away you go. Of course, this gives a whole new meaning to the terms SCOPE CREEP and spread-marts: every "storage device" will essentially be a spread-mart of corporate information. That in turn means corporations must begin thinking NOW about how to manage and regulate these storage components. Security will get harder, not easier.

I'd love to hear your whimsical thoughts about where this could take us. Please post your comments.

See you next time,

January 26, 2006
System of Registry (SoR): what does it mean?

In the interest of SOA, and on my search for governance lately, I've been looking at System of Registry (SoR) and what it means. If you've got an SOA project, or would like to build one, or maybe you're looking at Master Data Management (MDM), metadata stewardship, or data stewardship, then you might be interested in understanding basic registries and systems of registries.

In the SOA/EII world there has been a lot of buzz about SoR and what it can provide. Some vendors offer software to answer this call, and state that their SoR software helps you build a complete solution when it's integrated with other efforts. What does this mean? How can an SoR help you? Why would you want one?

An SoR (System of Registry) can be likened to a taxonomy, or classification, of information. There is a good reference definition for UDDI and Web Services SoR here. For example, a Ferrari, a Mustang, and a Chevy Malibu all belong to a class called "cars." If you were to construct a location (search) service, you might set up a class for searching cars, trucks, motorcycles, and so on. Each of these classes is its own SoR, and when they are combined under one label, "Search for Motorized Vehicles," you've got the start of a master taxonomy (master classification).
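The vehicle example can be pictured as a simple tree of classes, where each class is a registry and the root label ties them together into a master taxonomy. This is a minimal in-memory Python illustration; RegistryNode and its methods are hypothetical names, not any vendor's SoR product or the UDDI API.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class RegistryNode:
    """One class in the taxonomy; children are its sub-classes or items."""
    name: str
    children: dict = field(default_factory=dict)

    def register(self, *path: str) -> "RegistryNode":
        """Create (or reuse) each class along path and return the last one."""
        node = self
        for part in path:
            node = node.children.setdefault(part, RegistryNode(part))
        return node

    def find(self, *path: str) -> Optional["RegistryNode"]:
        """Locate a class or item by its path, or return None if unregistered."""
        node = self
        for part in path:
            node = node.children.get(part)
            if node is None:
                return None
        return node

    def paths(self, prefix: tuple = ()) -> Iterator[tuple]:
        """Yield every fully qualified path below this node, depth-first."""
        for name, child in sorted(self.children.items()):
            here = prefix + (name,)
            yield here
            yield from child.paths(here)


# The master taxonomy: one label over several class-level registries.
root = RegistryNode("Search for Motorized Vehicles")
for item in ("Ferrari", "Mustang", "Chevy Malibu"):
    root.register("cars", item)
root.register("trucks")
root.register("motorcycles")
```

Adding makes, models, and production years is just a matter of registering longer paths, for example root.register("cars", "Ford", "Mustang", "2005").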
By adding specific makes, models, and production years, you can break each class down into specific, locatable physical items. This helps complete the taxonomy (aka System of Registry). It's very much like the Windows "Registry," which keeps all of your programs running and organized in such a manner that Microsoft Windows can reference them.

What does an SoR do for EII, SOA, ETL, EAI, or any integration platform for that matter?

These are just a few of the basic components embedded in an SoR (specifically for SERVICES). Integration services should provide these basic components no matter what; it's all metadata, but important metadata. Some of these elements are "protected" metadata and can be accessed only by authorized parties. Of course, there's always the HISTORY of the SoR: when it was accessed, who accessed it, and what the inputs and outputs were.

So how can an SoR help you? Why would you want one?

So what about an SoR for SOA/Web Services? Because web services are vast and numerous, and the grain can vary from service to service, it is wise to develop a classification (taxonomy) to manage, organize, and maintain them; the SoR can help with that.

A central SoR should be:

Whether you're starting an SOA "project," building an MDM strategy, creating compliance and governance initiatives, or integrating your enterprise warehouse with your active data warehousing strategies, it is considered a best practice to build an SoR along the way. My company specializes in bringing best practices to these types of efforts, and can help you kick-start your projects. Avoid the pitfalls, and don't reinvent the wheel.

What are your thoughts about SoR? Do you have any comments about specific software in this area? What do you like or dislike about the software?

See you next time,

January 24, 2006
System of Registries and your Master Data

It is vital in any EII implementation to MANAGE YOUR METADATA. Well, what the heck does that mean?
That's a big definition, but it certainly encompasses the ability to manage your services from a GUI perspective, manage the interaction of the APIs under the covers, and manage the accessibility of the EII queries. At a process level it may mean handling your web services with ease.

Systinet has been doing this for a while now, and they've gotten good at it. There are a number of software offerings in this "young" market for managing registries, but Systinet was well known among them. In particular, they've been utilized by a number of EII vendors in the market space. As with any advancing technology, it is important to have a plan, an implementation strategy, and a set of best practices that utilize the best-of-breed tools going forward. Well, the good news is that Systinet provides this kind of thing. The not-so-good news (for EII vendors who partnered with them) is that Systinet has been purchased by Mercury Interactive for $105M.

http://www.newratings.com/analyst_news/article_1175005.html

Good for Mercury Interactive, bad for EII vendors who use their tool set.

Once upon a time there were lots of ETL vendors, and all these vendors and several other data movement players were using Striva to access their data. Striva got HOT, so hot that Informatica purchased them, and thus ends the story: the other vendors had been "Striva'd"... if you can turn that into a verb. The last thing any EII vendor needs today is to have this scenario play out again, but it just has.

In order to make EII a better business proposition, a system of registries is recommended. I would suggest that any EII vendor out there who's listening take heed: it's time to roll your own. This is product functionality that will add to the bottom-line valuation of your company, along with the business proposition, and having an integrated GUI from which to manage it all would be wondrous.
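For a vendor that does roll its own, the pieces named in the System of Registry entry (service metadata, "protected" metadata visible only to authorized parties, and a HISTORY of who accessed what, when, with which inputs and outputs) might start out something like this minimal in-memory Python sketch. ServiceRegistry, ServiceEntry, and AccessRecord are hypothetical names, the endpoint URL is a placeholder, and a real registry would persist this metadata rather than hold it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ServiceEntry:
    name: str
    endpoint: str
    public_meta: dict = field(default_factory=dict)
    protected_meta: dict = field(default_factory=dict)  # authorized parties only


@dataclass
class AccessRecord:
    when: datetime
    who: str
    service: str
    inputs: dict
    outputs: dict


class ServiceRegistry:
    """Hypothetical roll-your-own registry with protected metadata and history."""

    def __init__(self, authorized: set[str]):
        self._entries: dict[str, ServiceEntry] = {}
        self._authorized = set(authorized)
        self.history: list[AccessRecord] = []  # every lookup, oldest first

    def publish(self, entry: ServiceEntry) -> None:
        self._entries[entry.name] = entry

    def lookup(self, service: str, who: str) -> dict:
        """Return the metadata `who` may see, and record the access."""
        entry = self._entries[service]
        meta = {"endpoint": entry.endpoint, **entry.public_meta}
        if who in self._authorized:
            meta.update(entry.protected_meta)
        self.history.append(AccessRecord(
            when=datetime.now(timezone.utc), who=who, service=service,
            inputs={"service": service}, outputs=dict(meta)))
        return meta


reg = ServiceRegistry(authorized={"steward"})
reg.publish(ServiceEntry(
    name="CustomerLookup",
    endpoint="https://example.com/services/customer",  # placeholder URL
    public_meta={"grain": "individual"},
    protected_meta={"owner": "data-stewardship-team"},
))
```

Keeping the access history inside the registry, rather than in each service, gives one place to answer the audit questions about who used which service and what came back.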
Of course, hold your horses a bit, because if an enterprise already has a System of Registries package, they'll want to integrate. If you roll your own, be sure to include an API that can exchange the information bi-directionally.

If you are NOT a vendor, and are looking at implementing an EII solution, I would strongly urge you to take a look at the success stories spelled out in CIO magazine; most of these recommend that a system-of-registries component be in place as part of the critical architecture.

Do you have a "story" about system-of-registries and EII interaction? Let us hear it!

Cheers,

Appliances are coming to EDW

I've blogged several times about how appliances are arriving on the scene, and how (I believe) they will eventually hold a place as an EDW appliance. Appliances are making forays into many areas of OLTP and data capture, which is the first step on this journey to creating an "appliance-based warehouse." In one of my posts I went so far as to state that I believe the future of warehousing rests squarely in the hands of appliances, and of course, not everyone agrees (which is fine). In this entry I'll take a look at the reasons why I believe appliances will be the EDW of the future, and why they will contain all the software elements we take for granted today. Of course, the nature and definition of EDW is shifting as we speak, and tomorrow it won't just be your parents' warehouse anymore.

CRN: 01/23/05

There is no question that appliances are becoming attractive because of the price/performance and functionality they contain. These appliances are increasing competition in the OLTP and data capture market. They do more and more processing within the appliance, and of course they've added storage, business rules, monitoring capabilities, and API- or service-based accessibility. When it comes to getting the whole package today, an appliance just seems right, as long as you can find one for your specific business need.
When it comes to warehousing today, appliances are in their infancy. That doesn't mean, however, that they can't or won't grow up. What I mean is that today we've got vendors like DatAllegro, Netezza, and a few others who play in the partial-appliance-for-data-warehousing world. They've got a firm grasp of the notion that an RDBMS integrated with hardware is the definitive means to scale and address performance problems. They've added self-tuning hardware (software on hardware), they've integrated operating systems and firmware, and they've begun to tackle the load/unload issues of large data sets in parallel with partitioning.

There is reason to advance these appliances and add more features like transformation engines, GUI development, monitoring, maintenance, on-board data mining capabilities, BI capabilities, and cube-building systems. It just makes sense from an offering perspective for these hardware builders to team up with the software industry and bury the software in the device, producing a PLUG AND PLAY WAREHOUSE capable of saving cost, reducing installation and maintenance time, and increasing productivity.

Yes, there are a lot of bridges to cross with this type of approach, because the device must scale to large businesses, and it must also meet the needs of SMBs; that's where volume sales make up the majority of the profits (especially with reduced pricing).

Imagine if one simple feature such as ETL were built into the hardware device: we'd have an easier time establishing plug and play components. Teradata is beginning to do this, and Microsoft SQL Server 2005 with its SSIS has started to do this; in fact, these vendors have optimized their ETL/ELT mechanisms to work natively with their RDBMS on their platform. While they may not be "best of breed" per se, they will certainly make a dent in the marketplace, particularly when bundled together with the hardware and the RDBMS engine.
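The idea of tackling loads "in parallel with partitioning" can be illustrated with a small Python sketch: rows are hash-partitioned on a key, and each partition is loaded by its own worker. This is a generic illustration under stated assumptions, not how DatAllegro, Netezza, or any other appliance actually works; load_partition is a hypothetical stand-in for a bulk load into one table partition.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_of(key: str, n_partitions: int) -> int:
    """Stable hash (CRC-32) so the same key always lands in the same partition."""
    return crc32(key.encode("utf-8")) % n_partitions


def split_into_partitions(rows: list, key_field: str, n_partitions: int) -> list:
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[partition_of(str(row[key_field]), n_partitions)].append(row)
    return parts


def load_partition(part: list) -> list:
    # Hypothetical stand-in for a bulk load into one table partition;
    # here it simply returns the rows it "loaded".
    return list(part)


def parallel_load(rows: list, key_field: str, n_partitions: int = 4) -> list:
    """Split rows by key, then load every partition concurrently."""
    parts = split_into_partitions(rows, key_field, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # map() returns results in partition order, one list per partition.
        return list(pool.map(load_partition, parts))


rows = [{"customer_key": i} for i in range(100)]
loaded = parallel_load(rows, "customer_key", n_partitions=4)
```

CRC-32 is used instead of Python's built-in hash() because string hashing is randomized per process, so the same key could otherwise land in a different partition on the next run.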
In order to stay competitive, larger vendors of BI, data mining, metadata management, ETL/ELT, EII, and EAI would be wise to begin partnering with the appliance vendors, and possibly jumping on board to provide bundled solutions. An Exabyte ad in CRN shows this to be interesting and plausible: "VXA-320 PacketLoader 1x10 1U. With a reasonable price tag, small and midsize businesses can take full advantage of the device... able to combine the 1U hardware device with almost any third-party backup software... is an attractive option."

It may be that service-oriented appliances are the next wave, but who wants to run around and try to "integrate" all these disparate software programs with separate hardware, RDBMS, and storage systems? The pendulum has begun to swing back to consolidation. Thoughts?

Governance is Muddy Water

I've been doing a lot of research lately on the nature of governance. There are a lot of misconstrued definitions in the marketplace, and a lot of vendors throwing around terms that they don't define. I've found definitions for Corporate Governance, IT Governance, and even "Country" Governance, but searching for definitions of SOA Governance and Data Governance leads to muddy water. "It's like trying to catch smoke with your bare hands!" (Harry Potter, The Prisoner of Azkaban).

I recently ran a search on "IT Governance"+Defined, and it came up with some interesting ideas; I then did the same for Data Governance. Surprisingly, many vendors claim they have or support Data Governance, but nowhere do they define exactly what that means, or how it applies to business. In fact, in some cases, they define Data Governance as "Governance over Data," which is extremely redundant.

In light of this confusion, I've decided to write a new series for Bill Inmon's newsletter on governance and what it means to business. Included in this series are snippets of best practices that my consulting company has developed and implements on site.
Surprisingly, there are many consultancies that claim to be able to help implement governance at an enterprise, but very few can actually define just what it is, what its deliverables are, and how to make it work within the corporate organization.

After reading through the series, feel free to comment HERE on the articles; use this as the feedback channel for the series on governance. Or, if you have thoughts about what works and what companies SHOULD implement, please comment here as well.

Happy reading!

January 19, 2006
Giant Jellyfish Rip Nets in Japan

This isn't a B-Eye type story, and normally I don't post things like this, but I figured it was interesting enough to post: apparently there are giant jellyfish, the size of sumo wrestlers, busting nets and hurting the Japanese fishing industry. Check it out...

http://www.cnn.com/2006/WORLD/asiapcf/01/19/japan.jellyfish.reut/index.html

The picture with the diver in the background is most interesting. I wonder what happens if we draw a parallel to the business of IT?

January 6, 2006
Redefining the EDW and ODS

This was a hot topic for most of you: with compliance breathing down our necks and the government hot on the auditing trail, we have to do something. And something we shall do! In fact, the nature and notion of EDW and ODS is changing, as I blogged in my most recent entry in this category. I made a statement:

"Flip the coin, and store RAW data as-it-stood on the source system, but in an integrated fashion in your data warehouse; now what have you got? A solid architecture (if modeled properly) which allows data to be auditable from that time period before the change. The Data Warehouse has now become a system-of-record."

A comment was made that this sounded like an oxymoron, and I was asked to elaborate. In this entry I'll attempt to explain what I mean by this statement. It's very possible that I didn't state it quite "correctly"....

OK, here are the facts, just the facts...
I believe our data warehouses must return to storing data "as it stood" in the source system; that is, snapshot copies of the good, the bad, and the ugly, all in the warehouse at the same time. But this brings with it one major problem: the data warehouse still must provide some layer of integration horizontally across the enterprise. What I mean is:

The articles I've written walk through integration of source systems surrounding raw data for compliance reasons. One of the KEY notions is that in this example, the business uses CUSTOMER KEY to access CUSTOMER records. It doesn't matter which system they are accessing; they need some form of KEY to get the data out. No key? Can't find the data... it is lost forever in the source system.

Let's assume the semantic definition for Customer states that all source systems capture customer at an INDIVIDUAL level, not a CORPORATION level (or, if they do capture CORPORATION, they hopefully assign it different keys and place it in a different source table). OK, we've established INDIVIDUAL as the semantic layer, and CUSTOMER KEY as the horizontal integration point. Based on this notion, we must design the warehouse around the BUSINESS KEY known as CUSTOMER_KEY, and thus INTEGRATE the information horizontally into a single table called HUB_CUSTOMER.

In loading HUB_CUSTOMER we size the key column to hold the largest data type used by any source, and record the load date and record source (which source system the KEY came in from). At the end of the day we have an integrated list that provides the business with a single FULL set of the customer keys that exist across ALL our source systems.

Let me back up a minute and define what I mean by INTEGRATION:

One could argue that we are changing the byte representation of some data (changing an integer representation to character, or to Unicode, to fit in the warehouse CUSTOMER_KEY column), but for all intents and purposes the data is still traceable, and the value of the data is preserved.
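Here is a minimal Python sketch of the HUB_CUSTOMER load described above, assuming the hub is an in-memory dict keyed by the business key. Keys from every source are stored as text (the widest common type), so an integer key from one system and a character key from another integrate into the same hub row without altering the key's value. Keeping the first load date and record source seen for each key is an assumption of this sketch, and load_hub_customer and HubCustomerRow are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HubCustomerRow:
    customer_key: str    # widest common type: every source key stored as text
    load_date: datetime  # when the key first arrived in the warehouse
    record_source: str   # which source system the key first came in from


def load_hub_customer(hub: dict, record_source: str, keys, load_date: datetime) -> int:
    """Add business keys not yet in the hub; return how many were new.

    hub maps customer_key (text) -> HubCustomerRow.
    """
    added = 0
    for raw_key in keys:
        # Only the representation changes (e.g. int -> text); the value is kept.
        key = str(raw_key)
        if key not in hub:
            hub[key] = HubCustomerRow(key, load_date, record_source)
            added += 1
    return added


hub = {}
load_hub_customer(hub, "CRM", [1001, 1002], datetime(2006, 1, 6))
load_hub_customer(hub, "BILLING", ["1002", "1003"], datetime(2006, 1, 7))
```

After both loads the hub holds one row per distinct customer key across both systems, which is the single integrated list the business uses to find the data again.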
So, to recap the quote at the top, I mean: copy the data into the data warehouse without transforming it. Place the same data at the same grain into the same structure (regardless of source system). In other words, customer keys for individuals are placed into HUB_CUSTOMER, and customer data for individuals is placed in a satellite structure dependent on the key, SAT_CUSTOMER. The satellite contains snapshots of the data over time, which establish a CRC/audit trail for information change.

I hope I've cleared this up. I have a book on the Data Vault in the works, which will be available in Q2 2006 on B-Eye Network. I welcome all thoughts, questions, and concerns.
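A companion sketch for SAT_CUSTOMER, again a hypothetical in-memory Python illustration: each snapshot stores the attributes exactly as received plus a CRC-32 of them, and a new row is written only when the latest snapshot for that key differs. CRC-32 is a quick change check, but two different rows can occasionally share a CRC, so the sketch also compares the stored attributes before deciding nothing changed. The function and field names are assumptions, not a prescribed Data Vault layout.

```python
import zlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SatCustomerRow:
    customer_key: str   # parent key in HUB_CUSTOMER
    load_date: datetime
    record_source: str
    crc: int            # CRC-32 of the attributes, used as a fast change check
    attributes: tuple   # (name, value) pairs sorted by name, stored as received


def row_crc(attributes: dict) -> int:
    """CRC-32 over a deterministic text form of the attributes."""
    text = "|".join(f"{name}={attributes[name]!r}" for name in sorted(attributes))
    return zlib.crc32(text.encode("utf-8"))


def load_sat_customer(sat: list, customer_key, attributes: dict,
                      record_source: str, load_date: datetime) -> bool:
    """Append a snapshot only if it differs from the latest one for this key."""
    key = str(customer_key)
    crc = row_crc(attributes)
    snapshot = tuple(sorted(attributes.items()))
    history = [row for row in sat if row.customer_key == key]
    if history:
        latest = max(history, key=lambda row: row.load_date)
        # The CRC filters quickly; the attribute comparison guards against collisions.
        if latest.crc == crc and latest.attributes == snapshot:
            return False  # unchanged: no new snapshot
    sat.append(SatCustomerRow(key, load_date, record_source, crc, snapshot))
    return True


sat = []
load_sat_customer(sat, 1001, {"name": "Ann", "city": "Denver"}, "CRM", datetime(2006, 1, 6))
load_sat_customer(sat, 1001, {"name": "Ann", "city": "Denver"}, "CRM", datetime(2006, 1, 7))  # unchanged
load_sat_customer(sat, 1001, {"name": "Ann", "city": "Boston"}, "CRM", datetime(2006, 1, 8))  # changed
```

Because old snapshots are never overwritten, the satellite can answer what the source said about a customer at any load date, which is what makes the warehouse auditable.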