Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

January 2006 Archives

Recently I've been asked about Active Data Warehousing, and (Real-time) Right Time Data Warehousing, what do these mean to the enterprise? In this short blog entry, I offer my opinion on the definition of each. In future entries I will define the basics of building one, the questions to ask, and potential value to the enterprise. I lead an effort in Active Data Warehousing, and Right Time Data Warehousing for Myers-Holum, Inc. We have best practices surrounding these efforts, and will soon offer tips and tricks for free on our site.

Too often, we are confused by marketing literature and vendor hype. I'm going to set the line and offer my opinion in DEFINING just what an Active Data Warehouse is, and just what a Right Time Data Warehouse is. Are they different? Yes, why? Well, we'll get to that in a minute. For now, this is the way I define each:

Active Data Warehousing (ADW)
The technical ability to capture transactions when they change, and integrate them into the warehouse - along with maintaining batch or scheduled cycle refreshes.

Right-Time Data Warehouse (RTDW) (Not REAL-TIME)
The ability to answer a specific justifiable business question at the time in which it is asked. In other words: a pre-designed business question that requires heaps of pre-integrated data (from the warehouse) in order to answer the question. The answer to the question drives a competitive business decision or a decision that already has a cost / benefit analysis tied to the answer (in other words: quantifiable answer).

RTDW: If the business needs to answer a question at the end of the day, every day - then a RTDW would refresh on a daily basis. If the business needs to answer a question every 30 minutes, then an RTDW would refresh every 30 minutes - assuming the data is available.

Is an RTDW an Active Data Warehouse?
In my opinion: not always. The way I see it: ADW refreshes WHEN TRANSACTIONS CHANGE (they also combine pre-scheduled batch cycles). An RTDW is an ADW when the two are in sync - in other words, if transactions change every 10 minutes, and are captured and integrated into the warehouse as they change, and I have a business question that must be answered every 10 minutes, then I have an ADW and an RTDW.
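To make the distinction concrete, here is a minimal sketch of the two definitions as simple checks. The class, the field names, and the rule "refresh at least as often as the question is asked" are my assumptions for illustration only; they are not a formal specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Warehouse:
    # All field names are hypothetical, chosen for this illustration only.
    captures_changes_as_they_occur: bool      # transactions integrated when they change
    refresh_minutes: float                    # longest gap between refreshes
    question_minutes: Optional[float] = None  # how often the justified business
                                              # question must be answered (None = no such question)


def is_adw(w: Warehouse) -> bool:
    """ADW: captures and integrates transactions when they change."""
    return w.captures_changes_as_they_occur


def is_rtdw(w: Warehouse) -> bool:
    """RTDW: data is refreshed at least as often as the business question is asked."""
    if w.question_minutes is None:
        return False  # no quantifiable business question, so nothing to be "right-time" for
    return w.refresh_minutes <= w.question_minutes


# A nightly batch warehouse answering an end-of-day question: RTDW, not ADW.
nightly = Warehouse(False, refresh_minutes=24 * 60, question_minutes=24 * 60)

# Changes captured every 10 minutes, question asked every 10 minutes: both.
ten_minute = Warehouse(True, refresh_minutes=10, question_minutes=10)
```

The point of keeping the two checks separate is that one is a technical property of the load process, and the other depends entirely on the business question.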

Is an ADW also an RTDW?
Not necessarily, although 99% of the time, yes - it should be. Why? Because it costs a lot of money to build and feed an ADW, it shouldn't be constructed without solid, quantifiable business questions - and having those questions is exactly what satisfies the RTDW criteria.

So then, what is a Real-Time Data Warehouse?
I've blogged before on my interpretation of Real-Time data warehousing. I still maintain that it does not exist, and that timing just can't be fast enough (due to the laws of physics) to make Real-Time decisions. That, of course, is the technical definition.

Is the business definition of Real-Time different than Right-Time?
Yes, there is a distinction between the two. Real-Time is more geared towards what I've defined as ADW (here), than it is Right-Time (as defined here).

Please don't mince words when going forward. Develop best-practices, metadata definitions, and terminology standards (business metadata). Too often our businesses are confused by all the vendor hype and marketing material that "throws a term in" just because it sounds cool.

I'd love to hear your thoughts on this topic, even if you disagree. How do you define Real-Time, Right-Time, and Active DW?

Thanks,
Dan L


Posted January 30, 2006 12:47 PM

I've written several articles here in the past about Nanotech, the time-lines, and nanohousing(tm). About a year or two ago I wrote about the fact that IC chip manufacturers needed to get on board. Well, it looks like they've done so. In this brief entry I'll discuss their foray into nano-scaled transistors and logic gates on computer chips. It is all very interesting, and I'll speculate on what it might mean going forward.

Here's the news story: Computer World

They have produced a "fingernail sized memory chip, about 45 nanometers wide -- about 1,000 times smaller than a red blood cell." What makes this interesting is how much memory they can put inside a memory stick. I've read other nanotech-based articles recently which discuss advances in memory that (theoretically) will make "disk drives" obsolete. The nanomemory being experimented with can actually hold state, and be supplied by an internal power source. The article doesn't discuss the specifications of this particular memory that Intel has produced, but it does say that power consumption is greatly reduced, which means computers running cooler. The first question that comes to mind is:

What happens to my computer?
1. It runs much, much cooler
2. Everything becomes RAM based.
3. No more personal lap heaters
4. Smaller batteries can be used - by the way, there's another article in the recent Scientific American about nanobatteries... this is a HUGE advancement; particularly if the two are coupled together... Imagine the power.
5... the list goes on.

What happens to my RDBMS? (these are my predictions - opinion only)
1. One step closer to a nanohouse, while nanohousing may not be "true to life" (in other words, a nano-scaled data warehouse complete with software/hardware mixed together), the RDBMS will begin using nanomemory.
2. DATA PARTITIONING WILL GO BY THE WAY-SIDE
3. Arguments over VLDW/VLDB and MPP vs SMP will dissipate.
4. Performance and tuning will become highly specialized, and finally "disappear"
5. Data Layout and data modeling will become more abstracted, as freedom to experiment will take place because of the faster RAM storage.

Remember the article on the DoD (Department of Defense) and DNA computing? Nanotech-based computing with carbon nanotubes and other such devices is catching up; they'll be able to store hundreds of terabytes on what is equivalent to a memory stick today.

In fact, I predict that manufacturers will completely remove "storage" from their internal offerings, and produce a "plug and play" storage device interface that is highly parallel, and will scale to access the terabytes of nanomemory. You'll be able to "take your information with you" in your shirt pocket. "Laptops" will become stationary, and these memory devices will plug and play with the next generation "iPod" or "Windows CE devices".

In fact audio, and video equipment will be adapted - just plug in the memory (all your "hard-drive information") and select your functionality, away you go. Of course this gives a whole new meaning to the term: SCOPE CREEP, and Spread-Marts - every "storage device" will essentially be a spread mart of corporate information. Which in turn states that corporations must begin NOW thinking about how to manage, and regulate these storage components. Security will get harder, not easier.

I'd love to hear your whimsical thoughts about where this could take us. Please post your comments.

See you next time,
Dan L


Posted January 29, 2006 5:20 PM

In the interest of SOA, and on my search for governance lately, I've been looking at System Of Registry (SoR) and what it means. If you've got an SOA project, or would like to build one, or maybe you're looking at Master Data Management (MDM) or metadata stewardship, or data stewardship then you might be interested in understanding basic registries and systems of registries.

In the SOA/EII world there has been a lot of buzz about SoR and what it can provide. Some vendors offer software to answer this call, and state that their SoR software helps you build a complete solution when it's integrated with other efforts. What does this mean? How can an SoR help you? Why would you want one?

SoR (System of Registry) can be likened to a Taxonomy or classification of information. There is a good reference definition for UDDI and Web Services SoR here. For example, given a Ferrari, or a Mustang, or a Chevy Malibu, these are all in a class called "cars." If you were to construct a location (search) service, you might set up a class for searching cars, trucks, motorcycles, and so on. Each of these classes is its own SoR, and when combined under one label: Search for Motorized Vehicles, you've got the start of a master taxonomy (master classification).

By adding specific makes and models, and production years you can break each class into respective and locatable physical items. This helps complete the taxonomy (aka: System of Registry). It's very much like the Windows "Registry" that keeps all of your programs running and organized in such a manner that Microsoft Windows can reference them.
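As a rough illustration of the vehicle example, a taxonomy can be sketched as a nested mapping with a lookup that returns the classification path for an item. The structure and the function name are my own assumptions; a real registry would hold far more than names.

```python
from typing import Dict, List, Optional, Tuple

# Master classification -> class -> registered items.
# Contents mirror the example in the text; empty classes are placeholders.
TAXONOMY: Dict[str, Dict[str, List[str]]] = {
    "Motorized Vehicles": {
        "cars": ["Ferrari", "Mustang", "Chevy Malibu"],
        "trucks": [],
        "motorcycles": [],
    }
}


def find_class(taxonomy: Dict[str, Dict[str, List[str]]],
               item: str) -> Optional[Tuple[str, str]]:
    """Return (master label, class) for a registered item, or None if unregistered."""
    for master, classes in taxonomy.items():
        for cls, items in classes.items():
            if item in items:
                return (master, cls)
    return None
```

Breaking each class down further by make, model, and production year would simply add more levels to the same structure.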

What does a SoR do for EII and SOA, and ETL, and EAI or any integration platform for that matter?
The SoR is responsible for housing (at a minimum) the following elements:
* Name
* Class
* Business Description
* Date/Time of registration
* Active / Inactive
* Accessibility (Security/control/allowed access)
* Encrypted
* Keyed
* Defined Inputs
* Defined Outputs
* Availability (when is it available)
* Last time it was updated
* Version

These are just a few of the basic components embedded in an SoR (specifically for SERVICES). Integration Services should provide these basic components no matter what; it's all metadata, but important metadata. Some of these elements are "protected" metadata and can only be accessed by authorized parties. Of course there's always the HISTORY of the SoR - when it was accessed, who accessed it, what the inputs and outputs were.
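The element list above, along with the access history, could be sketched roughly as follows. This is only an illustration: the class names, field names, defaults, and the role-based access check are all assumptions of mine, not the schema of any particular SoR product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class ServiceEntry:
    # One field per element in the list above (names are hypothetical).
    name: str
    service_class: str
    business_description: str
    defined_inputs: List[str]
    defined_outputs: List[str]
    version: str
    allowed_roles: Set[str] = field(default_factory=set)  # accessibility
    availability: str = "always"                          # when it is available
    active: bool = True
    encrypted: bool = False
    keyed: bool = False
    registered_at: datetime = field(default_factory=_now)
    last_updated: datetime = field(default_factory=_now)


class Registry:
    """In-memory registry that records every lookup as history."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}
        # (when, service name, who asked, whether access was allowed)
        self.history: List[Tuple[datetime, str, str, bool]] = []

    def register(self, entry: ServiceEntry) -> None:
        self._entries[entry.name] = entry

    def lookup(self, name: str, role: str) -> ServiceEntry:
        entry = self._entries[name]  # raises KeyError for unknown services
        allowed = role in entry.allowed_roles
        self.history.append((_now(), name, role, allowed))
        if not allowed:
            raise PermissionError(f"{role} may not access {name}")
        return entry
```

Note that denied lookups are written to the history too; for governance purposes the failed attempts are often the interesting ones.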

So how can an SoR help you? Why would you want one?
An SoR can help you organize, classify, and manage the metadata across your organization. By the way, the ultimate working Requirements Document is in fact - a System of Registry!! Interesting how that works out, it just so happens that in order to meet governance and compliance - an SoR of business requirements can be of tremendous help, see my upcoming series on Governance in the Enterprise for more information. The SoR helps manage a multitude of access points and data elements from within an easy to use classification system. It saves on the bottom line of implementation costs, rework costs, management and risk reduction, along with a few others.

So what about an SoR for SOA/Web-Services?
Well, SOA is an architecture - an abstraction, not a product. There are products within the SOA definition, but a product is NOT an SOA. See my article here for more info. SoR for Web-Services will track the above list and more. It is vital (if not critical) to combine the SoR implementation with compliance and governance initiatives; this will provide longevity to the SoR and keep it from "going out of date" or falling out of interest with the executive staff of the organization.

As web-services are vast and numerous, and the grain can vary from service to service, it is wise to develop a classification (taxonomy) to manage, organize and maintain the web-services, the SoR can help with that. A central SoR should be:
* Web Enabled (management, updates, and additions)
* Flexible (based on security for management)
* Autonomic in Discovery of undocumented web-services across the enterprise
* Visible and understandable all the way to the board of directors (they may choose to look at only the classifications)
* GUI Driven with Icons, and definitions - single click drill down
* BONUS: Visualization of the classification, number of elements registered, and number of elements "active" with frequency graphs.

Whether you're starting an SOA "project", building an MDM strategy, creating compliance and governance initiatives or integrating your Enterprise Warehouse with your active data warehousing strategies, it is considered a best practice to build an SoR along the way. My company specializes in bringing best-practices surrounding these types of efforts, and can help you kick start your projects. Avoid the pitfalls, and don't re-invent the wheels.

What are your thoughts about SoR? Do you have any comments about specific software in this area? What do you like / dislike about the software?

See you next time,
Dan L


Posted January 26, 2006 5:33 AM

It is vital in any EII implementation to MANAGE YOUR METADATA. Well, what the heck does that mean? That's a big definition, but it certainly encompasses the ability to manage your services from a GUI perspective, manage the interaction of the APIs under the covers, and the accessibility of the EII queries. At a process level it may mean to handle your web-services with ease.

Systinet has been doing this for a while now, and they've gotten good at it. There are a number of software resources out there in this "young" market for managing registries, and Systinet is well known among them. In particular they've been utilized by a number of EII vendors in the market space. As with any advancing technology it is important to have a plan, an implementation strategy, and a set of best practices which utilize the best of breed tools going forward.

Well, the good news is that Systinet provides this kind of thing. The not-so-good news (for EII vendors who partnered with them) is that Systinet has been purchased by Mercury Interactive for $105M.

http://www.newratings.com/analyst_news/article_1175005.html

Good for Mercury Interactive, bad for EII vendors who use their tool set. Once upon a time there were lots of ETL vendors. All these vendors and several other data movement players were using Striva to access their data. Striva got HOT, so hot that Informatica purchased them, and thus ends the story - the other vendors had been "Striva'd"... if you can turn that into a verb.

The last thing any EII vendor needs today is to have this scenario play out again, but it just has. In order to make EII a better business proposition, a system of registries is recommended. I would suggest that any EII vendor out there who's listening take heed: it's time to roll your own, this is product functionality that will add to the bottom line valuation of your company, along with the business proposition - and to have an integrated GUI from which to manage it all would be wondrous. Of course, hold the horses a bit - because if an enterprise already has a System of Registries package, they'll want to integrate. If you roll your own - be sure to include an API that can exchange the information bi-directionally.
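A bi-directional exchange can be as simple as an export and an import that agree on one payload format. The sketch below assumes JSON, integer version numbers, and a "highest version wins" merge rule; all of these are my assumptions for illustration, not a standard or any vendor's API.

```python
import json
from typing import Dict, List


def export_entries(entries: List[Dict]) -> str:
    """Serialize registry entries for another registry to consume."""
    return json.dumps({"entries": entries}, sort_keys=True)


def import_entries(payload: str, local: List[Dict]) -> List[Dict]:
    """Merge incoming entries into the local set; the higher version wins."""
    merged = {e["name"]: e for e in local}
    for incoming in json.loads(payload)["entries"]:
        current = merged.get(incoming["name"])
        if current is None or incoming["version"] > current["version"]:
            merged[incoming["name"]] = incoming
    return sorted(merged.values(), key=lambda e: e["name"])
```

Because both sides use the same two functions, either registry can act as the source or the target of an exchange.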

If you are NOT a vendor, and are looking at implementing an EII solution, I would strongly urge you to take a look at the success stories spelled out in CIO magazine; most of these recommend that a system-of-registries component be in place as part of the critical architecture.

Do you have a "story" about a system-of-registries and EII interaction? Let us hear it!

Cheers,
Dan L


Posted January 24, 2006 1:40 PM

I've blogged several times about how appliances are arriving on the scene, and how they eventually (I believe) will hold a place as an EDW appliance. Appliances are making forays into many areas of OLTP and data capture, which is the first step on this journey to creating an "appliance based warehouse." In one of my posts I went so far as to state that I believe the future of warehousing rests squarely in the hands of appliances, and of course - not everyone agrees (which is fine).

In this entry I'll take a look at the reasons why I believe appliances will be the EDW of the future, and why they will contain all the software elements we take for granted today. Of course the nature and definition of EDW is shifting as we speak, and tomorrow it won't just be your parents' warehouse anymore.

CRN: 01/23/05
Tons of headlines in this media magazine indicate the rise of appliance based devices:
"Smartphones get Smarter"
"Top selling hardware products are Networking Devices"
"Symantec rolls out small-biz mail appliance"
"3COM: v6000 supports VCX call control, call center, IP messaging software"

There is no question that appliances are becoming attractive because of the price/performance and functionality they contain. These appliances are increasing the competitive nature of the OLTP and data capture market. They do more and more processing within the appliance, and of course they've added storage, business rules, monitoring capabilities and API or service based accessibility. When it comes to getting the whole package today, an appliance just seems right - as long as you can find one for your specific business need.

When it comes to warehousing today, the appliances are in their infancy. That doesn't mean (however) that they can't or won't grow up. What I mean is today, we've got vendors like DatAllegro, and Netezza, and a few others who play in the partial-appliance-for-data-warehousing-world. They've got a firm grasp of the notions that RDBMS integrated with hardware is the definite means to scale, and address performance problems. They've added self-tuning hardware (software on hardware), they've integrated Operating Systems, and firmware, and they've begun to tackle the load/unload issues of large data sets in parallel with partitioning.

There is reason to advance these appliances and add more features like: Transformation engines, GUI development, monitoring, maintenance, and on-board data mining capabilities, BI capabilities, and cube building systems. It just makes sense from an offering perspective for these hardware builders to team up with the software industry and bury the software into the device - producing a PLUG AND PLAY WAREHOUSE capable of saving cost, reducing installation and maintenance time, and increasing productivity. Yes, there are a lot of bridges to cross with this type of approach, because the device must scale to large business, and it must also meet the needs of SMBs - that's where volume sales make up the majority of the profits (especially with reduced pricing).

Imagine: if one simple feature such as ETL were built into the hardware device, we'd have an easier time of establishing plug and play components. Teradata is beginning to do this, and Microsoft SQL Server 2005 with its SSIS has started to do this. In fact, these vendors have optimized their ETL / ELT mechanisms to work with their RDBMS natively on their platform. While they may not be "best of breed" per se, they will certainly make a dent in the market place, particularly when bundled together with the hardware and the RDBMS engine.

In order to stay competitive, larger vendors of BI, data mining, metadata management, ETL/ELT, EII and EAI would be wise to begin partnering with the appliance vendors, and possibly jumping on board to provide bundled solutions.

An Exabyte ad in CRN shows this to be interesting and plausible: "VXA-320 PacketLoader 1x10 1U. With a reasonable price tag, small and midsize businesses can take full advantage of the device... able to combine the 1U hardware device with almost any third-party backup software... is an attractive option".

It may be that service oriented appliances are the next wave, but who wants to run around and try to "integrate" all these disparate software programs with separate hardware and RDBMS and storage systems? The pendulum has begun to swing back to consolidation.

Thoughts?
Dan L


Posted January 24, 2006 7:25 AM