Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for Masters students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best, Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

April 2008 Archives

It seems people have taken the term "Dynamic Data Warehousing" and abused it. They've made it out to be about "Dynamic Data" and completely ignored "Dynamic Modeling", or dynamic restructuring as the case may be. Automorphic means self-changing, self-adapting. In this entry we'll talk about different capabilities of Dynamic Data Warehousing and the changes to data models as they grow.

First, let's define what we mean by Dynamic Data Warehousing:

My definition of DDW has come to mean:
* Data models that can adapt to the incoming data based on A.I. rule sets, learned data patterns, linguistics, metadata, and associativity.
* Load patterns that are driven by the changes, moving data from point A (the source) directly to point B (the target, or the dynamic model).
* Indexes that shift: created and dropped on the fly based on load patterns and query patterns (see the sketch after this list).
* Learning systems attached to the firmware of the device that watch and learn about the metadata; tying the metadata together is an important step.
* Adaptable cubes: dynamic cubes or in-memory aggregations based on "temperature data" (hot, warm, cold). In other words, in-memory ROLAP solutions built from metadata, cube structures, attribution of the data sets, and the queries (the questions) being asked.
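
To make the index bullet above a bit more concrete, here is a minimal sketch of the idea - not any vendor's actual feature, and the table, column, and threshold names are purely hypothetical. It watches which columns show up in query predicates and creates an index once a column gets "hot." SQLite is used only so the sketch is self-contained and runnable.

import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, region TEXT, product TEXT, amount REAL)")

predicate_usage = Counter()   # column -> number of queries that filtered on it
CREATE_THRESHOLD = 3          # hypothetical tuning knob
managed_indexes = set()

def adapt_indexes():
    # Create an index once a column is "hot"; in a real system the counters would
    # also decay over time so that cold indexes eventually get dropped again.
    for column, hits in predicate_usage.items():
        name = "ix_sales_" + column
        if hits >= CREATE_THRESHOLD and name not in managed_indexes:
            conn.execute("CREATE INDEX " + name + " ON sales(" + column + ")")
            managed_indexes.add(name)

def run_query(sql, filter_columns):
    # Record which columns were used as filters, adapt, then run the query.
    predicate_usage.update(filter_columns)
    adapt_indexes()
    return conn.execute(sql).fetchall()

# Simulate a repeating query pattern: after three filters on region, an index appears.
for _ in range(3):
    run_query("SELECT SUM(amount) FROM sales WHERE region = 'EMEA'", ["region"])
print(managed_indexes)   # {'ix_sales_region'}

A real DDW would of course fold load patterns and the learning layer into the same decision, but the shape of the feedback loop is the same.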

Now this may all sound really interesting and extremely futuristic - but I can assure you, it's not. I am currently working on solutions in my lab that execute portions of these elements. The hardest one (you might think) is the dynamic modeling, or dynamic restructuring of the database... well, let me tell you: nope! That's not the hardest piece (when the Data Vault modeling architecture is used)...

Keep in mind that the DDW is one or two steps beyond the Operational Data Warehouse, which I've just begun writing about. Also remember that the term DDW retains all the responsibilities of the "Data Warehouse": time-variant, non-volatile, granular, etc... That said, the question for Dynamic Data Modeling then becomes: how do you keep history on massive volumes of information without losing value, and without "reorganizing" or altering existing structures?

The answer is to come later, but with the Data Vault modeling methodology it CAN be done...
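
As a preview, here is only a sketch of the general idea behind why the Data Vault can absorb structural change: business keys live in hubs, descriptive history lives in satellites, and relationships live in links, so new attributes arrive as a brand-new satellite hung off an existing hub and nothing already loaded is ever ALTERed or reorganized. The table and column names below are hypothetical, and SQLite is used purely for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")

# Hub: just the business key plus load metadata - this table never changes shape.
conn.execute("""
    CREATE TABLE hub_customer (
        customer_hkey  TEXT PRIMARY KEY,   -- surrogate key
        customer_bk    TEXT NOT NULL,      -- business key
        load_dts       TEXT NOT NULL,
        record_source  TEXT NOT NULL
    )""")

# Satellite: descriptive attributes, with history kept by load date.
conn.execute("""
    CREATE TABLE sat_customer_name (
        customer_hkey  TEXT NOT NULL REFERENCES hub_customer(customer_hkey),
        load_dts       TEXT NOT NULL,
        customer_name  TEXT,
        PRIMARY KEY (customer_hkey, load_dts)
    )""")

# Later the source starts sending a loyalty tier. Instead of ALTERing anything,
# a new satellite is added next to the old one - existing history is untouched.
conn.execute("""
    CREATE TABLE sat_customer_loyalty (
        customer_hkey  TEXT NOT NULL REFERENCES hub_customer(customer_hkey),
        load_dts       TEXT NOT NULL,
        loyalty_tier   TEXT,
        PRIMARY KEY (customer_hkey, load_dts)
    )""")

conn.execute("INSERT INTO hub_customer VALUES ('h1', 'CUST-001', '2008-04-26', 'CRM')")
conn.execute("INSERT INTO sat_customer_name VALUES ('h1', '2008-04-26', 'Acme Corp')")
conn.execute("INSERT INTO sat_customer_loyalty VALUES ('h1', '2008-05-01', 'GOLD')")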

So how about 3rd normal form?
Sorry, it seems to be incapable of handling dynamic structure change. Why? For the same reason that it fails as a Data Warehousing architecture in the first place: parent-child relationships are embedded in the tables, and the structure then has to be carried over time. Changing the structure of a 3NF DW is bad enough, let alone trying to alter it on the fly during loading while maintaining existing history. That requires super-human strength, massive amounts of disk (to copy the elements), and sometimes changes the MEANING of the data when the structure changes.

"Danger Will Robinson!" (quote from a U.S. T.V. show ... lost in space, from the 1960's) http://en.wikipedia.org/wiki/Danger,_Will_Robinson

Ok, so what about Star Schemas?
Well, if you read through the definitions of Star Schemas AS ENTERPRISE DATA WAREHOUSES, you quickly find that they're not the right fit - hence the new Generation 2, DW2.0(tm), and other new modeling concepts like the Data Vault.

Have you ever tried to change the structure of a conformed dimension? Does it indeed get harder as the system grows and the more conformed it becomes? Does it slow down your development efforts?

Yes to all of these (at least from my personal experience). Does that make the Star Schema bad? NO! Star Schemas are AWESOME, WONDERFUL, and the ONLY solution that works for OLAP and drill-down... Do they have a place in the DDW? YES! ABSOLUTELY! Well then, where?

They have a place as adaptable cubes. Something funny happens when Star Schemas are used as SINGULAR STARS to LOGICALLY define VIRTUAL marts: they work extremely well, and as long as they stay logical (not physically implemented), dynamic memory cubes can become a reality. That's right: IN-MEMORY CUBING. It's happening already in certain DB engines, but it's not yet dynamic.
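
To illustrate the "logical, not physical" point, here is a minimal sketch (again with hypothetical table and column names, and SQLite purely for illustration) of a virtual dimension: a view that joins a hub to the most recent satellite row per key. The cube or in-memory engine reads from the view, so no conformed dimension is ever physically built or rebuilt.

import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- Minimal Data Vault pieces so the view has something to sit on.
    CREATE TABLE hub_customer (customer_hkey TEXT PRIMARY KEY, customer_bk TEXT, load_dts TEXT);
    CREATE TABLE sat_customer_name (customer_hkey TEXT, load_dts TEXT, customer_name TEXT,
                                    PRIMARY KEY (customer_hkey, load_dts));
    INSERT INTO hub_customer VALUES ('h1', 'CUST-001', '2008-04-26');
    INSERT INTO sat_customer_name VALUES ('h1', '2008-04-26', 'Acme Corp');
    INSERT INTO sat_customer_name VALUES ('h1', '2008-05-10', 'Acme Corporation');

    -- The "dimension" is never materialized: it is a view over the hub plus the
    -- latest satellite row per key.
    CREATE VIEW dim_customer AS
    SELECT h.customer_bk   AS customer_key,
           s.customer_name AS customer_name
    FROM hub_customer h
    JOIN sat_customer_name s
      ON s.customer_hkey = h.customer_hkey
    WHERE s.load_dts = (SELECT MAX(load_dts)
                        FROM sat_customer_name
                        WHERE customer_hkey = h.customer_hkey);
""")

print(conn.execute("SELECT * FROM dim_customer").fetchall())
# [('CUST-001', 'Acme Corporation')]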

However, as a DDW foundational structure, we need something else. The Data Vault Model seems to be (today) the only other choice available that is actually capable of executing on this dream. We'll talk more about this in my Data Vault blog at http://www.BetterDataModel.com.

Cheers for now,
Dan Linstedt
DanL@DanLinstedt.com


Posted April 26, 2008 7:56 AM
Permalink | 1 Comment |

To follow on with our series, we'll dive in now and explore some of the elements needed for a repeatable, consistent, and redundant project. These are the components that make the project book completely usable; without these pieces, the project methodology usually sits on a shelf and gathers dust. What we are aiming at is the hope of reducing overhead costs, reducing errors, increasing productivity, and increasing the agility of I.T.

Welcome to the next entry in the series. In this entry we'll dive into some of the physical components you want in your project binder to help make it work. Despite what some believe (that this is cumbersome, over-documented, pushed to the edge, too many standards, etc.), once followed it actually makes project work repeatable, consistent, and reliable. It puts you and your team in a position of continuous process improvement and continuous success. There are a few key deliverables that absolutely must be put into the project binder.

1) Service Level Agreements (SLAs). These are a phenomenal way to communicate acceptance of everything from default values to the problems and issues that arise during project implementation. Without SLAs, the project can easily be "blamed" for not meeting goals, not building what was expected, not getting it right.

The hard part? Getting the business to realize that this doesn't mean "it's set in stone" - it simply means that they understand the level of service currently being implemented, and they agree with the output _at that time_. Once we've achieved this level of understanding, and taught the business that they can change their minds (by signing a new SLA), we can take steps forward.

SLAs function as change requests, issue and mitigation choices, and implementation direction controls. Standard SLAs should be a part of every project, and should be carried around by the project lead in their back pocket.

They don't have to be long (usually not more than one paragraph), but they do need signature lines: the project manager, one of the sponsors, and the business technical lead.

2) Business Technical Lead. What is this? This role is usually assigned to someone on the business side of the house, someone who then interfaces with the project manager in I.T. on a daily basis. They are business-driven, but have a knack for I.T. and a deep understanding of how to work the technology. They are responsible for setting up all meetings with the business users, coordinating the rooms, writing the user help manual, generating the first cut of metadata and definitions, and being a part of the immediate "testing team" which provides results / runs reports against your production data.

They are the ones who call emergency meetings when a hard stop is reached in the project. They also work in tandem with the business itself to ensure the project doesn't fall off-track or off-focus, and they communicate with the project manager about pending business changes. These resources are invaluable. If you don't have one on your project, your likelihood of _long-term_ success is limited.

3) Expert Knowledge. All too frequently, consulting companies bring in the expert just to help close the sale - and I.T. departments are guilty of the SAME THING! Then they whisk the expert away to some production fire that is not the project at hand, never to be seen again. This doesn't work for successful projects. Successful projects _need_ the expert on a consistent basis, and if an expert isn't available locally, get one from outside. Hire one as a consultant, bring them in as a sub-contractor, etc. - any way you can. These experts should have the following skill sets:
* Have handled large projects
* Have built systems hands-on (actual technical implementation) with large volumes of data (25 TB+)
* Have expertise in data mining, understanding of data patterns
* Have multi-database engine experience (not just a single database engine)
* Understand business requirements
* Have a grasp of _multiple_ ETL, and BI tool suites
* Have working knowledge of 3rd Normal Form, Data Vault Modeling, and Star Schemas
* Understand ODS, Stage, Data Warehouse, and Star Schemas
* Be able to articulate definitions of compliance, accountability and governance.

It's a plus if they've worked on real-time data warehouses, have government experience, or have gained SEI/CMMI Level 3 experience.

4) Have a subject matter expert: an SME who's also well-versed in COBOL (if there are COBOL sources to pull from). Someone who understands the business from a data and systems side. Someone who can help identify exactly what you have, and what you don't have, available to source from or target to. Someone who can identify the problems with the current source system, and share the existing change requests that the system is undergoing. This person should have daily interaction with the operational systems and the teams running those applications.

Ok, so in this entry I diverged a bit. I'll get back to the templates in the next entry in the series. In this entry I went back to rule #1: identify roles and responsibilities - these are the key roles and responsibilities that make projects successful out of the gate. Even on small projects at mom & pop shops, these roles and responsibilities are important.

Hope this helps,
Dan Linstedt
DanL@DanLinstedt.com


Posted April 25, 2008 5:35 AM
Permalink | No Comments |