Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best, Dan Linstedt. http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

June 2005 Archives

DDW (Dynamic Data Warehousing) is a couple of years off (at least 3 to 5), but I still like to look into the future to find out what kinds of things we might create, and what the value of DDW might be. This is an exploratory entry, meant for discussion purposes, so please comment on what you think about DDW, its feasibility, the timelines, and anything else that comes to mind.

In my mind, to get to DDW we must take baby steps - unless, of course, some engineer has a tremendous breakthrough and can create the necessary components quickly and easily. I think it may be possible to build a DDW today, at least for experimental purposes.

If I look at the raw components available today, I would suggest starting with a DW appliance like Netezza or Datallegro. Then, I would negotiate a deal with a data mining company to place their software on a hardware card that can plug and play with the appliance. Next, we would engage the data mining during load streams and schedule the neural net to mine the data in near-real time for associations and meanings. The twist would be the neural net's focus: it would be directed to work on structure and relationships rather than on the data itself, using the data to measure confidence levels for the structure.

It would be as though we had a Structural Quality Engine built into firmware and hooked to super-fast I/O and high-speed parallel systems, resulting in a semi-smart device. I would recommend starting off with simple rules and a 1, 2, 3 rating (as discussed in my previous posting), where 1 = manual intervention required before the structure change is made, 2 = warning, but the change is made in place, and 3 = notification that a change has occurred. Based on confidence-level thresholds, we would see this system stratify changes over time.
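To make that rating idea a little more concrete, here is a minimal Python sketch of how a structural quality engine might stratify proposed structure changes by confidence level. The thresholds, the StructureChange shape, and the example scoring are purely illustrative assumptions on my part - nothing like this exists in any appliance today.

```python
# Hypothetical sketch of the 1/2/3 structure-change rating described above.
# The thresholds and change descriptions are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class StructureChange:
    description: str      # e.g. "new column observed in an inbound feed"
    confidence: float     # 0.0 - 1.0, as scored by the (hypothetical) mining engine

def rate_change(change: StructureChange,
                manual_below: float = 0.70,
                warn_below: float = 0.90) -> int:
    """Return 1 = manual intervention, 2 = apply with warning, 3 = apply and notify."""
    if change.confidence < manual_below:
        return 1
    if change.confidence < warn_below:
        return 2
    return 3

# Example: the engine is 95% confident a new relationship exists, so the change
# is applied automatically and a notification is issued.
print(rate_change(StructureChange("new FK CUSTOMER -> REGION", 0.95)))  # -> 3
```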

So what does DDW bring to the table?
An incredible ability to adapt the structure of the data warehouse or historical integration store on the fly - that's the technical side of it, anyhow. From a business perspective it translates to much lower maintenance costs, much faster response from IT when adjusting to changes, and much lower "new project" costs. "What if" questions could be asked of the production architecture without actually making changes. Rates of structure change could be monitored, projected, and gauged based on the business partners who are interacting with the SOA interfaces.

It would mean a better way to estimate "changes" to the data warehousing system, with hard-and-fast numbers and confidence levels to back it up. On the other hand, we'd have to pay for the engineering somehow, so up-front costs would rise in order to lower TCO over time.

It means easier access to changes and more dynamic, flexible changes in the hands of the business users; and if the DDW is hooked to a business rules engine - or an SOA business process workflow - it can be "dialed in" to the changes that are coming down the pike (expected at the data level).
I'd love to hear your thoughts on this topic; all comments are welcome.


Posted June 30, 2005 12:44 PM

To all, I apologize if I've provided misinformation in my article titled Nanotechnology Basics Defined. I am deeply sorry that I may not have defined carbon nanotubes properly, or that I haven't provided readers with accurate information regarding carbon molecules. I will be updating the article within the next two months. Please be aware that the purposes of my exploration into nanotech are to:

a. discuss concepts and thought processes regarding the application of nanotech to the data warehousing and business intelligence world.
b. explore and hypothesize about new possibilities for software and engineering.
c. better understand its purpose within everyday business applications.
d. expand the bounds of which types of nanotech will work in the future.

If there is anyone out there who wishes to comment, it would be nice to hear from you - particularly if you would be so kind as to provide links to scientific research and point me to resources that think outside the box (in terms of speculation). I will take the time to read through the information.

Thank you again.

Sincerely,
Dan Linstedt


Posted June 8, 2005 4:44 PM

This is a very short entry to let you know that we have a FREE metadata tool scorecard available for download. What does this scorecard contain?

This scorecard contains over 300 ranking elements for evaluating a metadata tool within your environment: everything from cost, to availability of consultants, to consultant cost per hour - including metadata on queries, business rules, and the data itself.
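To show how such a scorecard might be tallied, here is a tiny, purely illustrative Python sketch. The element names, weights, and 0-5 rating scale are my own assumptions; the real scorecard carries 300+ elements.

```python
# Illustrative only: a tiny weighted tally for a metadata tool scorecard.
# Element names, weights, and the 0-5 scale are assumptions for this sketch.

ratings = {                     # 0 (none/poor) .. 5 (excellent)
    "license cost": 3,
    "consultant availability": 4,
    "consultant cost per hour": 2,
    "captures query metadata": 1,
    "captures business-rule metadata": 0,
}

weights = {                     # relative importance of each element
    "license cost": 2.0,
    "consultant availability": 1.0,
    "consultant cost per hour": 1.0,
    "captures query metadata": 3.0,
    "captures business-rule metadata": 3.0,
}

max_score = 5 * sum(weights.values())
score = sum(ratings[k] * weights[k] for k in ratings)
print(f"tool score: {score:.0f} / {max_score:.0f} ({100 * score / max_score:.0f}%)")
```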

Please keep in mind that this is a future-looking scorecard. There are a LOT of features and functions that no metadata tool offers today; however, the metadata itself is available and can be captured with a little bit of manipulation. For instance, Teradata captures query timings, execution cycles, WHERE clauses, the SQL that was executed, who ran it, and how long it took. No metadata tool that I'm aware of picks this up today.
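As a rough sketch of what "a little bit of manipulation" might look like, here is one way you could harvest query-level metadata yourself, assuming Teradata's query logging (DBQL) is enabled and exposed through a view such as DBC.QryLogV. The view name, column names, and the teradatasql driver are illustrative choices and may not match your environment or version.

```python
# Sketch: pull query-level metadata from Teradata's query log (DBQL).
# Assumptions: DBQL is enabled; DBC.QryLogV and its columns exist as named here.

import teradatasql  # illustrative driver choice; any DB-API 2.0 driver works the same way

SQL = """
SELECT UserName, StartTime, FirstRespTime, AMPCPUTime, QueryText
FROM   DBC.QryLogV
WHERE  StartTime > CURRENT_TIMESTAMP - INTERVAL '1' DAY
"""

con = teradatasql.connect(host="dbc", user="metauser", password="secret")
cur = con.cursor()
cur.execute(SQL)
for user, start, first_resp, cpu_secs, text in cur.fetchall():
    # Land each row in the metadata repository of your choice.
    print(user, start, first_resp, cpu_secs, str(text)[:60])
cur.close()
con.close()
```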

On the other hand, there is additional metadata buried in business rules engines like Logist, or business process engines like WebMethods - and again there is a question as to how deep today's metadata tools go, and whether they can integrate this with technical or structured metadata. There are also areas of unstructured metadata retrieval (as typically found in the Rational suite of products); today's metadata tools don't do a good job of connecting to these engines either.

There are many different types of metadata rules, and CWM doesn't cover them all. Just because a metadata tool claims CWM compliance doesn't mean it has implemented the CWM interface in a standards-conformant way. So you see, the time for a metadata tool scorecard has come; we need to push metadata vendors into evolving their products out of their current complacency.

I'd love to have your feedback, thoughts, and comments - particularly if there's something I shouldn't have included in the scorecard, or something I forgot to include.

The scorecard is at: Myers-Holum, Inc

Cheers,
Dan L
CTO, Myers-Holum, Inc.


Posted June 2, 2005 6:15 AM

Accountability (or lack thereof) has long been a problem with integration projects, including data warehouses. Business users tell IT the warehouse is wrong, and you know what? They're right! Why? Because IT implemented business rules to merge data too far upstream. They've lost traceability, and therefore accountability, for the facts.

Not only is accountability a problem; technically speaking, handling real-time data alongside batch data is also a problem. Most modeling architectures don't support this, which results in cost increases to re-architect when moving from an EDW to an ADW.

Here's a real-life case about accountability that I ran into over 10 years ago. Manufacturing costs had never matched finance costs; for some reason, financial reports of build processes had been off by X dollars for over 15 years. When we built the warehouse, we went with the following paradigm:

Raw data pull, raw data integration (no change to the grain, no merging); then we implemented business rules to merge, mix, and match going FROM the warehouse into two places: Error Marts and "real marts," the business version of the truth. Some business users stated that the "real marts" were right; others complained that the reports they were getting were wrong, incorrectly placing the blame on the data warehouse project.
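Here is a minimal Python sketch of that paradigm - not our actual project code, and the column names and the business rule are assumptions for illustration only. Raw rows are stored untouched as the statement of fact, carrying a record source and a load timestamp, and business rules are applied only on the way OUT of the warehouse, routing rows to an error mart or a financial ("real") mart.

```python
# Illustrative sketch of the loading paradigm described above.

from datetime import datetime, timezone

def stage_raw(row: dict, record_source: str) -> dict:
    """Store the statement of fact untouched, plus audit columns."""
    return {**row, "record_source": record_source,
            "load_dts": datetime.now(timezone.utc)}

def route_downstream(raw_row: dict) -> str:
    """Apply business rules AFTER the warehouse, never before it."""
    # Hypothetical rule: a build cost must be positive and carry a cost center.
    if raw_row.get("build_cost", 0) <= 0 or not raw_row.get("cost_center"):
        return "error_mart"
    return "financial_mart"

raw = stage_raw({"order_id": 42, "build_cost": -120.0, "cost_center": "MFG-01"},
                record_source="MRP_SYSTEM")
print(route_downstream(raw))   # -> error_mart; the raw fact itself stays unchanged
```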

This came to a head quickly when executive decision makers got wind that the business believed the warehouse to be wrong because it didn't match the operational reports they were getting. The executives hired an auditor to come in and see whether the warehouse was wrong (this was long before SOX and Basel II). The auditor came to us and said: show me how these financial reports are produced, what the data looks like BEFORE it was changed for business usage and what it looks like after, and, furthermore, tell me when it was changed.

We walked the auditor through the separation of data into the Error Marts and the Financial Marts, showed him the business rules the business users had asked for within the cycle of loading both marts, and traced the data through the aggregation, integration, and cleansing components. He validated and verified our information.

Then we showed him the compliant STATEMENT OF FACT in our warehouse: each row with a record source and a load stamp showing where it came from and when it arrived from that source, and the fact that the data was neither MERGED nor cleansed - it was a raw snapshot of what the source system looked like AT THAT TIME. He then independently verified (for current data) that we were indeed loading raw data that matched the operational systems.

He walked over to the operational report on the operational system, looked at the aggregations and merging, and declared that the warehouse was right and the operational report was wrong - furthermore, it had been wrong for 15 years! On that day, the business learned the value of true warehousing (the statement of fact) and of separating data from information, with the business rules placed in the building of the marts.

They found out 1) that if the warehouse team had matched its loads to the operational report, the warehouse really would have been wrong (it would have propagated the accounting error); 2) that they had mis-billed their client for 15 years; and 3) that the only reason they could discover the error was that we kept a single statement of fact in our warehouse and left the version of the truth to the building of the marts. Needless to say, they corrected the problem and reported it to their customer - and were subsequently awarded more contracts because they were finding and fixing their own BUSINESS FLAWS.

For the first time in 15 years, we saw a tremendous rise in change requests on the source system to begin fixing business process problems on the data capture side. We also saw change requests and investigations into the physical business processes for the same reason: they could see and audit the BAD DATA right alongside the good data. We had itemized visibility into our data supply chain.

End users were now accountable for business problems, explaining why they happened and then recommending fixes to source systems, business processes, or both. Error marts helped separate raw data from truly usable information, which kept the queries clean and the reports accurate.

Next, we'll finish up this series with a discussion of real-time data sets. By now you should be getting the feeling that data modeling is very much a part of the success or "failure" of using information in the warehouse correctly. It should also be evident that traceability, statements of fact, and the separation of perceived "good" information from "bad" data are necessary for auditability, compliance, and business accountability. Nowhere is this more evident than in the source-system data pulls and the data warehouse modeling techniques, along with the shift of the business rules to the OUTPUT side of the warehouse.

Business Rule #6:
In order to remain compliant and auditable, and to prove beyond a shadow of a doubt that the warehouse is NOT wrong, the correct architecture and data modeling must be put in place.

You can read more about the data modeling constructs called the Data Vault at www.DanLinstedt.com; in fact, you can ask questions of customers who are using these techniques today. See you next time.

Comments?
Dan L
CTO, Myers-Holum, Inc. http://www.myersholum.com


Posted June 2, 2005 5:34 AM