Time for the Wednesday (or thereabouts) "What" (what I have learned...). OK, my client work seems to prompt these learnings endlessly, so there's no shortage of them, but sometimes I don't have an elegant preamble to a blog entry. So, I'll just say it.
You've got to tie that warehouse data back to source or users will cry foul. It doesn't matter how dirty the source data is. If you want to change the data en route to the warehouse to clean it, fine, change it, but bring the original data along as well, in a separate set of columns, so you can prove your tie-out.
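Here is a minimal sketch of that idea, not a prescription. The function name, the `state` field, and the `_src` suffix for the untouched copy are all made up for illustration; use whatever naming convention your warehouse already has.

```python
def clean_customer_row(source_row):
    """Build a warehouse row that keeps the original value next to the cleaned one.

    Hypothetical example: the field names and the "_src" suffix are assumptions.
    """
    raw_state = source_row["state"]
    # Cleaning rule for illustration only: trim whitespace and upper-case.
    cleaned_state = raw_state.strip().upper() if raw_state is not None else None
    return {
        "customer_id": source_row["customer_id"],
        "state": cleaned_state,   # cleaned value that users will query
        "state_src": raw_state,   # untouched source value, kept for tie-out
    }


row = clean_customer_row({"customer_id": 7, "state": " mo "})
print(row)
```

Because `state_src` is carried through unchanged, any question about what the source system actually held can be answered from the warehouse itself.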
Tie-out should make you more comfortable with your ETL as well. It sometimes involves adding pre-extract queries against the source data and post-load queries against the warehouse data. It sometimes involves "spot" query checks, which can get tricky, because the method used to pick your spot data can come under scrutiny. It also gets tricky when the ETL runs intra-day or in real time, when ETL cycles are at an absolute premium. However, you still need to do it, IMO. These tie-out results go in your operational metadata.
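To make the pre-extract and post-load queries concrete, here is a rough sketch using Python's built-in `sqlite3` module with an in-memory database. Everything in it is an assumption for illustration: the table names (`src_orders`, `dw_orders`, `etl_tie_out`), the column names, the batch id, and the choice of row count plus column total as the tie-out measures. A real tie-out would run against your actual source and warehouse connections.

```python
import sqlite3


def table_stats(conn, table, amount_col):
    """Return (row count, column total) for one table.

    Table and column names are interpolated into the SQL, so they must come
    from trusted ETL configuration, never from user input.
    """
    return conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
    ).fetchone()


def record_tie_out(conn, batch_id, pre, post):
    """Store the comparison in a (hypothetical) operational metadata table."""
    matched = pre == post
    conn.execute(
        "INSERT INTO etl_tie_out VALUES (?, ?, ?, ?, ?, ?)",
        (batch_id, pre[0], post[0], pre[1], post[1], int(matched)),
    )
    return matched


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount_cents INTEGER);
    CREATE TABLE dw_orders (order_id INTEGER, amount_cents INTEGER,
                            amount_cents_src INTEGER);
    CREATE TABLE etl_tie_out (batch_id TEXT, source_rows INTEGER,
                              warehouse_rows INTEGER, source_total INTEGER,
                              warehouse_total INTEGER, matched INTEGER);
    INSERT INTO src_orders VALUES (1, 1000), (2, -250), (3, 4000);
""")

# Pre-extract query against the source.
pre = table_stats(conn, "src_orders", "amount_cents")

# Stand-in for the real ETL: negative amounts are "cleaned" to zero,
# while the original value rides along in amount_cents_src.
conn.execute("""
    INSERT INTO dw_orders
    SELECT order_id, MAX(amount_cents, 0), amount_cents FROM src_orders
""")

# Post-load query against the preserved source columns in the warehouse.
post = table_stats(conn, "dw_orders", "amount_cents_src")
matched = record_tie_out(conn, "batch-001", pre, post)
print("tie-out matched:", matched)
```

Note that the comparison is made against the preserved original column, not the cleaned one. The cleaned totals will legitimately differ from source, which is exactly why the originals need to come along.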
Tie-out is part of weaning users from their old ways to the new way (the data warehouse way). It's part of the bottom-up approach to a successful data warehouse rollout. Ask key users what they will use to deem the warehouse effort successful, then do that and more. Remember, users are from Missouri, the Show-Me State, and IT is from Mars (according to many users I have dealt with). And if they don't ask about tie-out, do it anyway!